Joy Payton here. A few suggestions / notes on the edition2 branch:
[ ] In Ch. 2 README (and propagate forward), add the link to storage: https://console.cloud.google.com/storage/browser
[ ] In Ch. 3 README (and propagate forward), correct the `./ingest_from_crsbucket` instructions (see the sketch after this item):
  - include the `.sh` extension (copy how you do it in the Ch 2 README?)
  - make clear that "bucketname" should be replaced (copy how you do it in the Ch 2 README?)
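A corrected snippet might look like the following — a minimal sketch, assuming (as in the Ch 2 README) that the script takes the destination bucket as its only argument:

```bash
# Note the .sh extension; replace "bucketname" with the name of
# your own Cloud Storage bucket.
./ingest_from_crsbucket.sh bucketname
```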
[ ] In Ch. 3 README (and propagate forward), indicate that one has to run `./bqload.sh csv-bucket-name YEAR` to populate BigQuery before executing the step that runs `./create_views.sh`
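A sketch of the ordering the README should make explicit (2015 is an illustrative year, not necessarily the one the reader ingested):

```bash
# Populate BigQuery first; replace csv-bucket-name with your bucket
# and 2015 with the year you ingested.
./bqload.sh csv-bucket-name 2015

# Only then create the views on top of the loaded tables.
./create_views.sh
```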
[ ] In Ch. 3 README, consider adding horizontal lines or another visual break to mark where the optional part begins and ends
[ ] In Ch. 4 README, warn users about the error possibility and the `venv` workaround on the first step of the Batch processing transformation in Dataflow. Explain that if they wait too long to execute everything and lose their `venv`, they'll have to rerun the step.
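As a concrete illustration of the workaround — the env name and the requirements file are assumptions about the chapter's setup, not confirmed file names from the repo:

```bash
# If the Cloud Shell session times out, the virtualenv is lost with it
# and must be recreated before rerunning the step.
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt   # assumed dependency file
```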
[ ] In Ch. 4 README, after the "catch up" section, guide the user to navigate into `~/data-science-on-gcp/04_streaming`
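For example, right after the catch-up instructions:

```bash
# Move into the Chapter 4 directory before continuing.
cd ~/data-science-on-gcp/04_streaming
```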
[ ] In Ch. 4 README, indicate that the reader should replace the placeholder text (bucket, project) in the Read/write to Cloud step
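Something along these lines would make the substitution obvious — the script name and flag names here are guesses for illustration, not confirmed from the repo:

```bash
# Replace both placeholders with your own values before running:
python3 df06.py --project <PROJECT-ID> --bucket <BUCKET-NAME>
```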
[ ] In Ch. 4 README, add a step to enable the Dataflow API at https://console.developers.google.com/apis/api/dataflow.googleapis.com/overview before running `df07.py`
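The README could also mention the gcloud equivalent of the console link:

```bash
# Same effect as enabling the Dataflow API through the console:
gcloud services enable dataflow.googleapis.com
```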
[ ] In Ch. 4 README, add "in the Cloud Dataflow section" after the df07 step and/or link https://console.cloud.google.com/dataflow/jobs
[ ] In Ch. 4 README, add a link to BigQuery before the query itself (https://console.cloud.google.com/bigquery)
[ ] In Ch. 4 README, in the Simulate Event Stream step, the learner has just been in `transform`, so change the cd command to `cd ../simulate` or to a path from `~`
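For instance, assuming the learner's shell is still in the transform directory:

```bash
# Relative path from the transform directory the learner just left:
cd ../simulate

# Or an absolute path that works from anywhere:
cd ~/data-science-on-gcp/04_streaming/simulate
```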
[ ] In Ch. 4 README, in the Real-time Stream Processing step:
  - explain how to open a new Cloud Shell
  - again, provide the path from the home directory for the cd command
  - make the project and bucketname substitution more obvious
  - describe the expected output and note that the learner will have a foreground process running
  - specify using Ctrl-C in the correct terminal window
  - make explicit that "you need to run this in Dataflow" will be handled by the next script
  - add a step to enable the Pub/Sub API at https://console.developers.google.com/apis/api/pubsub.googleapis.com/ (see the gcloud sketch after this list)
  - suggest a view name for the ATL delays
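For the Pub/Sub item, the gcloud equivalent of the console link could be shown:

```bash
# Enable the Pub/Sub API before running the real-time pipeline:
gcloud services enable pubsub.googleapis.com
```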