GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Apache License 2.0

README improvement for chapters 2 and 3 regarding upload to BQ #159

Open jgammerman opened 1 year ago

jgammerman commented 1 year ago

Hello,

Excellent book so far, but a problem I've been having is uploading the 2015 CSVs from my cloud storage bucket to BigQuery.

Both the ch2 and ch3 READMEs just tell you to run:

cd data-science-on-gcp/02_ingest
./ingest_from_crsbucket.sh bucketname

But this only copies the CSVs from the book's bucket to the user's. It doesn't cover the next stage, i.e. loading them into BQ.

The alternative route of ingesting from the original source of data also doesn't work: I found that my Google Cloud Shell kept disconnecting halfway through the upload process.

Therefore I'd recommend adding the following instruction to both READMEs, showing you explicitly how to do the upload to BQ:

bash bqload.sh bucketname 2015
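
For anyone wondering what a load step like this involves, here is a minimal sketch. It assumes the CSVs sit under a hypothetical flights/raw/ prefix in the bucket and go into a hypothetical flights.raw table; the repository's actual bqload.sh may use different paths, table names, and schema options. The function only prints the bq load command so it can be reviewed before running.

```shell
#!/bin/bash
# Sketch only: the path prefix and table name below are assumptions,
# not taken from the repository's bqload.sh.
build_bq_load_cmd() {
  local bucket="$1"   # bucket name without the gs:// prefix
  local year="$2"     # e.g. 2015
  # --source_format, --skip_leading_rows and --autodetect are standard bq load flags.
  echo "bq load --source_format=CSV --skip_leading_rows=1 --autodetect" \
       "flights.raw gs://${bucket}/flights/raw/${year}*.csv"
}

# Usage: print the command, review it, then paste it into the shell.
# build_bq_load_cmd my-bucket 2015
```

Because a load like this reads directly from Cloud Storage, it should not need to stream the files through the Cloud Shell session, which may be why it is less affected by disconnects than re-downloading from the original source.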

lakshmanok commented 1 year ago

thanks, I've put in a pull request to make the change. Instead of using ./ingest_from_crsbucket.sh, simply using ./ingest.sh will do the trick as it also uploads to BigQuery.

jgammerman commented 1 year ago

That approach didn't work for me either - my Cloud Shell would disconnect halfway through the upload to BQ so I would end up with an incomplete table. Solution was simply to run bash bqload.sh bucketname 2015.

Other people may not be so unfortunate though!

softjobs commented 1 year ago

Struggling for almost a day now trying to load to BigQuery without luck... used the bqload.sh with the correct params but getting the "Not found: URI gs://srini-laks-gcp1-dsongcp" error.

Enjoyed reading the two chapters but surprised to see the "user-unfriendliness" of this GCP platform. It shouldn't have to take all this time, given the data available through a Google search, but it does! Frustrating, to say the least.
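
One way to narrow down a "Not found: URI" error is to confirm that the objects exist in the bucket before attempting the load. The following is a sketch under assumptions: the flights/raw/ prefix is a hypothetical bucket layout and both function names are made up for illustration. It may not match the cause of the error reported here.

```shell
#!/bin/bash
# Sketch only: the flights/raw/ prefix is an assumption about the bucket layout.
csv_uri() {
  local bucket="${1#gs://}"   # accept either "name" or "gs://name"
  bucket="${bucket%/}"        # drop a trailing slash if present
  echo "gs://${bucket}/flights/raw/${2}*.csv"
}

check_csvs_exist() {
  # gsutil ls exits non-zero when nothing matches the URI.
  gsutil ls "$(csv_uri "$1" "$2")" > /dev/null 2>&1
}

# Usage:
# check_csvs_exist my-bucket 2015 || echo "no matching CSVs; check bucket name and path"
```

If the check fails, the bucket name or the object path passed to the load script is the first thing to look at.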

softjobs commented 1 year ago

Got it to work finally... Page 49 changes: