GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Apache License 2.0

Ch.9 - can't export data to GCS - wrong project number? #164

Open jgammerman opened 1 year ago

jgammerman commented 1 year ago

Hi @lakshmanok - loving the book! I'm on chapter 9 now and I've encountered an error I can't debug.

In the section "Preparing BigQuery data for TensorFlow" (p. 315), there's a part where you extract some BigQuery tables to Google Cloud Storage. The relevant bit of code in your notebook is as follows:

PROJECT=$(gcloud config get-value project)
for dataset in "train" "eval" "all"; do
  TABLE=dsongcp.flights_${dataset}_data
  CSV=gs://${BUCKET}/ch9/data/${dataset}.csv
  echo "Exporting ${TABLE} to ${CSV} and deleting table"
  bq --project_id=${PROJECT} extract --destination_format=CSV $TABLE $CSV
  bq --project_id=${PROJECT} rm -f $TABLE
done

Which gives me the following error:

Exporting dsongcp.flights_train_data to gs://peppy-booth-371612-dsongcp/ch9/data/train.csv and deleting table
BigQuery error in extract operation: BigQuery API has not been used in project 457198359346 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=457198359346 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.

I know that the BigQuery API has already been enabled for my project, so I think the problem is that it's picking up the wrong project number: in the output above, the end of the URL refers to project=457198359346, but that's not my project number! Mine is 506913857436, as shown here:

image

And indeed the correct project number is returned if I ask for it explicitly in my notebook:

image

Can you think of any reason why it would be picking up the wrong project number when trying to export from BQ to GCS?
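
The comparison described above can be scripted. This is a minimal sketch, not code from the book: the function names `project_number_in_error` and `project_number_for` and the file `err.txt` are hypothetical, and it assumes `gcloud` is installed and authenticated. Because `bq` may wrap long messages across lines, the first helper joins the lines before searching.

```shell
# Hypothetical helper: read bq's error output on stdin and print the project
# number that follows "has not been used in project".
project_number_in_error() {
  # Join wrapped lines first so a number pushed onto the next line still matches.
  tr '\n' ' ' | sed -n 's/.*has not been used in project \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print the project number that belongs to a project ID.
project_number_for() {
  gcloud projects describe "$1" --format="value(projectNumber)"
}

# Usage (assumes err.txt holds the captured bq error output):
#   project_number_for "$(gcloud config get-value project)"
#   project_number_in_error < err.txt
```

If the two numbers differ, the message is referring to a project other than the one named by `--project_id`.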

lakshmanok commented 1 year ago

Not sure why gcloud config is returning the wrong project, but you can tell it the right one with gcloud auth. Alternatively, just replace the line that calls gcloud with the right project number.

thanks, Lak
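
One way to pin the active project, sketched under assumptions: the reply above mentions gcloud auth, but the command I know of for changing the configured project is `gcloud config set project`, so that is what this uses. The function name `set_active_project` and the ID `my-project-id` are hypothetical.

```shell
# Hypothetical helper: point the active gcloud configuration at a project ID,
# then print the value gcloud now reports so it can be checked by eye.
set_active_project() {
  gcloud config set project "$1" >/dev/null && gcloud config get-value project
}

# Usage (replace my-project-id with your own project ID):
#   PROJECT=$(set_active_project my-project-id)
#   echo "Using project: ${PROJECT}"
```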


jgammerman commented 1 year ago

Thanks for the prompt response Lak. Unfortunately I don't think it's that simple, unless I've just misunderstood you...

See the screenshot below. My project ID appears to be correct, as does the project number (first two outputs), but when I run the code the error suggests that it is looking at a different project number, even though the project ID is definitely correct:

image

Simply setting PROJECT=506913857436 made no difference I'm afraid.

lakshmanok commented 1 year ago

Okay, this seems to be a bad error message. Essentially, "bq" is using an internal project to do the extract, and that call is failing for some reason. Could you check whether the BUCKET is in the same region as the BigQuery dataset?

Also, please file a bug in BigQuery ...

thanks Lak
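
A rough sketch of the location check suggested above, assuming `gcloud` and `bq` are installed and authenticated. The function names are hypothetical, and the `sed` expression assumes the dataset JSON contains a single top-level "location" field.

```shell
# Hypothetical helper: print the location of a Cloud Storage bucket
# (for example US or EUROPE-WEST1). Pass the bucket name without gs://.
bucket_location() {
  gcloud storage buckets describe "gs://$1" --format="value(location)"
}

# Hypothetical helper: print the location of a BigQuery dataset by picking the
# "location" field out of the JSON that bq show prints.
dataset_location() {
  bq show --format=json "$1" \
    | sed -n 's/.*"location": *"\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Usage (BUCKET as in the notebook; dsongcp is the dataset from the thread):
#   echo "bucket:  $(bucket_location "${BUCKET}")"
#   echo "dataset: $(dataset_location dsongcp)"
```

The two commands may report locations in different letter case, so compare them case-insensitively.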


jgammerman commented 1 year ago

@lakshmanok see my in-line responses:

Could you check whether the BUCKET is in the same region as the BigQuery dataset?

Originally my BQ datasets were located in the US and my bucket was in the EU (europe-west1). I've tried creating a new bucket in us-central1 and re-running the extraction, but unfortunately that produces exactly the same error (with the same incorrect project number).
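
For anyone repeating that retry, it can be sketched roughly as follows. The function names and the bucket name `my-new-bucket` are hypothetical; the `bq extract` flags are the ones from the notebook snippet at the top of the thread.

```shell
# Hypothetical helper: create a bucket in a chosen location.
# $1 = bucket name without gs://, $2 = location such as us-central1.
make_bucket_in() {
  gcloud storage buckets create "gs://$1" --location="$2"
}

# Hypothetical helper: export one table to a CSV object, passing the project
# explicitly. $1 = project ID, $2 = dataset.table, $3 = gs:// destination.
extract_table() {
  bq --project_id="$1" extract --destination_format=CSV "$2" "$3"
}

# Usage (my-new-bucket is a placeholder; PROJECT as in the notebook):
#   make_bucket_in my-new-bucket us-central1
#   extract_table "${PROJECT}" dsongcp.flights_train_data \
#     gs://my-new-bucket/ch9/data/train.csv
```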

Also, please file a bug in BigQuery

Done - see here