GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Apache License 2.0

problem extracting bigquery table data to cloud storage #153

Open shinchri opened 2 years ago

shinchri commented 2 years ago

I am working through chapter 9 of the book and ran into a problem while trying to extract a BigQuery table to Google Cloud Storage.

The code where I ran into the problem is below (inside the cell, PROJECT prints the correct project id):

%%bash
PROJECT=$(gcloud config get-value project)
for dataset in "train" "eval" "all"; do
  TABLE=dsongcp.flights_${dataset}_data
  CSV=gs://${BUCKET}/ch9/data/${dataset}.csv
  echo "Exporting ${TABLE} to ${CSV} and deleting table"
  bq --project_id=${PROJECT} extract --destination_format=CSV $TABLE $CSV
  bq --project_id=${PROJECT} rm -f $TABLE
done

For some odd reason, I am getting a weird error message:

Exporting dsongcp.flights_train_data to gs://tribbute-ml-central/ch9/data/train.csv and deleting table in project tribbute-ml
BigQuery error in extract operation: BigQuery API has not been used in project 457198359311 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=457198359311 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.

This is followed by:

CalledProcessError: Command 'b'PROJECT="tribbute-ml"\nfor dataset in "train" "eval" "all"; do\n  TABLE=dsongcp.flights_${dataset}_data\n  CSV=gs://${BUCKET}/ch9/data/${dataset}.csv\n  echo "Exporting ${TABLE} to ${CSV} and deleting table in project ${PROJECT}"\n  bq extract --project_id=${PROJECT} --location=us-central1 --destination_format=CSV $TABLE $CSV\n  bq --project_id=${PROJECT} rm -f $TABLE\ndone\n'' returned non-zero exit status 1.

The weird thing is that I have already enabled the BigQuery API for my project, and "457198359311" is not even my project number. (I verified that the correct project id gets printed in the bash command.)
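For reference, this is one way to confirm both points from the shell (a minimal gcloud sketch; the final enable command is only needed if the API actually turns out to be disabled):

# Map the configured project id to its project number -- it should not
# be 457198359311 if that project isn't mine.
PROJECT=$(gcloud config get-value project)
gcloud projects describe ${PROJECT} --format="value(projectNumber)"

# Confirm the BigQuery API is enabled in this project.
gcloud services list --enabled --project=${PROJECT} | grep bigquery

# If it were disabled, this would enable it:
# gcloud services enable bigquery.googleapis.com --project=${PROJECT}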

Does anyone know what's causing this issue and how to fix it?

lakshmanok commented 2 years ago

This looks like a bug. It's possible that bq extract is using a "shadow" project to run a pipeline to do the extraction.

Can you file an issue against BigQuery? https://issuetracker.google.com/savedsearches/559654?pli=1&q=(componentid:187149%2B%20status:open%20%2B%20type:Bug)%20OR%20(componentid:187065%2B%20customfield82940:%22BigQuery%22%20status:open)

Looks like people are running into issues with saving query results as a table as well, but they don't get the nice error message that you do.
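In the meantime, one thing worth checking (just a sketch of a possibility, not a confirmed cause): the bq CLI keeps its own cached default project in ~/.bigqueryrc, separate from the gcloud config, and a mismatch there could make it talk to a different project than the one you printed.

# bq's own cached default project, separate from the gcloud config
cat ~/.bigqueryrc

# what gcloud thinks the project is
gcloud config get-value project

# If the two disagree, edit the project_id line in ~/.bigqueryrc to match.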

thanks Lak


shinchri commented 2 years ago

I filed an issue.

For now, if anyone else runs into the same issue, you can manually export the table: go to BigQuery in the console, click the table you want to export, and under the Export menu click "Export to GCS".
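Another possible workaround until the CLI issue is resolved (a sketch only, not verified against this exact setup; BUCKET is a placeholder for the bucket used in earlier chapters): fully qualify the table with the project id so the extract does not depend on the CLI's default project.

%%bash
PROJECT=$(gcloud config get-value project)
BUCKET=your-bucket-name   # placeholder -- use the bucket created earlier in the book
for dataset in "train" "eval" "all"; do
  # Qualifying the table as PROJECT:dataset.table avoids relying on
  # bq's cached default project.
  TABLE=${PROJECT}:dsongcp.flights_${dataset}_data
  CSV=gs://${BUCKET}/ch9/data/${dataset}.csv
  echo "Exporting ${TABLE} to ${CSV} and deleting table"
  bq --project_id=${PROJECT} extract --destination_format=CSV $TABLE $CSV
  bq --project_id=${PROJECT} rm -f $TABLE
done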