GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Apache License 2.0

chapter 04: df07.py - Unable to open file: gs://BUCKETNAME/flights/staging/ch04timecorr.1656567385.996847/pipeline.pb. #151

Acturio opened this issue 2 years ago (status: Open)

Acturio commented 2 years ago

Hi! I get the following log when I try to run df07.py:

```
./df07.py --project PROJECT --bucket BUCKETNAME --region us-central1
Correcting timestamps and writing to BigQuery dataset
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2527: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = pcoll.pipeline.options.view_as(
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1129: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = p.options.view_as(GoogleCloudOptions).temp_location
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
```

```
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs//2022-06-29_22_36_30-1790320629162913076?project=
Traceback (most recent call last):
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 202, in <module>
    run(project=args['project'], bucket=args['bucket'], region=args['region'])
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 177, in run
    (events
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 598, in __exit__
    self.result.wait_until_finish()
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1673, in wait_until_finish
    raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Unable to open file: gs://BUCKETNAME/flights/staging/ch04timecorr.1656567385.996847/pipeline.pb.
```

Any suggestion will be appreciated. Thank you.

lakshmanok commented 2 years ago

Looking at the last line, it looks like you forgot to specify the bucket on the input to df07.py

> FAILED, Error: Unable to open file: gs://BUCKETNAME/flights/staging/ch04timecorr.1656567385.996847/pipeline.pb

thanks, Lak
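The literal string `BUCKETNAME` in the failing `gs://` path is the tell: the placeholder was passed on the command line instead of a real bucket name (or the `${BUCKETNAME}` variable expanded to nothing). A minimal pre-flight guard, assuming the book's `PROJECT-dsongcp` naming convention; the values below are examples, not from the repo:

```shell
# gs://BUCKETNAME/... in the error means the placeholder was passed literally.
# Set the variables explicitly; the PROJECT-dsongcp name follows the book's
# convention and is only an example -- substitute your real project and bucket.
PROJECT="ds-on-gcp-353305"
BUCKETNAME="${PROJECT}-dsongcp"

# Fail fast if either is empty, rather than letting Dataflow fail later.
: "${PROJECT:?PROJECT must be set}"
: "${BUCKETNAME:?BUCKETNAME must be set}"

echo "Staging path will be gs://${BUCKETNAME}/flights/staging"
```

With the variables verified, the launch becomes `./df07.py --project ${PROJECT} --bucket ${BUCKETNAME} --region us-central1`.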


Acturio commented 2 years ago

I appreciate the quick response. This is the latest log:

```
act_arturo_b@cloudshell:~/data-science-on-gcp/04_streaming/transform (ds-on-gcp-353305)$ ./df07.py --project ds-on-gcp-353305 --bucket ${BUCKETNAME} --region us-central1
Correcting timestamps and writing to BigQuery dataset
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2527: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = pcoll.pipeline.options.view_as(
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1129: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = p.options.view_as(GoogleCloudOptions).temp_location
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
```

```
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs//2022-06-29_23_27_32-11374214288357084698?project=
Traceback (most recent call last):
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 202, in <module>
    run(project=args['project'], bucket=args['bucket'], region=args['region'])
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 177, in run
    (events
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 598, in __exit__
    self.result.wait_until_finish()
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1673, in wait_until_finish
    raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Unable to open file: gs://ds-on-gcp-353305-dsongcp/flights/staging/ch04timecorr.1656570447.957722/pipeline.pb.
```

The problem is the same.

Any help will be appreciated.
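The client-side traceback stops at the staging failure; the job's server-side details sometimes say more. One way to pull them, assuming an installed and authenticated Cloud SDK (the job ID is copied from the log above):

```shell
# Look up the failed Dataflow job by the ID printed in the traceback.
# Guarded so the sketch is harmless on a machine without the Cloud SDK.
JOB_ID="2022-06-29_23_27_32-11374214288357084698"
REGION="us-central1"

if command -v gcloud >/dev/null 2>&1; then
  # Prints the job's state and configuration, including staging/temp paths.
  gcloud dataflow jobs describe "${JOB_ID}" --region "${REGION}" \
    || echo "describe failed; check auth and active project"
else
  echo "gcloud not found; run this from Cloud Shell instead"
fi
```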

lakshmanok commented 2 years ago

Does this bucket exist? Is the bucket in the us-central1 region?

ds-on-gcp-353305-dsongcp

In any case, the pipeline is failing because it is not able to create this file.

Lak
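Both questions can be answered from the shell, assuming gsutil is available and authenticated; this is only a sketch (gsutil reports bucket locations in upper case):

```shell
# Check that the staging bucket exists and sits in the pipeline's region.
BUCKETNAME="ds-on-gcp-353305-dsongcp"   # bucket from the log above
WANT_REGION="US-CENTRAL1"               # gsutil prints locations upper-case

if command -v gsutil >/dev/null 2>&1; then
  # -b targets the bucket itself; -L adds metadata incl. "Location constraint".
  loc=$(gsutil ls -L -b "gs://${BUCKETNAME}" 2>/dev/null \
        | awk '/Location constraint/ {print $NF}')
  if [ -z "${loc}" ]; then
    echo "bucket gs://${BUCKETNAME} not found (or no access)"
  elif [ "${loc}" = "${WANT_REGION}" ]; then
    echo "bucket exists in ${WANT_REGION}"
  else
    echo "bucket exists but is in ${loc}, not ${WANT_REGION}"
  fi
else
  echo "gsutil not found; run this from Cloud Shell instead"
fi
```

If the bucket turns out to be missing, `gsutil mb -l us-central1 gs://${BUCKETNAME}` creates it in the expected region.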
