Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

BigQueryExampleGen failing due to lack of --project #14

Closed: jimwill3 closed this issue 4 years ago

jimwill3 commented 4 years ago

There does not appear to be a way to pass the project into BigQueryExampleGen. The exact same query, used in the same notebook, works fine when passed in as part of: %%bigquery retail --project jwdeeplearn

Not sure if this is an error in the book or just an issue with BigQueryExampleGen (or perhaps BigQueryExampleGen is not passing the project info along to Apache Beam?).

If you found an error in the book, please report it at https://www.oreilly.com/catalog/errata.csp?isbn=0636920260912.

hanneshapke commented 4 years ago

Hi @jimwill3,

The project is configured through the Apache Beam args. I agree with you that we should have been a bit more explicit. Here is an example configuration:

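from tfx.components.example_gen.big_query_example_gen.component import BigQueryExampleGen

# Replace YOUR_PROJECT_NAME with your GCP project ID.
query = """SELECT product FROM `YOUR_PROJECT_NAME.consumer_complaints.complaints`"""
example_gen = BigQueryExampleGen(query=query)

# `context` is assumed to be the InteractiveContext already created in the notebook.
# The project goes to Apache Beam here, not to BigQueryExampleGen itself.
context.run(example_gen, beam_pipeline_args=['--project', 'YOUR_PROJECT_NAME'])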

Alternatively, you can provide the beam_pipeline_args via the DagRunners for Kubeflow, Apache Beam, or Apache Airflow.
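
For the runner-based setup, it can help to build the argument list in one place and reuse it. The following is only a minimal sketch using the standard library: `build_beam_pipeline_args` and `parse_project` are hypothetical helpers, not part of TFX or Apache Beam, and the optional `--temp_location` flag is an assumption about what a BigQuery read may additionally need in your environment. The `argparse` check only mimics flag parsing and does not call Beam.

```python
import argparse
from typing import List, Optional


def build_beam_pipeline_args(project: str,
                             temp_location: Optional[str] = None) -> List[str]:
    """Build the flag list to hand over as beam_pipeline_args (hypothetical helper)."""
    if not project:
        raise ValueError("a GCP project ID is required")
    args = ["--project", project]
    if temp_location:
        # Assumption: a temp location is only added when the caller supplies one.
        args.append(f"--temp_location={temp_location}")
    return args


def parse_project(beam_args: List[str]) -> Optional[str]:
    """Sanity-check the list: return the value of --project, or None if absent."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--project")
    # Unknown flags (for example --temp_location) are ignored here.
    known, _ = parser.parse_known_args(beam_args)
    return known.project


beam_args = build_beam_pipeline_args("YOUR_PROJECT_NAME")
print(beam_args)

# With TFX installed, the same list would be set once on the pipeline object, e.g.
#   pipeline.Pipeline(..., components=[example_gen], beam_pipeline_args=beam_args)
# so that the chosen runner passes it on to every Beam-based component.
```

Setting the list at the pipeline level, rather than per component, keeps the project in a single place when moving from the notebook to an orchestrated run.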

If this solves the issue, please let us know and close the issue. We'll then move the issue to the book Errata. Thank you!

jimwill3 commented 4 years ago

Thanks and yes - adding the beam_pipeline_args worked successfully.