GoogleCloudPlatform / ml-design-patterns

Source code accompanying O'Reilly book: Machine Learning Design Patterns
Apache License 2.0

Dataflow Batch job creates a Zero Byte TensorFlow record file #19

Open jeffreycunn opened 3 years ago

jeffreycunn commented 3 years ago

I am working through the following notebook: https://github.com/GoogleCloudPlatform/ml-design-patterns/blob/master/02_data_representation/weather_search/wx_embeddings.ipynb. I am running a GCP AI Notebook VM with JupyterLab.

When I run the following command: %run -m wxsearch.hrrr_to_tfrecord -- --startdate 20190915 --enddate 20190916 --outdir gs://{BUCKET}/wxsearch/data/2019 --project {PROJECT}, the Dataflow batch job appears to run to completion without errors (first image below). However, it produces a zero-byte TensorFlow record file (second image below). The create_tfr step also reports zero elements per second, which seems concerning, although I don't know whether it indicates a problem.

Any thoughts on what may be happening? The only modifications I made were to the bucket and project variables, where I substituted my own bucket and project values in the command.
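For anyone trying to reproduce this, here is a minimal sketch of how the output could be checked independently of the Dataflow UI. It assumes the output file has first been copied from the bucket to local disk, and it relies only on the standard TFRecord framing (a uint64 length, a uint32 CRC of the length, the data bytes, then a uint32 CRC of the data). The function name count_tfrecords is hypothetical and not part of the wxsearch package, and the CRC values are skipped rather than verified.

```python
import struct


def count_tfrecords(path):
    """Count records in a local TFRecord file by walking its framing.

    Hypothetical helper for illustration; CRC fields are read but not verified.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            # Each record starts with a uint64 length and a uint32 CRC of it.
            header = f.read(12)
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            # The data bytes are followed by a uint32 CRC of the data.
            payload = f.read(length + 4)
            if len(payload) < length + 4:
                raise ValueError("truncated record payload")
            count += 1
    return count
```

A count of 0 on every output shard would suggest that no elements reached the write step, which would be consistent with the zero elements per second shown for create_tfr.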

[Image: Screen Shot 2020-11-17 at 10 37 30 PM]
[Image: Screen Shot 2020-11-17 at 10 31 23 PM]