Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

GCP AI Pipeline with Dataflow fails with TypeError #30

mshearer0 closed this issue 2 years ago

mshearer0 commented 3 years ago

Error:

File "apache_beam/coders/coder_impl.py", line 165, in apache_beam.coders.coder_impl.CoderImpl.estimate_size
File "apache_beam/coders/coder_impl.py", line 488, in apache_beam.coders.coder_impl.BytesCoderImpl.encode_to_stream
TypeError: Expected bytes, got list [while running 'InputToSerializedExample/InputSourceToExample/ParseCSVLine']

INFO:apache_beam.runners.dataflow.dataflow_runner:2020-09-27T07:25:49.571Z: JOB_MESSAGE_BASIC: Finished operation InputToSerializedExample/InputSourceToExample/ReadFromText/Read+InputToSerializedExample/InputSourceToExample/ParseCSVLine+InputToSerializedExample/InputSourceToExample/InferColumnTypes/KeyWithVoid+InputToSerializedExample/InputSourceToExample/InferColumnTypes/CombinePerKey/GroupByKey+InputToSerializedExample/InputSourceToExample/InferColumnTypes/CombinePerKey/Combine/Partial+InputToSerializedExample/InputSourceToExample/InferColumnTypes/CombinePerKey/GroupByKey/Reify+InputToSerializedExample/InputSourceToExample/InferColumnTypes/CombinePerKey/GroupByKey/Write
INFO:apache_beam.runners.dataflow.dataflow_runner:2020-09-27T07:25:49.643Z: JOB_MESSAGE_DEBUG: Executing failure step failure72
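
For readers unfamiliar with the message: it reports that a bytes-only encoder was handed a Python list while the ParseCSVLine step was running. The sketch below is a pure-Python stand-in that reproduces the same kind of type mismatch. It is not Beam's implementation, `parse_csv_line` and `encode_bytes` are hypothetical names, and it makes no claim about the root cause in this pipeline.

```python
import csv


def parse_csv_line(line):
    """Hypothetical stand-in for a CSV parsing step: returns a list of cells."""
    return next(csv.reader([line]))


def encode_bytes(value):
    """Hypothetical stand-in for a bytes-only encoder that rejects other types."""
    if not isinstance(value, bytes):
        raise TypeError(f"Expected bytes, got {type(value).__name__}")
    return value


cells = parse_csv_line("product,issue,state")

try:
    # Passing the parsed list to a bytes-only encoder triggers the mismatch.
    encode_bytes(cells)
except TypeError as err:
    error_message = str(err)
```

Encoding the raw line instead, for example `encode_bytes(b"product,issue,state")`, passes the check, which is why the failure points at the output type of the parsing step rather than at the input file.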

The pipeline succeeds when Dataflow is removed, i.e. with the Beam pipeline arguments commented out:

# beam_pipeline_args=beam_pipeline_args,
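
To make the workaround concrete, here is a minimal sketch of switching between Dataflow and a local run by changing only the Beam arguments. The helper name `build_beam_pipeline_args` and the project, region, and bucket values are placeholders, not taken from the book's code. The flags themselves (`--runner`, `--project`, `--region`, `--temp_location`) are standard Beam pipeline options.

```python
def build_beam_pipeline_args(
    use_dataflow,
    project="my-gcp-project",           # placeholder, not from the issue
    region="us-central1",               # placeholder, not from the issue
    temp_location="gs://my-bucket/tmp"  # placeholder, not from the issue
):
    """Return the Beam argument list for either a local or a Dataflow run."""
    if not use_dataflow:
        # Run the Beam steps locally instead of submitting a Dataflow job.
        return ["--runner=DirectRunner"]
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]


beam_pipeline_args = build_beam_pipeline_args(use_dataflow=False)

# Assumed usage: pass the list to the TFX pipeline definition, e.g.
# pipeline.Pipeline(..., beam_pipeline_args=beam_pipeline_args)
```

Keeping the toggle in one place makes it easy to confirm that the components work locally before debugging the Dataflow-specific failure.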

hanneshapke commented 2 years ago

Hi @mshearer0,

Thank you for reporting this issue. Check out the latest updates to the example code: https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/releases/tag/examples_based_on_tfx_1.4