Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Interactive Pipeline crashes on different stages #57

Closed: jawad5311 closed this issue 2 years ago

jawad5311 commented 2 years ago

tensorflow==2.6.2 tfx==1.3.3

Environment: Google Colab


Bug 1

When importing external_input, the following error occurs:

ModuleNotFoundError: No module named 'tfx.utils.dsl_utils'

Fix

The ExampleGen component now accepts the path to the data directory as a string:

CsvExampleGen(input_base='path/to/csv_data/')

Remove the following import from the notebook cell:

from tfx.utils.dsl_utils import external_input


Bug 2

When creating the CsvExampleGen component, the following error occurs:

TypeError: __init__() got an unexpected keyword argument 'input'

Fix

The input parameter of CsvExampleGen has been renamed to input_base:

CsvExampleGen(input_base='path/to/csv_data/')


Bug 3

Running the Transform component with the following line

context.run(transform)

throws this error:

OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed: AutoGraph is disabled in this function. Try decorating it directly with @tf.function.

Fix

Decorate the convert_zip_code() function in module.py with @tf.function:

import tensorflow as tf

@tf.function
def convert_zip_code(zipcode: str) -> tf.float32:
    # existing body of convert_zip_code() from module.py, unchanged
    pass
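The error message says a tf.Tensor is being used as a Python bool, which typically happens when a tensor comparison feeds a Python if statement. An alternative to @tf.function is to keep the comparison as a tensor op. The sketch below is hypothetical: the original body of convert_zip_code is not shown in this issue, so the "0" default for empty strings and the direct tf.strings.to_number parse are assumptions, not the updated repository code.

```python
import tensorflow as tf


def convert_zip_code(zipcode: tf.Tensor) -> tf.Tensor:
    """Hypothetical graph-safe variant: string zip codes -> float32 tensor.

    The comparison stays a tensor op (tf.equal + tf.where), so no tf.Tensor
    is ever converted to a Python bool and no @tf.function is required.
    """
    # Element-wise comparison: yields a boolean tensor, not a Python bool.
    is_empty = tf.equal(zipcode, "")
    # Assumed default: replace empty strings with "0", keep all other values.
    cleaned = tf.where(is_empty, tf.constant("0"), zipcode)
    # Parse the cleaned strings into float32 numbers.
    return tf.strings.to_number(cleaned, out_type=tf.float32)
```

Because every step is a TensorFlow op, the same function works both eagerly and when traced into a graph, which is the setting preprocessing code runs in.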

Bug 4

After applying the fix above, the Transform component throws another error:

TypeError: bucketize() got an unexpected keyword argument 'always_return_num_quantiles'

Fix

The always_return_num_quantiles argument of tft.bucketize was deprecated in tensorflow-transform 0.26 and is not accepted by the installed version.
Remove or comment out this argument in the tft.bucketize() call inside preprocessing_fn() in module.py.
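If module.py has to run against both older and newer tensorflow-transform releases, one hedged option is a small helper that drops keyword arguments the installed function's signature does not accept. call_with_supported_kwargs is a hypothetical helper, not part of TFX or tensorflow-transform, and it assumes the target function exposes an inspectable signature.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Hypothetical helper: call fn with only the keyword arguments it accepts."""
    params = inspect.signature(fn).parameters.values()
    # A **kwargs parameter accepts any keyword, so pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return fn(*args, **kwargs)
    # Names that can legally be passed by keyword (positional-only ones can't).
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    # Drop any keyword the installed version does not know about.
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    return fn(*args, **supported)
```

The trade-off: silently dropping arguments can hide behavior changes between releases, so deleting the argument outright, as suggested above, is the clearer fix when only one tensorflow-transform version is targeted.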


Bug 5

After applying that fix, the Transform component throws another error:

TypeError: '>' not supported between instances of 'NoneType' and 'int'

This error occurs because a tensor's shape is None, and Python cannot compare NoneType with int, float, or str.
I have tried to find where and why the tensor's shape is None, but I couldn't figure it out.
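The TypeError itself is standard Python 3 behavior: ordering comparisons such as > are not defined between None and a number. Static tensor shapes report unknown dimensions (for example, a batch dimension) as None, so code that compares shape entries with integers has to handle that case. The helpers below are a hypothetical illustration of the error and of one guard, not code from module.py.

```python
from typing import Optional, Sequence


def any_dim_exceeds(dims: Sequence[Optional[int]], limit: int) -> bool:
    """Hypothetical check: is any *known* dimension larger than limit?

    dims mimics a static shape list in which unknown dimensions are None.
    Comparing None > limit would raise TypeError, so None entries are skipped.
    """
    return any(d > limit for d in dims if d is not None)


def reproduce_error() -> str:
    """Return the message Python 3 raises for None > int."""
    try:
        None > 0  # evaluated only to trigger the TypeError
    except TypeError as err:
        return str(err)
    return ""
```

reproduce_error() returns the same message as in the traceback above, while any_dim_exceeds([None, 5], 3) returns True without raising.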

hanneshapke commented 2 years ago

Hi @jawad5311, thank you for your submission! I am currently working on a code update for TFX 1.4. FYI, @tf.function is not necessary. Bug 5 is caused by the comparison in the zip code function. I'll push an update in the next few hours.

hanneshapke commented 2 years ago

Hi @jawad5311, here is the updated code: https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/releases/tag/examples_based_on_tfx_1.4

Please reopen this issue if you run into any further problems. Thank you!