Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Interactive Pipeline. Trainer component. UnimplementedError: Cast string to float is not supported #9

Closed. AlexanderKim closed this issue 4 years ago

AlexanderKim commented 4 years ago

Hi,

Thanks for the book, it's great material. I can't get the whole pipeline working, though. When I run the Trainer component cell in interactive_pipeline.ipynb, I get the following error:

UnimplementedError:  Cast string to float is not supported
     [[node Cast (defined at /home/jovyan/work/building-machine-learning-pipelines/interactive-pipeline/../components/module.py:285) ]] [Op:__inference_train_function_13208]

Function call stack:
train_function

Please note that I couldn't get past the Transform component until I changed the type of the "zip_code" field from INT (as inferred by the SchemaGen component) to BYTES. I don't know whether that contributes to the error above.

dylan-stark commented 4 years ago

@AlexanderKim, I ran into the same issue. See #11 for my approach to solving it.

hanneshapke commented 4 years ago

The issue is caused by the updated dataset. In addition to #11, I proposed a backward-compatible implementation. Thank you @dylan-stark for the PR and @AlexanderKim for reporting the issue.

hanneshapke commented 4 years ago

Hi @dylan-stark and @AlexanderKim,

@drcat101 and I had a chance to review PR #11. As I mentioned in my PR review, the root cause is that the underlying dataset changed. The labels were already converted, and the download script had already converted the zip codes from strings to int, which caused the issue during the Transform step.
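To illustrate the type change described above, here is a minimal sketch in plain Python. It is not the repository's download script and not SchemaGen's actual inference logic: `infer_kind`, `column`, and the sample rows (including the masked zip code) are hypothetical stand-ins. It only shows how a column that was converted to integers upstream would be read as INT rather than BYTES, which is the mismatch a string-based Transform step would then hit.

```python
import csv
import io


def infer_kind(values):
    """Return "INT" if every non-empty value parses as an integer, else "BYTES".

    Simplified stand-in for schema inference; an assumption for illustration,
    not the logic SchemaGen actually uses.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "BYTES"
    try:
        for v in non_empty:
            int(v)
    except ValueError:
        return "BYTES"
    return "INT"


def column(csv_text, name):
    """Read one column from CSV text as raw strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[name] for row in reader]


# Hypothetical sample rows: zip codes kept as raw strings.
raw_csv = "zip_code,product\n606XX,Mortgage\n94105,Credit card\n"
# The same rows after a string-to-int conversion step upstream.
converted_csv = "zip_code,product\n60600,Mortgage\n94105,Credit card\n"

raw_kind = infer_kind(column(raw_csv, "zip_code"))
converted_kind = infer_kind(column(converted_csv, "zip_code"))
print(raw_kind, converted_kind)
```

A check along these lines, run on the downloaded CSV before SchemaGen, may help confirm whether the zip codes are still strings.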

The easiest fix is to remove the entire update_csv function in the download script. With that tweak, the pipeline runs as expected.

We made the changes in PR #12.