Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Fix for interactive-pipeline Transformer and Trainer components. #11

Closed dylan-stark closed 4 years ago

dylan-stark commented 4 years ago

These commits address two issues raised in #9 related to the interactive pipeline notebook. First, the Transformer component fails because the zip_code field in the current upstream dataset is inferred as INT, while module.preprocessing_fn attempts to process it as a string. Second, the Trainer component fails because the upstream dataset already has the consumer_disputed field converted to {0,1}, but download_dataset.update_csv attempts that conversion again, producing NaNs that the Trainer chokes on.

I took a best guess at patches for both. I'm open to suggestions.
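
For readers following along, here is a minimal plain-Python sketch of the two failure modes described above. It is not the patch from these commits and does not reproduce the repository's code: LABEL_MAP, encode_label_naive, encode_label_idempotent, and zip_to_string are hypothetical names, and the sketch assumes the label conversion behaves like a dictionary lookup that yields NaN for unrecognized values.

```python
import math

# Hypothetical mapping; the real download_dataset.update_csv may differ.
LABEL_MAP = {"Yes": 1, "No": 0}


def encode_label_naive(value):
    """Map 'Yes'/'No' to 1/0; any other value becomes NaN."""
    return LABEL_MAP.get(value, math.nan)


def encode_label_idempotent(value):
    """Leave values that are already 0 or 1 untouched, otherwise map them."""
    if value in (0, 1):
        return value
    return LABEL_MAP.get(value, math.nan)


def zip_to_string(value):
    """Coerce a zip code to str so string processing works even if it was read as an integer."""
    return str(value)


# The upstream column is assumed to be encoded already.
already_encoded = [0, 1, 1]

# Converting a second time loses every label.
naive = [encode_label_naive(v) for v in already_encoded]

# An idempotent conversion leaves the labels intact.
safe = [encode_label_idempotent(v) for v in already_encoded]
```

Under these assumptions, every entry of naive is NaN while safe keeps the original labels. The same guard idea applies to zip_code: coercing the value to a string before any string processing avoids depending on which type was inferred.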

hanneshapke commented 4 years ago

I have closed the PR, since #12 removes the entire update function. Thank you @dylan-stark for the PR!