Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Data Ingestion: String to Float #15

Closed by mshearer0 4 years ago

mshearer0 commented 4 years ago

The downloaded dataset contains non-numeric zip codes ending in "XX", which causes the string-to-float conversion to fail. For example:

    File "convert_data_to_tfrecords.py", line 47, in
        "zip_code": _int64_feature(int(float(row["zip_code"]))),
    ValueError: could not convert string to float: '113XX'

Replacing XX with 00 allows conversion to proceed.
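
A minimal sketch of that workaround, assuming the masked zip codes always use the literal suffix "XX". The helper name `clean_zip_code` is hypothetical and is not taken from the repository; it only mirrors the `int(float(...))` conversion shown in the traceback.

```python
def clean_zip_code(raw: str) -> int:
    """Convert a zip code string to an int, treating a masked "XX" as "00".

    Hypothetical helper for illustration; assumes "XX" is the only
    non-numeric pattern present in the zip_code column.
    """
    # Replace the masked digits so float() no longer raises ValueError.
    cleaned = raw.strip().replace("XX", "00")
    # Keep the int(float(...)) chain from the original script, so values
    # such as "11300.0" would also be accepted.
    return int(float(cleaned))


print(clean_zip_code("113XX"))  # 11300
```

Because the value is stored as an integer, any leading zeros in a zip code are lost, with or without this cleaning step.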

hanneshapke commented 4 years ago

Hi @mshearer0,

I have updated the ingestion example with a conversion function. https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/commit/2f2c8e91bde2d22b0a0c6f6eb71e65a2a8732449

I hope the solution works on your side.

Thank you for letting us know about the issue.

mshearer0 commented 4 years ago

Thanks, that was my solution too.