Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Data Validation: 26k file #16

Closed by mshearer0 4 years ago

mshearer0 commented 4 years ago

How do I create the file '26k-consumer-complaints-modified.csv' for the data validation script?

hanneshapke commented 4 years ago

Hi @mshearer0,

I have extended the example notebook with the file split. The updated version can be found here: https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/blob/master/chapters/data_validation/data_validation.ipynb

Let me know if you have further questions.

Hannes
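For readers who cannot open the notebook, a file split of this general kind might look like the sketch below. This is not the code from the linked notebook. The helper name `split_csv`, its parameters, and the idea of cutting at a fixed row count are all assumptions made for illustration.

```python
import csv


def split_csv(source, first_out, second_out, n_rows):
    """Hypothetical helper: split one CSV file into two.

    The first n_rows data rows go to first_out, the remaining rows go to
    second_out. The header row is repeated in both outputs.
    Returns (rows_in_first, rows_in_second).
    """
    first_count = second_count = 0
    with open(source, newline="", encoding="utf-8") as src, \
         open(first_out, "w", newline="", encoding="utf-8") as out_a, \
         open(second_out, "w", newline="", encoding="utf-8") as out_b:
        reader = csv.reader(src)
        writer_a = csv.writer(out_a)
        writer_b = csv.writer(out_b)

        header = next(reader, None)
        if header is None:
            # Empty source file: both outputs stay empty.
            return 0, 0
        writer_a.writerow(header)
        writer_b.writerow(header)

        for index, row in enumerate(reader):
            if index < n_rows:
                writer_a.writerow(row)
                first_count += 1
            else:
                writer_b.writerow(row)
                second_count += 1
    return first_count, second_count
```

A call such as `split_csv("complaints.csv", "part_a.csv", "part_b.csv", n_rows=1000)` would then produce the two parts; the file names and the row count here are placeholders, not values taken from the repository.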

mshearer0 commented 4 years ago

Thanks - did you just change the source file?

The file is HTML, not a JSON-formatted notebook.

hanneshapke commented 4 years ago

Oh wow. I edited the file with VS Code. Didn't see different behavior on my end. Let me see if I can re-upload the file.

hanneshapke commented 4 years ago

@mshearer0 Any chance GitHub is having problems rendering notebooks at the moment? All notebooks in the repo are presented as HTML, for example https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/blob/master/chapters/data_privacy/differential_privacy.ipynb (which I haven't touched today).

mshearer0 commented 4 years ago

Yes, that file is also displaying as HTML.
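One way to tell a GitHub rendering problem apart from a file that was actually saved as HTML is to inspect a downloaded copy locally. The sketch below checks whether a file parses as JSON and carries the top-level `cells` and `nbformat` keys used by nbformat 4 notebooks. The function name `looks_like_notebook` is hypothetical, and the check is only a rough heuristic, not a full schema validation.

```python
import json


def looks_like_notebook(path):
    """Return True if the file parses as JSON with notebook-style top-level keys."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # HTML or any other non-JSON content ends up here.
        return False
    return isinstance(data, dict) and "cells" in data and "nbformat" in data
```

If this returns `True` for the raw file, the notebook itself is intact and the HTML is more likely coming from how the page is being displayed.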