Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson

MIT License


Do we need to create a schema for the train dataset or for the entire dataset to compare with the eval and test datasets? #49

Closed: IamExperimenting closed this issue 2 years ago

IamExperimenting commented 3 years ago

Thank you for reporting an issue!

If you want to report an issue with the code in this repository, please provide the following information:

Your operating system name and version, as well as version numbers of the following packages: tensorflow, tfx.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.

If you found an error in the book, please report it at https://www.oreilly.com/catalog/errata.csp?isbn=0636920260912.

hanneshapke commented 2 years ago

Hi @IamExperimenting,

Could you please provide more information about your use case? Are you referring to the chapter example or the pipeline examples? TFX recently added a new component to compare schemas (https://www.tensorflow.org/tfx/guide/schemagen#for_the_reviewed_schema_import). It can be useful to compare the schema (including drift and skew checks) across the training, test, and eval datasets.

Please reopen this issue if it doesn't answer your initial question. Thank you!

Hannes
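
For readers landing on this thread: one common way to approach the question in the title is to infer the schema from the training split only and then validate the eval and test splits against it. The sketch below illustrates that idea in plain Python with no dependencies. It is not TFX or TFDV code; in a real pipeline the equivalent steps would typically go through TensorFlow Data Validation (`infer_schema` and `validate_statistics`). The helper names `infer_schema`, `validate`, and `linf_distance`, as well as the toy data, are hypothetical and exist only for illustration.

```python
from collections import Counter


def infer_schema(rows):
    """Infer a minimal schema (feature name -> type name) from training rows."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:
                # Keep the first non-null type seen for each feature.
                schema.setdefault(name, type(value).__name__)
    return schema


def validate(rows, schema):
    """Return a sorted list of anomalies found when checking rows against schema."""
    anomalies = set()
    for row in rows:
        for name in schema:
            if row.get(name) is None:
                anomalies.add(f"{name}: missing value")
        for name, value in row.items():
            if name not in schema:
                anomalies.add(f"{name}: not in schema")
            elif value is not None and type(value).__name__ != schema[name]:
                anomalies.add(
                    f"{name}: expected {schema[name]}, got {type(value).__name__}"
                )
    return sorted(anomalies)


def linf_distance(rows_a, rows_b, feature):
    """L-infinity distance between the value frequencies of a categorical feature."""

    def freqs(rows):
        counts = Counter(r[feature] for r in rows if r.get(feature) is not None)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()} if total else {}

    fa, fb = freqs(rows_a), freqs(rows_b)
    return max(
        (abs(fa.get(k, 0.0) - fb.get(k, 0.0)) for k in set(fa) | set(fb)),
        default=0.0,
    )


# Toy splits (assumed data, for illustration only).
train = [
    {"age": 34, "state": "CA"},
    {"age": 51, "state": "NY"},
    {"age": 29, "state": "CA"},
    {"age": 45, "state": "NY"},
]
eval_split = [{"age": 40, "state": "CA"}, {"age": "n/a", "state": "CA"}]
test_split = [{"age": 38, "state": "NY"}, {"age": 27, "state": "CA"}]

# The schema comes from the training split only.
schema = infer_schema(train)

# The other splits are checked against that schema.
eval_anomalies = validate(eval_split, schema)
test_anomalies = validate(test_split, schema)

# A simple distribution comparison between train and eval for one feature.
state_distance = linf_distance(train, eval_split, "state")

print(schema)
print(eval_anomalies)
print(test_anomalies)
print(state_distance)
```

Inferring the schema from the training split alone keeps the eval and test splits as independent checks: a type mismatch or a shifted category distribution in those splits shows up as an anomaly instead of being silently absorbed into the schema. Whether that is the right choice for a given pipeline depends on the use case, which is what the reply above asks about.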