Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

Do we need to create schema for train dataset or entire dataset to compare with eval and test dataset? #49

Closed: IamExperimenting closed this issue 2 years ago

IamExperimenting commented 3 years ago

Thank you for reporting an issue!

If you want to report an issue with the code in this repository, please provide the following information:

If you found an error in the book, please report it at https://www.oreilly.com/catalog/errata.csp?isbn=0636920260912.

hanneshapke commented 2 years ago

Hi @IamExperimenting,

Could you please provide more information about your use case? Are you referring to the chapter example or the pipeline examples? TFX recently added a new component to compare schemas (https://www.tensorflow.org/tfx/guide/schemagen#for_the_reviewed_schema_import). It can be useful for comparing the schema (including drift/skew) between the training, test, and eval datasets.
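For the question in the title, the pattern shown in the TFDV documentation is to infer the schema from the training split only (`tfdv.infer_schema`) and then validate the statistics of the eval and test splits against it (`tfdv.validate_statistics`). If the schema were inferred from the entire dataset, unseen categories or missing features in eval/test would already be part of the schema and could not be flagged. The snippet below is a toy, pure-Python sketch of that idea, not the TFDV API: `infer_schema`, `validate`, and the example rows are hypothetical, and it only checks feature presence, type, and string domains.

```python
from typing import Any, Dict, List

# Hypothetical toy schema: feature name -> {"type": <type name>, "domain": <set of strings>}
Schema = Dict[str, Dict[str, Any]]


def infer_schema(rows: List[Dict[str, Any]]) -> Schema:
    """Infer a toy schema from the TRAINING split only (not TFDV's implementation)."""
    schema: Schema = {}
    for row in rows:
        for name, value in row.items():
            # Record the type seen first for each feature.
            spec = schema.setdefault(name, {"type": type(value).__name__})
            # For string features, collect the set of observed values (the domain).
            if isinstance(value, str):
                spec.setdefault("domain", set()).add(value)
    return schema


def validate(rows: List[Dict[str, Any]], schema: Schema) -> List[str]:
    """Return human-readable anomalies for rows checked against the training schema."""
    anomalies: List[str] = []
    for i, row in enumerate(rows):
        # Features present in training but absent from this row.
        for name in sorted(schema.keys() - row.keys()):
            anomalies.append(f"row {i}: missing feature '{name}'")
        for name, value in row.items():
            spec = schema.get(name)
            if spec is None:
                anomalies.append(f"row {i}: unexpected feature '{name}'")
            elif type(value).__name__ != spec["type"]:
                anomalies.append(
                    f"row {i}: '{name}' has type {type(value).__name__}, expected {spec['type']}"
                )
            elif "domain" in spec and value not in spec["domain"]:
                anomalies.append(f"row {i}: '{name}' value {value!r} not in training domain")
    return anomalies


# Hypothetical example data: the schema comes from train_rows only.
train_rows = [
    {"product": "Bank account", "zip": 12345},
    {"product": "Mortgage", "zip": 54321},
]
eval_rows = [
    {"product": "Credit card", "zip": 11111},  # category never seen in training
    {"product": "Mortgage"},                   # 'zip' feature missing
]

schema = infer_schema(train_rows)
for anomaly in validate(eval_rows, schema):
    print(anomaly)
```

Running it prints one anomaly per eval row: the unseen `'Credit card'` category and the missing `zip` feature. Had the schema been inferred from `train_rows + eval_rows`, the first anomaly would disappear, which is why the training split is typically the reference.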

Please reopen this issue if it doesn't answer your initial question. Thank you!