Building-ML-Pipelines / building-machine-learning-pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson
MIT License

context.run(statistics_gen) errors in interactive pipeline sample #35

Closed jimwill3 closed 2 years ago

jimwill3 commented 3 years ago

In a freshly created GCP AI Notebook (Linux) with tfx==0.22 and tf==2.2.1, when I get to the cell that runs StatisticsGen (ExampleGen ran fine), I get the following error, which seems to come from TensorFlow Data Validation or perhaps Apache Beam. My tfdv version is 0.22.2.

TypeCheckError: Type hint violation for 'ToTopKTuples': requires FrozenSet[FeaturePath] but got Set[Any] for bytes_features
Full type hint:
IOTypeHints[inputs=((Tuple[Union[NoneType, bytes, str], RecordBatch], FrozenSet[FeaturePath], FrozenSet[FeaturePath], Union[NoneType, str]), {}), outputs=((Tuple[Tuple[Union[NoneType, bytes, str], Tuple[Union[bytes, str], ...], Any], Union[Tuple[int, Union[float, int]], int]],), {})]
strip_iterable()

based on:
IOTypeHints[inputs=((Tuple[Union[NoneType, bytes, str], RecordBatch], FrozenSet[FeaturePath], FrozenSet[FeaturePath], Union[NoneType, str]), {}), outputs=((Iterable[Tuple[Tuple[Union[NoneType, bytes, str], Tuple[Union[bytes, str], ...], Any], Union[Tuple[int, Union[float, int]], int]]],), {})]
from_callable(_to_topk_tuples)
signature: (sliced_record_batch: Tuple[Union[str, bytes, NoneType], pyarrow.lib.RecordBatch], bytes_features: FrozenSet[tensorflow_data_validation.types.FeaturePath], categorical_features: FrozenSet[tensorflow_data_validation.types.FeaturePath], weight_feature: Union[str, NoneType]) -> Iterable[Tuple[Tuple[Union[str, bytes, NoneType], Tuple[Union[bytes, str], ...], Any], Union[int, Tuple[int, Union[int, float]]]]]
File "/opt/conda/lib/python3.7/site-packages/tensorflow_data_validation/statistics/generators/top_k_uniques_stats_generator.py", line 202

I cloned the repo today, so I don't think anything in the environment is stale. Thanks, j.
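
The error message reports that an argument annotated as FrozenSet received a plain Set. The sketch below only illustrates that distinction in pure Python. It is not how Apache Beam or TFDV performs the check: `requires_frozenset` and the feature names are hypothetical, introduced for illustration.

```python
def requires_frozenset(features):
    """Hypothetical stand-in for a function whose argument is strictly type-checked."""
    # A mutable set is not an instance of frozenset, so a strict check rejects it.
    if not isinstance(features, frozenset):
        raise TypeError(f"requires FrozenSet but got {type(features).__name__}")
    return sorted(features)


# Hypothetical feature names, not taken from the pipeline in this issue.
mutable_features = {"company", "product"}

try:
    requires_frozenset(mutable_features)
    message = None
except TypeError as err:
    message = str(err)

# Converting to an immutable frozenset satisfies the check.
result = requires_frozenset(frozenset(mutable_features))
```

Here the mismatch is raised inside the library's own code (the traceback points at `top_k_uniques_stats_generator.py`), so it is not something the notebook code passes in directly.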

shazamkash commented 3 years ago

I was able to resolve the above error by upgrading to newer versions of TensorFlow and TFX: tensorflow==2.4.1 and tfx==0.27.0.
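
To see whether a notebook environment is at least at the versions reported to work above, a minimal sketch along these lines may help. It assumes Python 3.8+ for `importlib.metadata` (the notebook in the original report ran Python 3.7, where this module is not available). `KNOWN_GOOD`, `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the comparison only looks at the leading numeric part of each version string.

```python
import re
from importlib import metadata  # assumption: Python 3.8 or newer

# Versions reported as working in this thread; a reference point, not a hard requirement.
KNOWN_GOOD = {"tensorflow": "2.4.1", "tfx": "0.27.0"}


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(known_good=KNOWN_GOOD):
    """Map each package name to a short status string for the current environment."""
    report = {}
    for package, minimum in known_good.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
        report[package] = f"{installed} ({status})"
    return report
```

Calling `check_environment()` in a notebook cell returns a dictionary with one entry per package, which makes it easy to spot a version older than the ones mentioned in this thread.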

hanneshapke commented 2 years ago

Hi @shazamkash,

Thank you for reporting this issue. Check out the latest updates to the example code: https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/releases/tag/examples_based_on_tfx_1.4

The issue should be fixed with the latest update. Please reopen if you run into trouble. Thank you again for reporting the issue.