chanelcolgate / hydroelectric-project

Data Validation #13

Open chanelcolgate opened 2 years ago

chanelcolgate commented 2 years ago

Description

tfdv.visualize_statistics(lhs_statistics=val_stats, rhs_statistics=train_stats, lhs_name='VAL_DATASET', rhs_name='TRAIN_DATASET')

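To make the comparison above more concrete, here is a minimal, library-free sketch of the kind of per-feature summary that is placed side by side for two splits. It does not use TFDV and is not how TFDV computes its statistics. The helper `summarize`, the feature name `flow`, and the sample rows are all hypothetical and serve only as an illustration.

```python
from statistics import mean


def summarize(rows, feature):
    """Return count, missing count, mean, min, and max for one numeric feature."""
    values = [r[feature] for r in rows if r.get(feature) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values) if values else None,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


# Hypothetical toy splits; real data would come from the project's datasets.
train_rows = [{"flow": 10.0}, {"flow": 12.0}, {"flow": 14.0}, {"flow": None}]
val_rows = [{"flow": 11.0}, {"flow": 30.0}]

train_summary = summarize(train_rows, "flow")
val_summary = summarize(val_rows, "flow")

# Print the two summaries next to each other, validation on the left.
for key in train_summary:
    print(f"{key:>8}  VAL={val_summary[key]!s:<6} TRAIN={train_summary[key]!s:<6}")
```

A large gap between the two columns (here, the validation mean and max) is the sort of skew the side-by-side view is meant to surface.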
- Anomalies can be detected using the following code:
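```python
# Standalone TFDV: check the validation statistics against the schema.
anomalies = tfdv.validate_statistics(statistics=val_stats, schema=schema)

# TFX pipeline: compute dataset statistics with the StatisticsGen component.
from tfx.components import StatisticsGen
statistics_gen = StatisticsGen(
    examples=example_gen.outputs['examples']
)
context.run(statistics_gen)
context.show(statistics_gen.outputs['statistics'])
```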

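Conceptually, detecting anomalies means checking each feature's observed presence, type, and values against what a schema expects. The following is a small library-free sketch of that idea, not TFDV's implementation. The names `find_anomalies` and `toy_schema`, the schema layout, and the sample rows are hypothetical.

```python
def find_anomalies(rows, schema):
    """Return {feature: [reason, ...]} for rows that violate the toy schema."""
    anomalies = {}

    def report(feature, reason):
        # Record each distinct reason once per feature.
        reasons = anomalies.setdefault(feature, [])
        if reason not in reasons:
            reasons.append(reason)

    for row in rows:
        for feature, spec in schema.items():
            value = row.get(feature)
            if value is None:
                if spec["required"]:
                    report(feature, "missing value")
                continue
            if not isinstance(value, spec["type"]):
                report(feature, f"expected {spec['type'].__name__}")
            elif "domain" in spec and value not in spec["domain"]:
                report(feature, f"unexpected value {value!r}")
        # Features present in the data but absent from the schema.
        for feature in row:
            if feature not in schema:
                report(feature, "feature not in schema")
    return anomalies


# Hypothetical schema, as it might be derived from training data.
toy_schema = {
    "flow": {"type": float, "required": True},
    "station": {"type": str, "required": True, "domain": {"north", "south"}},
}

val_rows = [
    {"flow": 11.0, "station": "north"},
    {"flow": None, "station": "east"},
    {"flow": 9.5, "station": "south", "operator": "x"},
]

anomalies = find_anomalies(val_rows, toy_schema)
for feature, reasons in anomalies.items():
    print(feature, "->", reasons)
```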
- Generating our schema is just as easy as generating the statistics:
```python
from tfx.components import SchemaGen
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=True
)
context.run(schema_gen)