dm4ml / gate

Drift detection module for machine learning pipelines.
https://dm4ml.github.io/gate/
MIT License

Partition summary for embeddings #12

Open vahuja4 opened 1 year ago

vahuja4 commented 1 year ago

Interesting approach to drift detection! Can you please tell me whether the partition summary for embeddings uses the same statistics as the ones listed below (from https://dm4ml.github.io/gate/how-it-works/), or whether you take other factors into account?

- coverage: The fraction of the column that has non-null values.
- mean: The mean of the column.
- p50: The median of the column.
- num_unique_values: The number of unique values in the column.
- occurrence_ratio: The count of the most frequent value divided by the total count.
- p95: The 95th percentile of the column.

shreyashankar commented 1 year ago

The partition summary includes the summary statistics listed above, for each dimension of the embeddings!
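
To make "for each dimension" concrete, here is a minimal sketch of how such a per-dimension summary could be computed with pandas. It is not gate's actual implementation: the function name `summarize_embedding_partition`, the `(n_rows, n_dims)` array layout, the use of NaN for missing values, and the choice to divide `occurrence_ratio` by the total row count are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def summarize_embedding_partition(embeddings: np.ndarray) -> pd.DataFrame:
    """Compute the six summary statistics for every embedding dimension.

    Hypothetical helper, not part of gate. `embeddings` is assumed to have
    shape (n_rows, n_dims), with NaN marking missing values.
    Returns one row per dimension and one column per statistic.
    """
    df = pd.DataFrame(embeddings)  # each column is one embedding dimension
    n_rows = len(df)

    def top_count(col: pd.Series) -> int:
        # Count of the most frequent non-null value (0 if the column is all null).
        counts = col.value_counts()
        return int(counts.iloc[0]) if len(counts) else 0

    return pd.DataFrame(
        {
            "coverage": df.count() / n_rows,        # fraction of non-null values
            "mean": df.mean(),                      # NaN values are skipped
            "p50": df.quantile(0.5),                # median
            "num_unique_values": df.nunique(),      # NaN is not counted
            # Assumption: "total count" means the number of rows in the partition.
            "occurrence_ratio": df.apply(top_count) / n_rows,
            "p95": df.quantile(0.95),               # 95th percentile
        }
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    partition = rng.normal(size=(1000, 8))  # 1000 embeddings with 8 dimensions
    print(summarize_embedding_partition(partition))
```

With this layout, a partition of 8-dimensional embeddings yields an 8 x 6 table, so each dimension is treated like its own numeric column.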