dm4ml / gate

Drift detection module for machine learning pipelines.
https://dm4ml.github.io/gate/
MIT License

Revamp embeddings clustering #10

Open shreyashankar opened 1 year ago

shreyashankar commented 1 year ago

Currently, a fixed number of embedding clusters is identified per partition. We want to:

  1. Make the number of clusters dynamic (use GATE's PCA method to determine the number of clusters).
  2. Generate a one-sentence summary for each cluster, for interpretability. We can probably use an LLM for this.
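
A rough sketch of how the two items above could fit together, to make the proposal concrete. This is not GATE's existing code: the issue does not spell out how the PCA method would pick the cluster count, so the heuristic here (the number of principal components needed to reach a variance threshold) is an assumption, as are all function names, the `variance_threshold` and `max_clusters` parameters, the small NumPy k-means, and the `llm` argument, which stands in for any callable that maps a prompt string to a completion string.

```python
import numpy as np


def choose_num_clusters(embeddings, variance_threshold=0.9, max_clusters=20):
    """Assumed heuristic: k = number of principal components needed to
    explain `variance_threshold` of the variance in this partition."""
    centered = embeddings - embeddings.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    total = explained.sum()
    if total == 0:  # all embeddings identical: a single cluster
        return 1
    cumulative = np.cumsum(explained) / total
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    return max(1, min(k, max_clusters, len(embeddings)))


def kmeans(embeddings, k, num_iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(num_iters):
        distances = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            embeddings[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def summarize_clusters(texts, embeddings, labels, centroids, llm, num_examples=5):
    """Ask `llm` (hypothetical: prompt str -> completion str) for a
    one-sentence summary of the examples closest to each centroid."""
    summaries = {}
    for j, centroid in enumerate(centroids):
        members = np.where(labels == j)[0]
        if len(members) == 0:
            continue
        dists = np.linalg.norm(embeddings[members] - centroid, axis=1)
        closest = members[np.argsort(dists)[:num_examples]]
        prompt = (
            "Summarize in one sentence what these examples have in common:\n"
            + "\n".join(f"- {texts[i]}" for i in closest)
        )
        summaries[j] = llm(prompt).strip()
    return summaries
```

Per partition, this would be called as `k = choose_num_clusters(emb)`, then `labels, centroids = kmeans(emb, k)`, then `summarize_clusters(texts, emb, labels, centroids, llm)`. Sending only the examples nearest each centroid keeps the prompt short, at the cost of possibly missing outliers within a cluster.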