AnthusAI / Plexus

An orchestration system for managing text classification at scale using LLMs, ML models, and NLP.
https://anthusai.github.io/Plexus/
MIT License

Feature request: We need a way to evaluate the distribution of score results when we have no labels #7

Open · endymion opened 1 month ago

endymion commented 1 month ago

We can use `plexus evaluate accuracy` to compare the predictions for a score against ground-truth labels. But when we're creating new scores with no historical labels, we have no way to evaluate the score results.

In some cases we do know a little about the expected distribution of score results even if we don't have labels. For example, with a "Do-Not-Call Requested" score, we know that we expect score results to almost always be "No", with very few "Yes" results.

In situations like that, we need to be able to run a series of predictions on a random set of samples and then generate evaluation metrics and visualizations based on the distribution of the predictions themselves, rather than by comparing them against human labels.
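
As a rough illustration of what such an evaluation could compute, here is a minimal Python sketch. It is not part of Plexus: the `predict` callable and the `expected` prior are assumptions standing in for whatever the real score-prediction interface and operator-supplied expectations would be. It summarizes the distribution of predictions over an unlabeled sample, reports its entropy, and optionally measures divergence from an expected prior.

```python
# A minimal sketch (not Plexus code) of label-free evaluation:
# run predictions over a random sample, summarize the resulting
# distribution, and compare it against an expected prior.
import math
import random
from collections import Counter


def evaluate_distribution(samples, predict, expected=None):
    """Summarize the distribution of predictions over unlabeled samples.

    predict:  hypothetical callable mapping a sample to a class label.
    expected: optional dict of class -> expected proportion.
    """
    predictions = [predict(s) for s in samples]
    counts = Counter(predictions)
    total = len(predictions)
    observed = {label: count / total for label, count in counts.items()}

    # Shannon entropy of the predicted distribution; a heavily skewed
    # score like "Do-Not-Call Requested" should show low entropy.
    entropy = -sum(p * math.log2(p) for p in observed.values() if p > 0)

    report = {"counts": dict(counts), "proportions": observed, "entropy": entropy}

    if expected:
        # KL divergence from the expected prior to the observed
        # distribution, as one way to flag a score whose results
        # drift from the skew we anticipated.
        report["kl_divergence"] = sum(
            q * math.log2(q / observed.get(label, 1e-9))
            for label, q in expected.items()
            if q > 0
        )
    return report


# Example: we expect "Do-Not-Call Requested" to be almost always "No".
random.seed(0)
fake_predict = lambda s: "Yes" if random.random() < 0.02 else "No"
print(evaluate_distribution(range(1000), fake_predict,
                            expected={"No": 0.98, "Yes": 0.02}))
```

The same proportions could feed the existing evaluation visualizations (e.g., a bar chart of predicted classes), with the divergence metric serving as a red flag when a new score's output distribution looks nothing like what we expected.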