Closed: williamzebrowskI closed this 3 months ago
Hey @williamzebrowskI. Thanks for filing. What other metrics would you like to use? Our README lists 6 metrics (including AnswerSimilarityMetric), and in addition to those we have ~15 more.
The README is at the root of the repo, and tonic_validate/metrics contains the additional metrics. Most of the other metrics are self-explanatory, but I'm happy to provide clarifications if you need them.
When you know which metrics you wish to use, just pass them in as a list to the ValidateScorer constructor, e.g.:
from tonic_validate import ValidateScorer
from tonic_validate.metrics import AnswerConsistencyMetric, AugmentationAccuracyMetric

scorer = ValidateScorer([
    AnswerConsistencyMetric(),
    AugmentationAccuracyMetric()
], model_evaluator="gpt-3.5-turbo")
If all of this makes sense and you are good, just let me know so we can close out the issue.
Closing for now, but happy to re-open if you still have questions, @williamzebrowskI.
This is not an issue but a request for further examples of testing OpenAI Assistants.
Currently, there is one example comparing a CustomGPT vs. an Assistant, but I was looking for more if possible. That example shows the use of AnswerSimilarityMetric, and I'm curious whether there are any more examples you can share on validating an Assistant's performance using other metrics.
Thanks!