Open dcecchini opened 1 year ago
My personal takeaways:
Even though there's nothing groundbreaking in the repo or paper, I do think it's really interesting to have an approach in which the model is evaluated against itself.
I agree, some of the tests are very simple, but they're also easy to implement and fast to run. So maybe we could add something like the toxicity test as a quick check without any dependency on an external library to run an ML model...
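Just to illustrate the idea, here's a minimal sketch of what a dependency-free toxicity check could look like: a whole-word match against a small blocklist, no ML model needed. The blocklist, function name, and threshold-free design here are all hypothetical, not taken from the BYOD repo:

```python
import re

# Hypothetical placeholder blocklist; a real test would use a curated list.
TOXIC_TERMS = {"idiot", "stupid", "hate"}

def is_toxic(text: str) -> bool:
    """Return True if any blocklisted term appears as a whole word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok in TOXIC_TERMS for tok in tokens)

print(is_toxic("You are an idiot"))  # True
print(is_toxic("What a nice day"))   # False
```

Obviously crude compared to a classifier, but it runs instantly and pulls in nothing beyond the standard library, which fits the "quick test" use case.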
Let's make a list of what is worth bringing into nlptest and add those items to the roadmap.
I just found a paper about self-evaluation; it would be interesting to read it and check whether we can implement it:
https://arxiv.org/abs/2306.13651
Explore the BYOD repository for additional tests or datasets to add to nlptest.
Examples: