Open jsharf opened 7 months ago
Hey! Awesome post and request
Couple things:
We are trying to keep tests, providers, and evaluators separate in the repo. Other than that, there's no style guide.
Contributions are very welcome and we'll be quick with feedback
More context here too https://twitter.com/GregKamradt/status/1772491996063526971
I really like this kind of benchmark. It would be interesting to make generalized versions of this, where there are a variable number of needles inserted. These could be unrelated independent needles, or they could be related. For example you could imagine 4 needles:
- A implies B
- B implies C, D
- D implies E
- B is true
Then you could test the "related" needles, to ensure that all of them were detected and the relationship is understood. (What might A be? What about D?)
Curious what you think about this. If you're interested in a feature like this and willing to accept a pull request, I could find the time to try implementing it. If you have a style guide preference or anything like that, please let me know.
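A rough sketch of the related-needles idea above, assuming nothing about the repo's actual API. Every name here (`insert_needles`, `forward_chain`, the rule format) is hypothetical and only illustrates the approach: scatter the needles through the context, then work out which statements the model should be able to conclude so an evaluator has something to check against.

```python
import random

def insert_needles(haystack: str, needles: list[str], seed: int = 0) -> str:
    """Insert each needle at a random sentence boundary (hypothetical helper)."""
    rng = random.Random(seed)
    sentences = haystack.rstrip(".").split(". ")
    for needle in needles:
        pos = rng.randint(0, len(sentences))  # randint is inclusive on both ends
        sentences.insert(pos, needle.rstrip("."))
    return ". ".join(sentences) + "."

def forward_chain(rules: list[tuple[str, list[str]]], facts: list[str]) -> set[str]:
    """Return every statement derivable from the facts via 'premise implies conclusions' rules."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusions in rules:
            if premise in known and not set(conclusions) <= known:
                known |= set(conclusions)
                changed = True
    return known

# The four needles from the example above.
needles = ["A implies B", "B implies C and D", "D implies E", "B is true"]
rules = [("A", ["B"]), ("B", ["C", "D"]), ("D", ["E"])]
facts = ["B"]

context = insert_needles("Filler one. Filler two. Filler three.", needles)
derived = forward_chain(rules, facts)       # statements the model should report as true
undetermined = {"A", "B", "C", "D", "E"} - derived  # statements it should not claim
```

With B given, the chain reaches C, D, and E, while A stays undetermined, since "A implies B" says nothing about A when B is true. An evaluator could then score the model on both sets: did it find every derived statement, and did it avoid asserting A?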