booydar/babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
Apache License 2.0
evaluation of popular models on BABILong #2
Closed: yurakuratov closed this 3 months ago
yurakuratov commented 3 months ago
updates the README with new results
adds scripts for evaluating various LLMs on BABILong
adds links to the BABILong evaluation sets hosted on HF datasets
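For readers unfamiliar with the needle-in-a-haystack setup, here is a minimal sketch of what an evaluation of this kind does. It is not the code from this PR: `build_sample`, `is_correct`, and `accuracy` are hypothetical helper names, the filler text is made up, and the substring-match scoring rule is an assumption for illustration.

```python
import random


def build_sample(facts, filler_sentences, n_filler, seed=0):
    """Scatter fact sentences (needles) among filler sentences (haystack).

    The facts keep their original relative order. All names here are
    hypothetical and only illustrate the general idea.
    """
    rng = random.Random(seed)
    filler = [rng.choice(filler_sentences) for _ in range(n_filler)]
    # Sorted insertion slots, so fact k never appears after fact k+1.
    positions = sorted(rng.randint(0, n_filler) for _ in facts)
    sentences = []
    next_fact = 0
    for slot in range(n_filler + 1):
        while next_fact < len(facts) and positions[next_fact] == slot:
            sentences.append(facts[next_fact])
            next_fact += 1
        if slot < n_filler:
            sentences.append(filler[slot])
    return " ".join(sentences)


def is_correct(model_output, target):
    """Assumed scoring rule: the target answer appears in the output (case-insensitive)."""
    return target.lower() in model_output.lower()


def accuracy(outputs, targets):
    """Fraction of outputs that contain their target answer."""
    if not outputs:
        return 0.0
    hits = sum(is_correct(o, t) for o, t in zip(outputs, targets))
    return hits / len(outputs)
```

A real run would call a model on each built context plus a question and pass the generated answers to `accuracy`. Growing `n_filler` lengthens the haystack while the needles stay the same.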