booydar/babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
Apache License 2.0
141 stars
16 forks
issues
#4 Clarify GPT-4 model | rodion-m | closed | 1 month ago | 1 comment
#3 Add Claude and Google models into benchmark | rodion-m | opened | 3 months ago | 4 comments
#2 evaluation of popular models on BABILong | yurakuratov | closed | 3 months ago | 0 comments
#1 how to show the heatmap on Figure 10 in the paper? | DavideHe | opened | 6 months ago | 4 comments