OpenBMB / InfiniteBench
Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
MIT License · 244 stars · 19 forks
Issues
#24 Add support for local/more APIs (rmusser01, closed 4 days ago, 1 comment)
#23 Fix kv retrieval score (Wangmerlyn, closed 2 weeks ago, 0 comments)
#22 Fix code_debug task score computing (Wangmerlyn, closed 3 weeks ago, 0 comments)
#21 Mismatch for longbook_qa_eng (xuandif-cmu, opened 3 weeks ago, 1 comment)
#20 Fix task naming error of En.Sum (Wangmerlyn, closed 3 weeks ago, 0 comments)
#19 Error in loading from Huggingface (BenHamm, opened 1 month ago, 3 comments)
#18 Bug in computing scores for longdialogue_qa_eng (Xianchao-Wu, opened 1 month ago, 1 comment)
#17 GPT-4o (karansaxena, opened 2 months ago, 1 comment)
#16 Bug in Math.Calc (hansjohn, closed 3 months ago, 1 comment)
#15 Generating Math and Code sample (kai-wen-yang, closed 3 months ago, 2 comments)
#14 How to evaluate the performance of RWKV or Jamba? (hijkzzz, closed 4 months ago, 0 comments)
#13 Can I customize the dataset length? For example, test 32k, 64k, and 200k respectively (hijkzzz, closed 5 months ago, 1 comment)
#12 Why were some data in longbook_qa_eng modified? (FranxYao, closed 5 months ago, 1 comment)
#11 name 'ROUGE_SCORER' is not defined (ustccyf, closed 8 months ago, 1 comment)
#10 Inconsistency between in-context examples and test cases on mathcalc (philipwangOvO, closed 8 months ago, 7 comments)
#9 Is all the data manually annotated, or is some of it model-generated? (Patrick-Ni, closed 8 months ago, 1 comment)
#8 About the data source (guanzhchen, closed 9 months ago, 2 comments)
#7 Error when computing test scores (iMountTai, closed 9 months ago, 5 comments)
#6 Model-supported length vs. test length (iMountTai, closed 9 months ago, 7 comments)
#5 Inference time of YaRN-Mistral-7B (ccclyu, closed 9 months ago, 1 comment)
#4 Did you use gpt-4-32k for the evaluation? (z379035389, closed 9 months ago, 1 comment)
#3 Yi-200K? (yhyu13, closed 9 months ago, 1 comment)
#2 Fix bug (tuantuanzhang, closed 9 months ago, 0 comments)
#1 KeyError when running `eval_yarn_mistral.py` on PassKey (siqi13579, closed 9 months ago, 1 comment)