THUDM LongBench issues - Githubissues

THUDM / LongBench

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

MIT License

675 stars 54 forks

issues

Request for Mixtral-8*7B model

#84 Aaronhuang-778 opened 1 week ago
0
How much does it cost to run the full dataset once with GPT-3.5-turbo-16K?

#83 HelloEveryonehh opened 3 weeks ago
1
Evaluation mechanism update

#82 cizhenshi opened 3 weeks ago
1
Where is the evaluation code for the different truncation settings in Figure 2 of the paper?

#81 HelloEveryonehh closed 3 weeks ago
2
Add HF Evaluate Utils

#80 yongchanghao opened 1 month ago
0
Position of key information in long texts

#79 ruoyuxie closed 4 weeks ago
2
Long context dataset

#78 nzw0301 closed 1 month ago
0
Why there is no need special token for chatglm3 when counting the tokens?

#77 condy0919 closed 2 months ago
2
Fix Grammar Error in NarrativeQA Prompt

#76 sidjha1 opened 2 months ago
0
inference with kv cache

#75 mohammadh-cerebras closed 2 months ago
0
Is it possible to test model-management frameworks built on the OpenAI API, such as ollama and xinference?

#74 jiusi9 opened 2 months ago
1
No initialization for the process group

#73 Mugariya opened 2 months ago
1
Some questions on the processed dataset in LongBench

#72 jiqimaoke closed 2 months ago
1
How to evaluate on llama3-8b-instruct?

#71 txchen-USTC opened 3 months ago
1
Suggestions for improving the validity of dataset testing

#70 wsn555 opened 4 months ago
7
Code for evaluation with GPT-3.5?

#69 RuskinManku opened 4 months ago
3
Load dataset from hf failed

#68 murphypei opened 4 months ago
4
The "anwser" for some examples in "qasper.jsonl" is strange

#67 Zcchill opened 4 months ago
6
Llama2-7B-chat-4k test results differ

#66 PengWenChen closed 5 months ago
2
Loading local datasets with split='test'

#65 yichen0104 opened 5 months ago
1
Chinese Examples in MultiFieldQA-en

#64 wendywangwwt opened 6 months ago
1
Is the avg length in the dataset measured in words/characters or in tokens?

#63 deepindeed2022 closed 6 months ago
1
Table reproduce

#62 hzw20200301 closed 7 months ago
0
`Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`

#61 fuqichen1998 opened 8 months ago
5
Include data on which passage contains answer

#60 danielmisrael opened 8 months ago
1
Chinese test results for chatglm3-6b-32k are far below the benchmark in the README

#59 Strivin0311 closed 8 months ago
5
RuntimeError when running pred.py for Vicuna-v1.5-7B-16k

#58 fuqichen1998 closed 8 months ago
2
How is the Spearman correlation computed?

#57 randomtutu opened 8 months ago
1
CUDA error??????

#56 xvolcano02 closed 8 months ago
2
Llama2-7B-chat-4k test results differ

#55 slatter666 closed 8 months ago
3
Any Implementation of Mistral-7B?

#54 leeyeehoo opened 9 months ago
1
AttributeError: 'str' object has no attribute 'to'

#53 vincent507cpu closed 9 months ago
1
Error: TypeError: Couldn't cast array of type list<item: string> to null

#52 xxcoco763 opened 9 months ago
1
Update retrieval/

#51 FaustLyu closed 9 months ago
0
Disable grad to avoid OOM

#50 acherstyx closed 10 months ago
0
Testing a 13B model such as Baichuan on 1*A100 (80G) causes OOM

#49 lvjianxin opened 10 months ago
0
Evaluate on long context (32k,64k etc..) on 30B/70B large models

#48 CaesarWWK opened 10 months ago
5
How to evaluate GPT-3.5 or GPT-4

#47 jing-my closed 9 months ago
3
Three length-extrapolation methods produce exactly the same answers?

#46 IT-five closed 11 months ago
0
OOM

#45 IT-five closed 11 months ago
3
Cannot run inference on a single A100

#44 Huwei-deeplearning closed 11 months ago
3
A single A100 40G cannot run llama2-7b-chat-4k (OOM), but can run chatglm2-6b-32k

#43 fishiu closed 11 months ago
4
how to apply to baichuan?

#42 IT-five closed 9 months ago
1
On the soundness of the evaluation

#41 rayleoyoung closed 11 months ago
2
Kimi-Chat testing

#40 kunpeng199494 closed 9 months ago
1
Update support chatglm3

#39 JackKuo666 closed 12 months ago
1
About the models being tested

#38 pengcheng-yan closed 11 months ago
2
Cannot reproduce the repo's dureader results with chatglm3-6b-32k

#37 siqi13579 closed 12 months ago
4
Bug in the classification_score scoring code

#36 zhangleiedu closed 12 months ago
1
Typo in pred.py

#35 ignorejjj closed 1 year ago
1