THUDM/LongBench
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
MIT License
675 stars
54 forks
issues
Request for Mixtral-8*7B model
#84
Aaronhuang-778
opened
1 week ago
0
How much does it cost to run the full dataset once with GPT-3.5-turbo-16K?
#83
HelloEveryonehh
opened
3 weeks ago
1
Evaluation mechanism update
#82
cizhenshi
opened
3 weeks ago
1
Where is the evaluation code for the different truncation settings in Figure 2 of the paper?
#81
HelloEveryonehh
closed
3 weeks ago
2
Add HF Evaluate Utils
#80
yongchanghao
opened
1 month ago
0
Position of key information in long texts
#79
ruoyuxie
closed
4 weeks ago
2
Long context dataset
#78
nzw0301
closed
1 month ago
0
Why are no special tokens needed for chatglm3 when counting tokens?
#77
condy0919
closed
2 months ago
2
Fix Grammar Error in NarrativeQA Prompt
#76
sidjha1
opened
2 months ago
0
inference with kv cache
#75
mohammadh-cerebras
closed
2 months ago
0
Can model management frameworks based on the OpenAI API, such as ollama and xinference, be tested?
#74
jiusi9
opened
2 months ago
1
No initialization for the process group
#73
Mugariya
opened
2 months ago
1
Some questions on the processed dataset in LongBench
#72
jiqimaoke
closed
2 months ago
1
How to evaluate on llama3-8b-instruct?
#71
txchen-USTC
opened
3 months ago
1
Suggestions for improving the validity of dataset testing
#70
wsn555
opened
4 months ago
7
Code for evaluation with GPT-3.5?
#69
RuskinManku
opened
4 months ago
3
Load dataset from hf failed
#68
murphypei
opened
4 months ago
4
The "answer" for some examples in "qasper.jsonl" is strange
#67
Zcchill
opened
4 months ago
6
Llama2-7B-chat-4k test results differ
#66
PengWenChen
closed
5 months ago
2
Loading local datasets with split='test'
#65
yichen0104
opened
5 months ago
1
Chinese Examples in MultiFieldQA-en
#64
wendywangwwt
opened
6 months ago
1
Is the avg length in the dataset measured in words/characters or in number of tokens?
#63
deepindeed2022
closed
6 months ago
1
Table reproduce
#62
hzw20200301
closed
7 months ago
0
`Llama2-7B-chat-4k` on `PassageRetrieval-zh` gets `10.12`
#61
fuqichen1998
opened
8 months ago
5
Include data on which passage contains answer
#60
danielmisrael
opened
8 months ago
1
Chinese test results for chatglm3-6b-32k are far below the benchmark in the README
#59
Strivin0311
closed
8 months ago
5
RuntimeError when running pred.py for Vicuna-v1.5-7B-16k
#58
fuqichen1998
closed
8 months ago
2
How is the Spearman correlation computed?
#57
randomtutu
opened
8 months ago
1
CUDA error??????
#56
xvolcano02
closed
8 months ago
2
Llama2-7B-chat-4k test results differ
#55
slatter666
closed
8 months ago
3
Any Implementation of Mistral-7B?
#54
leeyeehoo
opened
9 months ago
1
AttributeError: 'str' object has no attribute 'to'
#53
vincent507cpu
closed
9 months ago
1
Error: TypeError: Couldn't cast array of type list<item: string> to null
#52
xxcoco763
opened
9 months ago
1
Update retrieval/
#51
FaustLyu
closed
9 months ago
0
Disable grad to avoid OOM
#50
acherstyx
closed
10 months ago
0
Testing 13B models such as Baichuan on 1*A100 (80G) causes OOM
#49
lvjianxin
opened
10 months ago
0
Evaluate on long context (32k,64k etc..) on 30B/70B large models
#48
CaesarWWK
opened
10 months ago
5
How to evaluate GPT-3.5 or GPT-4
#47
jing-my
closed
9 months ago
3
The three length-extrapolation methods produce exactly the same answer?
#46
IT-five
closed
11 months ago
0
OOM
#45
IT-five
closed
11 months ago
3
Inference fails on a single A100
#44
Huwei-deeplearning
closed
11 months ago
3
A single A100 40G cannot run llama2-7b-chat-4k (OOM), but can run chatglm2-6b-32k
#43
fishiu
closed
11 months ago
4
how to apply to baichuan?
#42
IT-five
closed
9 months ago
1
On the soundness of the evaluation
#41
rayleoyoung
closed
11 months ago
2
Kimi-Chat testing
#40
kunpeng199494
closed
9 months ago
1
Update support chatglm3
#39
JackKuo666
closed
12 months ago
1
About the tested models
#38
pengcheng-yan
closed
11 months ago
2
Cannot reproduce the repo's dureader results with chatglm3-6b-32k
#37
siqi13579
closed
12 months ago
4
Bug in the classification_score scoring code
#36
zhangleiedu
closed
12 months ago
1
Typo in pred.py
#35
ignorejjj
closed
1 year ago
1