CLUEbenchmark/SuperCLUE
SuperCLUE: A Comprehensive Benchmark for General-Purpose Chinese Foundation Models
https://www.superclueai.com
3.02k
stars
97
forks
issues
Have updates stopped???
#48
yiliangfang
opened
3 weeks ago
1
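How does the evaluation work? Is it scored by humans? For objective questions, is the returned answer string compared directly, and are answers to subjective questions judged by humans?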
#47
starplatinum3
opened
4 months ago
1
What does the tool evaluation mean? Is it function calling? If not, please add an evaluation of that capability.
#46
goqw
opened
7 months ago
1
Is there an evaluation of Claude 3?
#45
Pancat007
opened
8 months ago
0
How can I evaluate a large model I built myself using these metrics?
#44
AWangji
opened
8 months ago
2
What is expected to be submitted for leaderboard integration?
#43
zhimin-z
closed
7 months ago
1
Where is the January leaderboard?
#42
kindle939393
opened
10 months ago
2
GPT4-Turbo is missing from the general leaderboard
#41
zhimin-z
closed
10 months ago
1
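Question: according to the evaluation report, SuperCLUE uses automated objective evaluation. Could you provide runnable Python sample code for automated evaluation of a specific model (via API call or web)?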
#40
Romanzhang2024
opened
11 months ago
0
Does it indicate using 5 shots for evaluation?
#39
zhimin-z
closed
10 months ago
1
Where to download the benchmark dataset?
#38
zhimin-z
opened
1 year ago
0
How to calculate the metrics from the table in the paper to the leaderboard?
#37
zhimin-z
opened
1 year ago
1
How large models are upgraded
#36
lukeup
opened
1 year ago
0
How is the role-playing benchmark carried out?
#35
xealml
closed
1 year ago
0
Where to locate the SuperCLUE-LYB leaderboards?
#34
zhimin-z
opened
1 year ago
0
Could you add an evaluation ranking for translation?
#33
lx0126z
opened
1 year ago
0
What are the evaluation criteria for task planning and tool use?
#32
heibaidaolx123
opened
1 year ago
1
C-Eval is really absurd; I hope SuperCLUE can update a bit faster, e.g. once every 1-2 weeks
#31
iammeizu
opened
1 year ago
2
"anthropic" is misspelled
#30
JerryJiang12923
opened
1 year ago
0
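What exactly does "logic and reasoning" cover? For example, given "Guo Degang could read newspapers at age 2, xxxx", could Guo Degang read books at age 3? Does this count as reasoning or semantic understanding??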
#29
ArtificialZeng
opened
1 year ago
0
Could the vicuna-33B model be added to the evaluation?
#28
Mr-wang2016
opened
1 year ago
0
How are responses matched against the reference answers during evaluation?
#27
Starry-Hu
opened
1 year ago
0
Is the dataset open source? Where can it be downloaded?
#26
vanshaw2017
opened
1 year ago
3
A question about prompt design
#25
lrs1353281004
opened
1 year ago
1
What is the reason for the ranking changes?
#24
zhaojiawen-coding
opened
1 year ago
1
Please test the Zhiyuan (BAAI) large model
#23
forkyguo
opened
1 year ago
3
Is Alibaba's Tongyi Qianwen not included?
#22
Pancat009
closed
10 months ago
2
Shouldn't "idea-jiangzhiya" here be "idea-jiangziya"?
#21
ilongshan
opened
1 year ago
1
Is Wenxin Yiyan (ERNIE Bot) not included?
#20
p81sunshine
closed
1 year ago
1
Clarify which "Claude" is benchmarked?
#19
jekbradbury
opened
1 year ago
1
Can I test my own model on SuperCLUE?
#18
guozhiyao
opened
1 year ago
2
When will the test dataset be made public?
#17
wangrui6
opened
1 year ago
1
Suggestion: fill in the missing human "professional ability" data
#16
Triang-jyed-driung
opened
1 year ago
1
How were the human scores obtained?
#15
So0ni
closed
1 year ago
3
Confidence level
#14
littlepan0413
closed
1 year ago
1
Publish the evaluation set and evaluation criteria
#13
plmsmile
closed
1 year ago
2
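Is this turning into a phone-benchmark-style ranking? GPT4 plays Apple, and the domestic large models play Huawei, Xiaomi, OPPO, and vivo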
#12
ZhuGeRoastedFish
opened
1 year ago
3
Why are all the evaluation results integers?
#11
ltz0120
opened
1 year ago
1
The reference value of this evaluation
#10
liuyajun52
closed
1 year ago
2
As a benchmark leaderboard, consider following Chinese-LLaMA-Alpaca in providing reasonable evaluation documentation and disclosure
#9
shm007g
opened
1 year ago
1
It is important that the evaluation data be objective and fair
#8
shichengustc
opened
1 year ago
3
How many questions are there for each individual ability?
#7
leonall
opened
1 year ago
2
Does SuperCLUE evaluate toxicity, bias, and similar aspects?
#6
devinbai
opened
1 year ago
3
How are generation and creative writing tested in multiple-choice form?
#5
Howardqlz
closed
1 year ago
4
How should your work be cited?
#4
MikeGu721
closed
1 year ago
1
My personal impression after using them: the Spark (星火) large model really is not as good as Wenxin Yiyan.
#3
MysteryMulberry
opened
1 year ago
8
The group has passed 200 members; please invite me in
#2
dinglei8908
closed
1 year ago
1
Thanks to Xu Liang's team for their work. I have a few questions about the evaluation details
#1
lrs1353281004
opened
1 year ago
5