CLUEbenchmark / SuperCLUE-Agent

SuperCLUE-Agent: A benchmark for evaluating the core capabilities of AI agents on native Chinese tasks

Will you release the benchmark dataset samples, evaluation metrics and methods? #9

Open SilasTHU opened 5 months ago

SilasTHU commented 5 months ago

Right now we can only see the scores of these models, but I'm very interested in how you evaluate these agents.