OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Pass Rate Inquiries #206

Open JungDongwon opened 10 months ago

JungDongwon commented 10 months ago

Hi, I am running ToolEval to calculate the pass rate using my custom API retriever. I am feeding in the APIs returned by my retriever rather than the ground-truth API list. Looking at the evaluation results JSON file, I noticed that when a task is not solvable, I get a high pass rate for that example. Why is that?

[Screenshot 2023-11-24 at 18 57 04]

pooruss commented 9 months ago

Hi, this is because when the provided APIs are not enough to solve the query (not solvable), we lower the standard for passing and accept answers like "I'm sorry, but I can't complete the query due to...", as long as the model has tried as many tools as it can. For solvable queries, however, refusal answers can't pass. So the pass rate on unsolvable queries may be higher than on solvable ones.
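
To make the rule above concrete, here is a toy Python sketch of it. This is not ToolEval's actual code: the `Example` fields and the `passes` and `pass_rate` helpers are hypothetical names made up for illustration, and the real evaluator judges these properties with an LLM rather than reading them from boolean flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    # All fields are hypothetical labels, not ToolEval's real schema.
    solvable: bool     # were the provided APIs enough to solve the query?
    solved: bool       # did the final answer actually resolve the query?
    refused: bool      # did the model give up with an apology-style answer?
    tried_tools: bool  # did it try the available tools before giving up?


def passes(ex: Example) -> bool:
    """Apply the relaxed rule for unsolvable queries described above."""
    if ex.solved:
        return True
    # A refusal only counts as a pass when the query is unsolvable
    # and the model made a real attempt with the tools it was given.
    if ex.refused and not ex.solvable:
        return ex.tried_tools
    return False


def pass_rate(examples: List[Example]) -> float:
    """Fraction of examples that pass."""
    return sum(passes(e) for e in examples) / len(examples)


if __name__ == "__main__":
    refusal_unsolvable = Example(solvable=False, solved=False, refused=True, tried_tools=True)
    refusal_solvable = Example(solvable=True, solved=False, refused=True, tried_tools=True)
    print(passes(refusal_unsolvable), passes(refusal_solvable))
```

Under these assumptions the same refusal passes on an unsolvable query and fails on a solvable one, which is why feeding in retrieved APIs that do not cover the query can push the per-example pass rate up.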