OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Some doubts about pass rate #284

Open quchangle1 opened 2 weeks ago

quchangle1 commented 2 weeks ago

Hi,

I would like to express my gratitude for your efforts in open-sourcing this project. Your work is highly appreciated and I believe it is very valuable for the research community.

However, I am having trouble reproducing the results reported in your paper. I ran tooleval on the reproduction_data/chatgpt_dfs/G3_instruction data you provided, using the current version of gpt-3.5-turbo-16k as the evaluator, and obtained a pass rate of only 17%. Is this gap caused by the change in GPT version, or did I make a mistake in one of my steps? Could you please provide updated results obtained with the latest version?
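As a sanity check on a number like the 17% above, the pass rate can be recomputed by hand from the evaluator's per-query verdicts. The snippet below is a minimal sketch, not ToolBench's actual implementation: the `verdicts` layout, the "passed"/"failed" labels, the query ids, and the majority vote over repeated evaluator runs are all assumptions made for illustration.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among repeated evaluator votes."""
    return Counter(votes).most_common(1)[0][0]


def pass_rate(verdicts):
    """Fraction of queries whose majority label is 'passed'."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(
        1 for votes in verdicts.values() if majority_label(votes) == "passed"
    )
    return passed / len(verdicts)


# Hypothetical data: query id -> labels from three evaluator runs.
# None of these ids or labels come from the ToolBench repository.
verdicts = {
    "q1": ["passed", "passed", "failed"],
    "q2": ["failed", "failed", "failed"],
    "q3": ["passed", "failed", "passed"],
    "q4": ["failed", "passed", "failed"],
}

rate = pass_rate(verdicts)
print(f"pass rate: {rate:.0%}")
```

If the hand-computed figure matches what the tool reports, the low value more likely comes from the evaluator model's judgments than from the aggregation step.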