Some doubts about pass rate

Hi,

I would like to express my gratitude for your efforts in open-sourcing this project. Your work is highly appreciated and I believe it is very valuable for the research community.

However, I am having trouble reproducing the results reported in your paper. I used the current version of gpt-3.5-turbo-16k as the evaluator to run ToolEval on the reproduction_data/chatgpt_dfs/G3_instruction data you provided, and obtained a pass rate of only 17%. Is this gap caused by the GPT version, or did I make a mistake in one of my steps? Could you please provide updated results with the latest version?

OpenBMB / ToolBench

Some doubts about pass rate #284