OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Question about preparing model predictions for model evaluation #240

Open paprika0741 opened 4 months ago

paprika0741 commented 4 months ago

Hello, thank you for your nice code. I would like to know how to prepare all the model predictions for the six test subsets using your ToolLLaMA model with the DFSDT method.

paprika0741 commented 4 months ago

The README mentions: "To use ToolEval to evaluate your own model and method, you first need to prepare all the model predictions for the six test subsets", with the following layout:

├── /chatgpt_cot/
│  ├── /G1_instruction/
│  │  ├── /10160_CoT@1.json
│  │  └── ...
│  ├── /G1_tool/
│  │  ├── /10221_CoT@1.json
│  │  └── ...
│  ├── ...
│  ├── /G3_instruction/
│  │  ├── /10221_CoT@1.json
│  │  └── ...

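(Aside: once such a predictions directory exists, a minimal bash sketch like the one below, assuming the chatgpt_cot layout quoted above, can sanity-check that every subset folder actually contains prediction files:)

for subset_dir in chatgpt_cot/*/; do
    # count the *_CoT@1.json prediction files in each subset folder
    count=$(ls "$subset_dir" | grep -c 'CoT@1\.json')
    echo "$subset_dir: $count predictions"
done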
So what should input_query_file be? I could not find the six test subsets in data. The command I am running is:

export TOOLBENCH_KEY=""
export OPENAI_KEY=""
export PYTHONPATH=./
python toolbench/inference/qa_pipeline.py \
    --tool_root_dir data/toolenv/tools/ \
    --backbone_model chatgpt_function \
    --openai_key $OPENAI_KEY \
    --max_observation_length 1024 \
    --method DFS_woFilter_w2 \
    --input_query_file data/test_instruction/G1_instruction.json \
    --output_answer_file chatgpt_dfs_inference_result \
    --toolbench_key $TOOLBENCH_KEY
pooruss commented 4 months ago

Hi, after downloading the data from https://drive.google.com/drive/folders/1yBUQ732mPu-KclJnuQELEhtKakdXFc3J and unzipping it, you will find the queries for each test subset under data/test_instruction; those are the files to pass as input_query_file.
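For reference, a rough sketch of how the six per-subset runs could be chained together once the data is extracted. The flags are copied from the command above; the subset names other than G1_instruction, G1_tool, and G3_instruction, and the per-subset output paths, are assumptions rather than something stated in this thread:

export TOOLBENCH_KEY=""
export OPENAI_KEY=""
export PYTHONPATH=./
# loop over the test subset query files under data/test_instruction (names partly assumed)
for subset in G1_instruction G1_category G1_tool G2_instruction G2_category G3_instruction; do
    python toolbench/inference/qa_pipeline.py \
        --tool_root_dir data/toolenv/tools/ \
        --backbone_model chatgpt_function \
        --openai_key $OPENAI_KEY \
        --max_observation_length 1024 \
        --method DFS_woFilter_w2 \
        --input_query_file data/test_instruction/${subset}.json \
        --output_answer_file chatgpt_dfs_inference_result/${subset} \
        --toolbench_key $TOOLBENCH_KEY
done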