THUNLP-MT / StableToolBench

A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
https://zhichengg.github.io/stb.github.io/
Apache License 2.0

DFS.py changes causing functions to not be called with ToolLLaMA #19

Open kingb12 opened 3 months ago

kingb12 commented 3 months ago

Hi! I'm trying to run ToolLLaMA against the StableToolBench server and noticed an issue. I am running the following command:

python toolbench/inference/qa_pipeline.py \
    --tool_root_dir data_example/toolenv/tools/ \
    --backbone_model toolllama \
    --model_path ToolBench/ToolLLaMA-2-7b-v2 \
    --max_observation_length 1024 \
    --observ_compress_method truncate \
    --method DFS_woFilter_w2 \
    --input_query_file data_example/example_instructions/test_query.json \
    --output_answer_file outputs/tolllama_dfs_inference_result \
    --toolbench_key $TOOLBENCH_KEY

Here, test_query.json contains just a single query in the expected format. On L119 of tool_llama_model.py, the function name and arguments appear to be parsed into a function_call key:

# react format prediction
thought, action, action_input = react_parser(predictions)
message = {
    "role": "assistant",
    "content": thought,
    "function_call": {
        "name": action,
        "arguments": action_input
    }
}

However, on L232 of DFS.py and in a few other places, this key appears to be ignored in favor of tool_calls:

# if "function_call" in new_message.keys():
if "tool_calls" in new_message.keys() and new_message["tool_calls"] != None and len(new_message["tool_calls"]) > 0:
    tool_calls = new_message["tool_calls"]
    if self.process_id == 0:
        print("number of parallel calls:",len(tool_calls))

Is this change meant to support multiple parallel calls rather than just one? If so, to get this working with ToolLLaMA, do I need to modify one of these code paths, or just call the script with different arguments? If there's a fix that makes sense, I'd be happy to contribute it. One simple idea: if function_call is present and tool_calls is not, modify the message so that the function call becomes an element of tool_calls.
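A rough sketch of that idea, in case it helps the discussion. It assumes DFS.py expects the OpenAI-style tool_calls layout (a list of entries with id, type, and a nested function dict), which I have not confirmed against the repo. The helper name normalize_function_call and the placeholder id "call_0" are hypothetical and do not exist in the codebase.

```python
import copy


def normalize_function_call(message):
    """Return a message whose legacy function_call is also exposed via tool_calls.

    Hypothetical helper: the tool_calls layout below is an assumption
    (OpenAI-style), not something taken from DFS.py.
    """
    # Nothing to do if tool_calls is already populated or there is no function_call.
    if message.get("tool_calls") or not message.get("function_call"):
        return message

    # Work on a copy so the caller's original message is left untouched.
    normalized = copy.deepcopy(message)
    call = normalized["function_call"]
    normalized["tool_calls"] = [
        {
            "id": "call_0",  # hypothetical placeholder id
            "type": "function",
            "function": {
                "name": call["name"],
                "arguments": call["arguments"],
            },
        }
    ]
    # function_call is kept in place in case other code paths still read it.
    return normalized


# Example with a message shaped like the one built in tool_llama_model.py.
message = {
    "role": "assistant",
    "content": "thought",
    "function_call": {"name": "some_tool", "arguments": "{}"},
}
new_message = normalize_function_call(message)
print("number of parallel calls:", len(new_message["tool_calls"]))
```

Calling something like this on new_message just before the tool_calls check would let the single ToolLLaMA call flow through the same branch as parallel calls, assuming the rest of that branch only reads the fields populated here.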

kingb12 commented 3 months ago

My test query for input_query_file, if it helps: test_query.json