OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Vicuna & Mistral Results Discussion #223

Open siyuyuan opened 6 months ago

siyuyuan commented 6 months ago

Great work!

I noticed that in your paper the ReACT and DFSDT results for Vicuna and Alpaca are both 0. Is this because small models cannot understand the function-call format and therefore fail to produce valid parameters for the API calls? I tried Mistral-7B-Instruct-v0.2 with your code and found that ReACT and DFSDT likewise failed to produce reasonable responses.

Is this interpretation reasonable? Can we say that because function-call data is not included in their training, it is difficult for small models to understand and use the tools?

This is the relevant code from `llama_model.py`:

```python
def parse(self, functions, process_id, **args):
    # Build the conversation template matching the model's fine-tuning format
    conv = get_conversation_template(self.template)
    if self.template == "tool-llama":
        roles = {"human": conv.roles[0], "gpt": conv.roles[1]}
    elif self.template == "tool-llama-single-round" or self.template == "tool-llama-multi-rounds":
        roles = {"system": conv.roles[0], "user": conv.roles[1], "function": conv.roles[2], "assistant": conv.roles[3]}
```
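To illustrate why an untuned chat model tends to score 0 here: ReACT-style pipelines parse the model's raw reply for a structured `Action` / `Action Input` block, and a model never trained on that format usually answers in free prose, which the parser rejects. Below is a minimal, hypothetical sketch of such a parser (not the actual ToolBench parsing code) showing both outcomes:

```python
import json
import re


def parse_react_step(text):
    """Extract the tool name and JSON arguments from a ReACT-style reply.

    Hypothetical helper for illustration: returns (action, args) when the
    reply follows the Thought / Action / Action Input format, else None.
    """
    action = re.search(r"Action:\s*(\S+)", text)
    action_input = re.search(r"Action Input:\s*(\{.*\})", text, re.DOTALL)
    if not (action and action_input):
        return None
    try:
        args = json.loads(action_input.group(1))
    except json.JSONDecodeError:
        return None
    return action.group(1), args


# A tool-tuned model emits the expected structure, which parses cleanly:
ok = parse_react_step(
    "Thought: I need the weather.\n"
    "Action: get_weather\n"
    'Action Input: {"city": "Paris"}'
)
# → ("get_weather", {"city": "Paris"})

# An untuned chat model often replies in free prose, which parses to None,
# so the step counts as a failed call even if the text is sensible:
bad = parse_react_step("Sure! The weather in Paris is usually mild in spring.")
# → None
```

Under this reading, the zeros reflect a format mismatch as much as a capability gap: without function-call examples in training, the model has no reason to emit the exact structure the evaluator expects.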