OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Vicuna & Mistral Results Discussion #223

Open siyuyuan opened 6 months ago

siyuyuan commented 6 months ago

Great work!

I noticed that in your paper the ReACT and DFSDT results for Vicuna and Alpaca are both 0. Is this because small models cannot understand the function-call format and therefore fail to produce valid parameters for the API calls? I tried Mistral-7B-Instruct-v0.2 with your code and found that ReACT and DFSDT likewise failed to produce reasonable responses.

Is this interpretation reasonable? Can we say that because function-call data is not included in their training, it is difficult for small models to understand and use the tools?

This is the relevant code from `llama_model.py`:

```python
def parse(self, functions, process_id, **args):
    # Build the conversation template matching the model's fine-tuning format
    conv = get_conversation_template(self.template)
    if self.template == "tool-llama":
        roles = {"human": conv.roles[0], "gpt": conv.roles[1]}
    elif self.template == "tool-llama-single-round" or self.template == "tool-llama-multi-rounds":
        roles = {"system": conv.roles[0], "user": conv.roles[1], "function": conv.roles[2], "assistant": conv.roles[3]}
```
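To illustrate why an untuned chat model tends to score 0 here: ReACT-style pipelines parse the model's raw reply for a structured `Action` / `Action Input` block, and a model never trained on that format usually answers in free prose, which the parser rejects. Below is a minimal, hypothetical sketch of such a parser (not the actual ToolBench parsing code) showing both outcomes:

```python
import json
import re


def parse_react_step(text):
    """Extract the tool name and JSON arguments from a ReACT-style reply.

    Hypothetical helper for illustration: returns (action, args) when the
    reply follows the Thought / Action / Action Input format, else None.
    """
    action = re.search(r"Action:\s*(\S+)", text)
    action_input = re.search(r"Action Input:\s*(\{.*\})", text, re.DOTALL)
    if not (action and action_input):
        return None
    try:
        args = json.loads(action_input.group(1))
    except json.JSONDecodeError:
        return None
    return action.group(1), args


# A tool-tuned model emits the expected structure, which parses cleanly:
ok = parse_react_step(
    "Thought: I need the weather.\n"
    "Action: get_weather\n"
    'Action Input: {"city": "Paris"}'
)
# → ("get_weather", {"city": "Paris"})

# An untuned chat model often replies in free prose, which parses to None,
# so the step counts as a failed call even if the text is sensible:
bad = parse_react_step("Sure! The weather in Paris is usually mild in spring.")
# → None
```

Under this reading, the zeros reflect a format mismatch as much as a capability gap: without function-call examples in training, the model has no reason to emit the exact structure the evaluator expects.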