ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0
11.28k stars 951 forks

[bug] Berkeley Function-Calling Leaderboard: <Issue> #411

Status: Open. lucenzhong opened this issue 5 months ago.

lucenzhong commented 5 months ago

Describe the bug

I think there is a problem with the number of function calls expected by the golden answers for parallel function calls.

For example, consider line 27 of data/gorilla_openfunctions_v1_test_parallel_function.json:

{"question": "Find details of lawsuits with case numbers '67813', '71249' filed in the New York District court for type 'Civil' and 'Criminal' cases.", "function": {"name": "court_case.find", "description": "Locate details of court cases based on specific parameters like case number and case type.", "parameters": {"type": "dict", "properties": {"location": {"type": "string", "description": "The city and court where the lawsuit is filed."}, "case_number": {"type": "array", "items": {"type": "string"}, "description": "The unique case numbers of the lawsuits."}, "case_type": {"type": "string", "enum": ["Civil", "Criminal"], "description": "Type of the court case.", "default": "Civil"}}, "required": ["location", "case_number"]}}}

The golden answer expects the function to be called four times, once for each combination of case number and case type:

{"court_case.find_1": {"location": ["New York District", "NY District", "New York", "New York, NY", "NY"], "case_number": ["67813"], "case_type": ["Civil", ""]}, "court_case.find_2": {"location": ["New York District", "NY District", "New York", "New York, NY", "NY"], "case_number": ["71249"], "case_type": ["Criminal"]},"court_case.find_3": {"location": ["New York District", "NY District", "New York", "New York, NY", "NY"], "case_number": ["67813"], "case_type": ["Criminal"]}, "court_case.find_4": {"location": ["New York District", "NY District", "New York", "New York, NY", "NY"], "case_number": ["71249"], "case_type": ["Civil", ""]}}
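The four golden calls are simply every (case_number, case_type) pair from the question. The Python sketch below reproduces that enumeration with itertools.product; the variable names are illustrative and not taken from the Gorilla codebase, and only the first acceptable location value is used.

```python
from itertools import product

# Values from the question on line 27 of the test file.
case_numbers = ["67813", "71249"]
case_types = ["Civil", "Criminal"]

# One call per (case_number, case_type) pair, each with a single-element
# case_number array, which is the shape of the four golden calls.
atomic_calls = [
    {"location": "New York District", "case_number": [number], "case_type": case_type}
    for number, case_type in product(case_numbers, case_types)
]

for call in atomic_calls:
    print(call)
print(len(atomic_calls))  # 4
```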

Since the case_number parameter accepts an array, two calls (one per case type, each listing both case numbers) should also be acceptable in this situation. My result is as follows:

{"idx": 26, "result": [{"court_case_find": "{\"location\": \"New York District\", \"case_number\": [\"67813\", \"71249\"], \"case_type\": \"Civil\"}"}, {"court_case_find": "{\"location\": \"New York District\", \"case_number\": [\"67813\", \"71249\"], \"case_type\": \"Criminal\"}"}], "input_token_count": 518, "output_token_count": 98, "latency": 14.348066568374634}

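Because the argument is a list, the number of function calls is not unique: the same request can be answered with four single-number calls or with two calls that each list both numbers. This situation is common in parallel function calls.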
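One way a checker could become independent of how calls are grouped, sketched in Python under stated assumptions: expand_calls is a hypothetical helper, not part of the BFCL evaluation code, and for brevity the golden side uses only the first acceptable value of each parameter rather than the full lists of alternatives. Every call is flattened into (location, case_number, case_type) atoms, with an omitted case_type treated as the schema default "Civil", so the two-call output and the four-call golden answer compare equal.

```python
import json

DEFAULT_CASE_TYPE = "Civil"  # default declared in the court_case.find schema


def expand_calls(calls):
    """Flatten court_case.find calls into (location, case_number, case_type) atoms.

    A call whose case_number array holds several values counts as several
    single-number calls, so grouping no longer matters.
    Hypothetical helper for illustration; not the BFCL checker.
    """
    atoms = set()
    for args in calls:
        case_type = args.get("case_type", DEFAULT_CASE_TYPE)
        for number in args["case_number"]:
            atoms.add((args["location"], number, case_type))
    return atoms


# Model output from the issue (idx 26): two calls with array arguments.
model_output = [
    {"court_case_find": "{\"location\": \"New York District\", \"case_number\": [\"67813\", \"71249\"], \"case_type\": \"Civil\"}"},
    {"court_case_find": "{\"location\": \"New York District\", \"case_number\": [\"67813\", \"71249\"], \"case_type\": \"Criminal\"}"},
]
# Each entry maps a function name to a JSON-encoded argument string.
model_calls = [json.loads(args) for call in model_output for args in call.values()]

# Golden answer, using the first acceptable value of each parameter.
golden_calls = [
    {"location": "New York District", "case_number": ["67813"], "case_type": "Civil"},
    {"location": "New York District", "case_number": ["71249"], "case_type": "Criminal"},
    {"location": "New York District", "case_number": ["67813"], "case_type": "Criminal"},
    {"location": "New York District", "case_number": ["71249"], "case_type": "Civil"},
]

print(expand_calls(model_calls) == expand_calls(golden_calls))  # True
```

Comparing sets ignores call order and duplicate calls; a real checker would also need to match each argument against its list of acceptable values (for example the several spellings of the location).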

HuanzhiMao commented 5 months ago

Hi @lucenzhong, thanks for pointing this out. I agree that, since the case_number parameter is an array, the model could combine two function calls into one, so the possible answers should be more inclusive. We plan to address this issue in the next release by updating the possible answers. You are also welcome to open a PR if you would like to contribute :)
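On the possible-answer side, the extra accepted form could be derived from the existing golden answer instead of being written by hand. The Python sketch below assumes a hypothetical merge_by_case_type helper that merges single-number calls sharing the same location and case_type into one call with an array of case numbers; it illustrates the idea only and is not how the Gorilla team actually updated the dataset.

```python
from collections import defaultdict


def merge_by_case_type(calls):
    """Merge calls that share location and case_type into one call whose
    case_number array lists all of their numbers (hypothetical helper)."""
    groups = defaultdict(list)  # insertion-ordered, so output order is stable
    for args in calls:
        groups[(args["location"], args["case_type"])].extend(args["case_number"])
    return [
        {"location": location, "case_number": sorted(numbers), "case_type": case_type}
        for (location, case_type), numbers in groups.items()
    ]


# Four single-number golden calls (first acceptable value of each parameter).
golden_calls = [
    {"location": "New York District", "case_number": ["67813"], "case_type": "Civil"},
    {"location": "New York District", "case_number": ["71249"], "case_type": "Criminal"},
    {"location": "New York District", "case_number": ["67813"], "case_type": "Criminal"},
    {"location": "New York District", "case_number": ["71249"], "case_type": "Civil"},
]

merged = merge_by_case_type(golden_calls)
for call in merged:
    print(call)
```

The two merged calls have the same arguments as the model output reported in the issue, so adding them as an alternative possible answer would accept that response.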

lucenzhong commented 4 months ago

@HuanzhiMao Hello, may I ask when the next release will be? The current version appears to have several issues that need fixing.

HuanzhiMao commented 4 months ago

Later this week or early next week! Sorry for the long wait. We are currently busy with the paper deadline :/