ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0
11.18k stars, 920 forks

Possible answer for "parallel_function_29" #550

Closed by lucenzhong 1 month ago

lucenzhong commented 1 month ago

https://github.com/ShishirPatil/gorilla/blob/f1417a8092540302257f7030aaa1d6ba163ac363/berkeley-function-call-leaderboard/data/possible_answer/gorilla_openfunctions_v1_test_parallel_function.json#L30

I think fields whose value is 0 can be omitted, because they do not affect the result of the waste calculation. Perhaps this answer should also be accepted:

{"id": "parallel_function_29", "ground_truth": {"waste_calculation.calculate_1": {"population": [{"adults": [2], "children": [2]}], "location": ["Los Angeles", "Los Angeles, CA", "LA"]}, "waste_calculation.calculate_2": {"population": [{"singles": [1]}], "location": ["New York", "New York, NY", "NY", "New York City", "NYC"]}}} 
lucenzhong commented 1 month ago

Another issue case

{"id": "parallel_function_89", "ground_truth": {"get_directions 1": {"start_location": ["San Francisco", "SF"], "end_location": ["Palo Alto"], "route_type": ["fastest"]}, "get_directions 2": {"start_location": ["Palo Alto"], "end_location": ["Golden Gate Bridge in San Francisco", "Golden Gate Bridge, San Francisco", "Golden Gate Bridge"], "route_type": ["scenic"]}, "get_directions 3": {"start_location": ["Golden Gate Bridge in San Francisco", "Golden Gate Bridge, San Francisco", "'Golden Gate Bridge"], "end_location": ["San Francisco", "SF"], "route_type": ["fastest"]}}}

The ground truth has a stray ' in front of the last start_location value ("'Golden Gate Bridge") for "get_directions 3". It should be:

"get_directions 3": {"start_location": ["Golden Gate Bridge in San Francisco", "Golden Gate Bridge, San Francisco", "Golden Gate Bridge"], "end_location": ["San Francisco", "SF"], "route_type": ["fastest"]}
HuanzhiMao commented 1 month ago

Hi @lucenzhong,

Regarding id parallel_function_29: The parameter description for population is

"population": {
    "type": "dict",
    "description": "The description of population. 'adults' is the number of adults in the household. 'children' is the number of children. 'singles' is the number of single adults living alone.",
    "required": ["adults", "children", "singles"],
}

All three fields in the dictionary are listed as required in the function doc. They may not affect the function's execution result, but this category is evaluated by AST matching (not by execution), so the model answer cannot omit them.
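
To illustrate the point, here is a minimal sketch of a required-field check. It is not the leaderboard's actual checker: `missing_required` is a hypothetical helper, and the schema is trimmed to the two keys quoted above.

```python
# Hypothetical helper, for illustration only (not the leaderboard's checker).
def missing_required(schema: dict, value: dict) -> list:
    """Return the keys listed under "required" in `schema` that `value` lacks."""
    return [key for key in schema.get("required", []) if key not in value]


# Trimmed copy of the parameter doc quoted above.
population_schema = {
    "type": "dict",
    "required": ["adults", "children", "singles"],
}

# Answer that drops the zero-valued field, as proposed in this issue.
proposed = {"adults": 2, "children": 2}
# Answer that keeps every required field.
complete = {"adults": 2, "children": 2, "singles": 0}

print(missing_required(population_schema, proposed))  # ['singles']
print(missing_required(population_schema, complete))  # []
```

Under a check like this, the proposed answer fails on the missing `singles` key even though its value would be 0.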

Regarding id parallel_function_89: Good catch. I will fix it soon.
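
A stray quote like this one could be caught with a small lint over the possible-answer entries. The sketch below is only an assumption about how such a check might look; `find_stray_quotes` is a hypothetical helper, and the entry is trimmed to the affected call.

```python
# Hypothetical lint, for illustration only: flags string values that start
# or end with a quote character, but not both.
QUOTES = ("'", '"')


def find_stray_quotes(entry: dict) -> list:
    """Return (call, parameter, value) for values with an unbalanced quote."""
    hits = []
    for call, params in entry["ground_truth"].items():
        for param, candidates in params.items():
            for value in candidates:
                # Skip non-string candidates such as nested dicts.
                if not isinstance(value, str):
                    continue
                if value.startswith(QUOTES) != value.endswith(QUOTES):
                    hits.append((call, param, value))
    return hits


# Trimmed copy of the entry quoted in this issue.
entry = {
    "id": "parallel_function_89",
    "ground_truth": {
        "get_directions 3": {
            "start_location": [
                "Golden Gate Bridge in San Francisco",
                "Golden Gate Bridge, San Francisco",
                "'Golden Gate Bridge",
            ],
            "end_location": ["San Francisco", "SF"],
            "route_type": ["fastest"],
        }
    },
}

print(find_stray_quotes(entry))
# [('get_directions 3', 'start_location', "'Golden Gate Bridge")]
```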