ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0
11.28k stars 951 forks source link

[feature] Add multi-turn conversational function calling category for benchmarking #442

Open Pernekhan opened 4 months ago

Pernekhan commented 4 months ago

Is the feature request related to a problem? Currently, there are no benchmarking for multi-turn conversations.

Sometimes assistant needs to ask for more information before calling the functions. For example: User: Book me a flight to San Francisco? [functions: book_flight(from, to, date) Assistant: Tell me from where you're flying from and on what date? User: From London on May 25th 2024. Tool call: book_flight(from=London, to=San Francisco, date=2024-05-25) ...

Describe the solution you'd like

I'd like a new category to be added to the existing list. python openfunctions_evaluation.py --model MODEL_NAME --test-category multiturn

Additional context

HuanzhiMao commented 4 months ago

Hi @Pernekhan,

Thanks for the suggestion. We do have plans to add a multi-turn test category in the future. We will keep this feature in mind for upcoming updates and prioritize it accordingly. Stay tuned for future releases!

Gopichandar commented 3 months ago

Hi @HuanzhiMao , Im looking for the same, is there a workaround for this curently?