allenporter / home-assistant-datasets

This package is a collection of datasets for evaluating AI Models in the context of Home Assistant.
https://allenporter.github.io/home-assistant-datasets

Berkeley Gorilla OpenFunctions #31

super-qua opened this issue 1 month ago

super-qua commented 1 month ago

Hi 👋

as I was doing some research regarding xLAM, I stumbled upon the Berkeley Gorilla initiative.

they have their own function-calling model, OpenFunctions, and a function-calling leaderboard. there is also a maintained API Zoo.

you probably already saw it, but if not, it might be worth checking out.

allenporter commented 1 month ago

Hi, yeah I believe I may have read about that in the xlam paper.

Did you have any particular opportunity you wanted to point out or just sharing?

Generally I'd love to move this to a standard benchmark or eval framework if possible, but have not explored this yet.

super-qua commented 1 month ago

yeah that sounds great.

it's really more about sharing. I have yet to try out the OpenFunctions model and see if it works with Home Assistant. the leaderboard from Berkeley might also be a good inspiration for which models to try out.

but as this is more about information sharing, feel free to close this issue :)

allenporter commented 3 weeks ago

I reviewed the dataset in more detail and my takeaways are:

super-qua commented 3 weeks ago

yeah, the multi-tool call/selection could enable some interesting use cases and improve tool calling overall.

but also, at a later stage, irrelevance detection. what I am currently struggling with (in my manual, unstructured tests) is the LLM trying to answer most requests with a tool call, even when it is a general question.

did you see that they released a new dataset based on actual user queries? https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v2_live.html

allenporter commented 3 weeks ago

Are you using Llama 3.1 and Home Assistant? If so: (1) the new context size setting may help; (2) I've added notes here https://www.home-assistant.io/integrations/ollama/#controlling-home-assistant about that issue generally.

super-qua commented 3 weeks ago

(1) The new context size setting may help

I had originally set up a Modelfile with a larger context window, but this change makes it way more convenient to test different setups 👍
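For context, the Modelfile approach I had been using looked roughly like this (the base model name and context size here are just examples, not my exact setup):

```
# Sketch of a Modelfile that extends a base model with a larger context window
FROM llama3.1
PARAMETER num_ctx 8192
```

built with something like `ollama create llama3.1-8k -f Modelfile` — so having the context size as a setting in the integration avoids rebuilding a model for every value you want to try.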

(2) I've added notes here https://www.home-assistant.io/integrations/ollama/#controlling-home-assistant about that issue generally

thanks, I didn't know about that. It gave me an idea: since I would like to use all of this together on my S3 Box, I revisited my initial setup, which used an automation for Assist that passed the request to the Ollama conversation agent when a general question was asked. I turned the automation into a script and made it available to the smart home LLM for answering general questions. It seems to be working so far, but I will have to run more tests.
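A minimal sketch of what such a script could look like (the script name, description, and agent entity ID are hypothetical — it assumes the Ollama agent is exposed as `conversation.ollama`):

```yaml
# scripts.yaml (sketch): a script exposed to the smart home LLM that
# forwards general-knowledge questions to a secondary conversation agent.
answer_general_question:
  alias: "Answer general question"
  description: >-
    Answers general knowledge questions that are not about
    controlling devices in the home.
  fields:
    query:
      description: "The user's question"
      required: true
      selector:
        text:
  sequence:
    # Hand the raw question to the Ollama conversation agent.
    - service: conversation.process
      data:
        agent_id: conversation.ollama  # hypothetical entity id
        text: "{{ query }}"
      response_variable: result
    # Return the agent's answer as the script's response.
    - stop: "Answered"
      response_variable: result
```

The script's `description` is what lets the primary LLM decide when to delegate, so making it explicitly about non-device questions is the important part of this setup.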