Open ChakshuGautam opened 8 months ago
Hi @ChakshuGautam, I am interested in working on this issue. Before asking you to assign it to me, I need a few clarifications:
@AbhishekRP2002 updated the description. You can start working on this with a draft PR. We can work on this collaboratively.
Sure, I'll share a draft this weekend. Is there any medium other than Discord where we can connect and discuss?
I'll be available on Discord. We can schedule a call from there if needed.
https://allenai.github.io/Break/ Could this be a good starting point for defining a benchmark for this problem?
Hi @ChakshuGautam, I would like to contribute here, since this issue has been inactive for a long time.
I have a few questions.
Is there a knowledge base for this?
- Recursively break-down the post into smaller questions/directives
What does "post" mean here, and what would be the source of the input queries?
Could I get some sample queries/questions, along with the knowledge base (if it exists), to start the work?
Microsoft ToolTalk is a relevant benchmark for assessing the ability of LLMs to call multiple tool APIs sequentially, which is roughly a superset of this problem statement. Paper: https://arxiv.org/pdf/2311.10775.pdf
In my own experience building a sequential tool-calling LLM, which involved breaking down queries, most open-source LLMs failed to produce good results as of November 2023. A simple one-shot prompt with GPT-4, as well as a prompting pipeline with GPT-3.5, produced satisfactory results. Feel free to involve me in this if possible.
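To make the one-shot idea concrete, here is a minimal sketch of how such a prompt could be assembled and how the model's reply could be parsed. Everything in it is an assumption for illustration: the function names, the example query, and the choice of a JSON array as the output format are hypothetical, and no model is actually called.

```python
import json
from typing import List

# Hypothetical worked example shown to the model (the "one shot").
EXAMPLE_QUERY = "Which crop should I plant this season and where can I buy seeds?"
EXAMPLE_ANSWER = [
    "Which crop should I plant this season?",
    "Where can I buy seeds for that crop?",
]


def build_decomposition_prompt(query: str) -> str:
    """Build a one-shot prompt asking for sub-questions as a JSON array."""
    return (
        "Break the query into smaller, self-contained questions.\n"
        "Reply with a JSON array of strings and nothing else.\n\n"
        f"Query: {EXAMPLE_QUERY}\n"
        f"Sub-questions: {json.dumps(EXAMPLE_ANSWER)}\n\n"
        f"Query: {query}\n"
        "Sub-questions:"
    )


def parse_sub_questions(raw_reply: str) -> List[str]:
    """Parse the model reply; raise ValueError if it is not a JSON list."""
    data = json.loads(raw_reply)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of sub-questions")
    # Drop empty entries and surrounding whitespace.
    return [str(item).strip() for item in data if str(item).strip()]
```

The prompt string would be sent to whichever model is being evaluated. Applying the same two functions to each returned sub-question is one possible way to get the recursive breakdown mentioned in the issue description.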
The paper also includes a comprehensive list of benchmarks that could be useful when selecting an appropriate one for this issue -
Approaches to try out
References