Samagra-Development / ai-tools

AI Tooling to bootstrap applications fast

Breaking question down into individual parts #290

Open ChakshuGautam opened 8 months ago

ChakshuGautam commented 8 months ago

Approaches to try out

References

  1. Paper on query rewrite
  2. Langchain on query rewrite
  3. Cohere to rewrite the query

AbhishekRP2002 commented 7 months ago

Hi @ChakshuGautam, I am interested in working on this issue. Before asking you to assign it to me, I need a few clarifications:

ChakshuGautam commented 7 months ago

@AbhishekRP2002 updated the description. You can start working on this with a draft PR. We can work on this collaboratively.

AbhishekRP2002 commented 7 months ago

Sure, I'll share a draft this weekend. Is there any medium other than Discord where we can connect and discuss?

ChakshuGautam commented 7 months ago

I'll be available on Discord. We can schedule a call from there if needed.

AbhishekRP2002 commented 7 months ago

https://allenai.github.io/Break/ Could this be a good starting point for defining a benchmark for the given problem?

masterismail commented 6 months ago

Hi @ChakshuGautam, I would like to contribute here, since this issue has been inactive for a while.

I have a few questions.

Can I get some sample queries/questions, along with the knowledge base (if one exists), to start the work?

shrivastava95 commented 6 months ago

Microsoft ToolTalk is a relevant benchmark for assessing the ability of LLMs to call multiple tool APIs sequentially, which is sort of a superset of this problem statement. Paper link - https://arxiv.org/pdf/2311.10775.pdf

In my personal experience developing a sequential tool-calling LLM, which involved breaking down queries, most open-source LLMs failed to produce good results as of November 2023. A simple one-shot prompt with GPT-4, as well as a prompting pipeline with GPT-3.5, produced satisfactory results. Feel free to involve me in this if possible.

The paper also has a comprehensive list of benchmarks that could be useful when selecting an appropriate benchmark for this issue: [image]
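
To make the one-shot prompting idea mentioned above concrete, here is a minimal sketch in Python. It is not the code used in the experience described in this thread: the prompt wording, the example question, and the names `build_prompt`, `parse_sub_questions`, `decompose`, and the `llm` callable are all assumptions made for illustration. The model call is left as a plain function argument so that no particular provider API is implied.

```python
import re
from typing import Callable, List

# Hypothetical one-shot example; it is not taken from this issue thread.
ONE_SHOT_EXAMPLE = (
    "Question: Which crops grow well in sandy soil and which fertilizers do they need?\n"
    "Sub-questions:\n"
    "1. Which crops grow well in sandy soil?\n"
    "2. Which fertilizers do those crops need?\n"
)


def build_prompt(question: str) -> str:
    """Build a one-shot prompt asking for a numbered list of sub-questions."""
    return (
        "Break the question into the smallest set of self-contained "
        "sub-questions. Answer with a numbered list only.\n\n"
        f"{ONE_SHOT_EXAMPLE}\n"
        f"Question: {question.strip()}\n"
        "Sub-questions:\n"
    )


# Matches list items such as "1. text", "2) text", "- text" or "* text".
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")


def parse_sub_questions(completion: str) -> List[str]:
    """Extract list items from the model output, ignoring any other lines."""
    parts = []
    for line in completion.splitlines():
        match = _ITEM.match(line)
        if match:
            parts.append(match.group(1))
    return parts


def decompose(question: str, llm: Callable[[str], str]) -> List[str]:
    """Split a question into parts using `llm`, an assumed prompt-to-text callable.

    Falls back to the original question when no list items can be parsed.
    """
    parts = parse_sub_questions(llm(build_prompt(question)))
    return parts or [question.strip()]
```

Any function that takes a prompt string and returns the model's text can be passed as `llm`, which also makes it easy to swap models when comparing them on a benchmark. The fallback to the original question is a design choice for this sketch, so that a malformed completion degrades to the undecomposed query rather than an empty list.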