Open-Multi-Modal-Personal-Assistant / OpenMMPA

Open Multi-Modal Personal Assistant

Introduce ReAct #5

MrCsabaToth opened this issue 4 months ago

MrCsabaToth commented 4 months ago

Now that multiple function calling works (although with some quirks, like https://github.com/google-gemini/generative-ai-dart/issues/194, and the model losing track of some functions if I register more than a certain number of them), it will be an interesting task to introduce ReAct in concert with it.

We need a prompt which keeps the ReAct loop but encourages native function execution.

The problem is that the way the original ReAct works overlaps with the function calling capabilities. Take a question like "What's the weather today?": it involves two function calls, one to determine the current location and another to call the weather API with that location. Modern function-call-capable models can arrive at these two calls on their own, without ReAct. ReAct without function calls would list these steps explicitly. The trouble is that even today a ReAct prompt tends to produce explicit function calling plans, which interfere with native function calling and parameter substitution.

I'd expect native function execution to happen at the action phase, possibly involving multiple functions; see the sketch below.
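To make this concrete, here is a minimal sketch of what I have in mind, assuming the `google_generative_ai` Dart package's `GenerativeModel` / `Tool` / `FunctionDeclaration` / `FunctionResponse` API; `getCurrentLocation` and `getWeather` are hypothetical stand-ins for the real integrations, and the exact system instruction wording is just an illustration:

```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

// Hypothetical local tool implementations; in the app these would be the
// real location and weather integrations.
Future<Map<String, Object?>> getCurrentLocation() async =>
    {'city': 'Fresno', 'country': 'US'};

Future<Map<String, Object?>> getWeather(String city) async =>
    {'city': city, 'condition': 'sunny', 'temperatureC': 31};

Future<void> main() async {
  final model = GenerativeModel(
    model: 'gemini-1.5-pro',
    apiKey: Platform.environment['GOOGLE_API_KEY']!,
    // ReAct-flavored guidance: reason in steps, but act via native calls.
    systemInstruction: Content.system(
        'Reason step by step. Whenever a step needs external data, '
        'emit a native function call to one of the declared tools '
        'instead of describing the call in text.'),
    tools: [
      Tool(functionDeclarations: [
        FunctionDeclaration('getCurrentLocation',
            'Returns the current city of the user.', null),
        FunctionDeclaration(
            'getWeather',
            'Returns the current weather for a city.',
            Schema.object(properties: {
              'city': Schema.string(description: 'City name'),
            })),
      ]),
    ],
  );

  final chat = model.startChat();
  var response =
      await chat.sendMessage(Content.text("What's the weather today?"));

  // Action phase: execute every native function call the model emitted,
  // feed the observations back, and let the model keep reasoning.
  while (response.functionCalls.isNotEmpty) {
    final results = <FunctionResponse>[];
    for (final call in response.functionCalls) {
      final result = switch (call.name) {
        'getCurrentLocation' => await getCurrentLocation(),
        'getWeather' => await getWeather(call.args['city'] as String),
        _ => {'error': 'Unknown function ${call.name}'},
      };
      results.add(FunctionResponse(call.name, result));
    }
    response = await chat.sendMessage(Content.functionResponses(results));
  }

  print(response.text);
}
```

The point of the loop is that the "Action" step never appears as text in the transcript; the model chains the location and weather calls natively, and the observations are returned as function responses rather than pasted-in tool output.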

References:

- Function calling without ReAct:
- ReAct original (without function calling):
- Forums, articles, repos; OpenAI, Pinecone:
- Gemini related examples:

havkerboi123 commented 4 months ago

Can I work on this?

MrCsabaToth commented 4 months ago

Yes, it'd be great to have more people on board! It'd be good to talk; I'm on Discord and other places. I'll actively work on issues as well. With ReAct in particular the finicky thing is that pure ReAct interferes with the newer function calling capabilities. The best starting point could be the https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb example; however, I have many more tools.
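For illustration, a hedged sketch of how the ReAct scaffolding could be rephrased so the action step defers to native function calling; this is my own wording, not the cookbook's, kept as a Dart constant:

```dart
// A ReAct-flavored system instruction sketch (my own wording, not taken from
// the cookbook) that keeps the Thought / Action / Observation loop but hands
// the Action step over to native function calling.
const reactSystemInstruction = '''
Solve the user's request by interleaving Thought, Action and Observation steps.
Thought: reason about what information is still missing.
Action: do NOT write the tool invocation as text; emit a native function call
to one of the declared tools instead (several calls in one turn are fine).
Observation: the function results are returned to you; incorporate them.
Repeat until you can answer, then reply with the final answer only.
''';
```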

The main goal is to submit the project to https://ai.google.dev/competition. Right now the project is Gemini oriented (using Chirp for STT and Google TTS if someone configures it not to use the native Android STT / TTS); however, in the future it'd be good to be LLM agnostic.

The effort started off along the lines of the Humane AI Pin and the Rabbit R1, but as we know those are essentially also Android apps on embedded devices. This project can be used on Android phones; I also purchased a FAW (Full Android Watch) on AliExpress, where good enough ones go for $50, and that form factor kind of competes with an AI pin. On top of that this is Flutter, so it even has the potential to run on an iOS device.

So let's chat; let me know which chat platform is best for you. On Discord I'm present in both the Gemini Meetup and the Google Developer Community, and in numerous other Gen AI workspaces, like Meshy, Pika, Vectara, Flutter Dev, lablab.ai, Devpost. Also on Slack in Weaviate, ODSC Global, Flutter Community, Feats, Tecton, AICamp, ...

MrCsabaToth commented 4 months ago

I'm starting to work on the Vector DB and history side of things. There will be code churn. Test code coverage is neglected right now; I will catch up later.

havkerboi123 commented 4 months ago

Seems fun, let's connect on Discord! @mehwz#4396

MrCsabaToth commented 4 months ago

I tried to send a friend request to mehwz#4396 but it didn't stick, then to mehwz; that request shows up as mhmd. My username is @mrcsabatoth, "originally known as @MrCsabaToth#8416".

MrCsabaToth commented 2 months ago

Food for thought: https://www.marktechpost.com/2024/09/22/chain-of-thought-cot-prompting-a-comprehensive-analysis-reveals-limited-effectiveness-beyond-math-and-symbolic-reasoning/

MrCsabaToth commented 2 months ago

"Existing research includes various approaches to enhance LLMs’ reasoning capabilities beyond CoT. One of the approaches is Long-horizon planning which has emerged as a promising area in tasks like complex decision-making sequences. However, the debate on CoT’s effectiveness in planning tasks remains divided, with studies supporting and questioning its utility. Alternative methods like tree-of-thought have been developed to address planning challenges, resulting in more complex systems. Theoretical research indicates that CoT augments Transformers, opening the door for more advanced CoT variants. Recent work on internalizing CoT also suggests that the full potential of explicit intermediate token generation has yet to be realized."

Now I know that ReAct is a small step beyond chain of thought, but they are in the same ballpark. That essentially also covers the internal, opaque logic that drives function calling, including multi-turn, multi-step function calling, in Gemini and other SOTA and open source models.