All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License
31.27k stars, 3.61k forks

Local API or Gradio Client Support focus. #3

waefrebeorn closed 5 months ago

waefrebeorn commented 6 months ago

Gradio clients that run local language models, such as “OobaBooga”, and expose API support should be a major consideration in the roadmap process. Usable model swapping with cache functionality is feasible. I made an example chart months ago, when I saw that the MinP greedy sampling Kalomaze worked on could help memory-driven task recall because of its token accuracy. [image]
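For readers unfamiliar with the sampling method mentioned above, here is a minimal sketch. It assumes "MinP" refers to min-p sampling as it is commonly described: tokens whose probability falls below `min_p` times the top token's probability are dropped, and the rest are renormalized before sampling. The function names are hypothetical, and this is not taken from Kalomaze's code.

```python
import random


def min_p_filter(probs, min_p=0.1):
    """Drop tokens below min_p * max(probs), then renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_min_p(probs, min_p=0.1, rng=random):
    """Sample a token index from the min-p filtered distribution."""
    filtered = min_p_filter(probs, min_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
```

With `min_p=1.0` only the most likely token (or tokens tied with it) survives, so the sampler behaves greedily; lower values let more of the tail through.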

Please note that current projects like MemoryGPT allow API usage, but no widespread application allows for effective model swapping or multi-system offloading. It’s also important to note that a side-server “chain” of cheaper machines, or a GGML-focused network solution, could make more garage labs possible.
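As a rough illustration of model swapping with a cache, here is a minimal sketch of a least-recently-used cache that keeps a limited number of models loaded. `ModelCache` and the `loader` callable are hypothetical names for illustration only; a real loader would wrap whatever backend actually loads the weights.

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, loader, capacity=2):
        self._loader = loader          # hypothetical callable: name -> loaded model
        self._capacity = capacity
        self._models = OrderedDict()   # insertion order tracks recency of use

    def get(self, name):
        if name in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load the model, then evict the oldest if over capacity.
        model = self._loader(name)
        self._models[name] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)
        return model

    def loaded(self):
        """Names of currently loaded models, least recently used first."""
        return list(self._models)
```

Swapping then reduces to calling `cache.get("some-model")` before each request; whether eviction actually frees RAM or VRAM depends on the backend behind `loader`.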

Current roadblocks are memory management, non-useful hallucinations (effective hallucinations could generate better idea tokens in an agent-focused setting), and the lack of effective, actually open-source inter-model conversation solutions for system-prompting-style implementations.

The most feasible multi-model solution is to allow most elements to be CPU-offloaded, with features like live training (one model training another via RLHF) being a “drop-in” use that requires a GPU with enough VRAM for training, unless a traditional RAM-based training solution is usable with a current model base such as Mistral.

To summarize, a focus on API solutions such as ChatGPT or Claude will stagnate research on local language model feasibility. Creating a feasible framework for agent structures, with LoRA-based live tuning for memory retention on a version-based task list, will most likely be the best course.

waefrebeorn commented 6 months ago

Please note that my picture example is of a call center agent system I designed in October 2023 that ended up not being used. The designed structure is a feasible alternative to a decision management system managed by a central query system for each “console” or emulated agent. The number of cluster agents in the loop is my proposed measure of scale for the complexity of the task. The central query system is the “database model” that is continually improved upon, with the base model being swapped out and a LoRA imprint system creating the “readiness” for being in the system with minimal overhead.

braveokafor commented 6 months ago

Hi @emangamer ,

How does a project like Ollama hold up for this use case?

waefrebeorn commented 6 months ago

> Hi @emangamer ,
>
> How does a project like Ollama hold up for this use case?

Ollama has a REST API for running and managing models.
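For context, a call to that REST API might look roughly like the sketch below. It assumes Ollama's default local address (`http://localhost:11434`) and its `/api/generate` endpoint with `model`, `prompt`, and `stream` fields; check the Ollama API documentation before relying on these details. The helper names and the model name in the usage note are illustrative assumptions.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming POST request for a single completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, url=OLLAMA_URL, timeout=60):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With a server running and a model already pulled, `generate("mistral", "Say hello")` would return the completion as a string; the model name here is only an example.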

You'd need a different project for training models; this looks to be a simple chat interface with prompt commands.

Gradio-based projects have become a marked standard in the AI space, and the versatile nature of the web environment allows things like Docker-based Google Colab use, greatly increasing availability for phone users as well, as was the case with RVC voice synthesis.

braveokafor commented 6 months ago

Got it, I'll look into Gradio.

waefrebeorn commented 6 months ago

> Got it, I'll look into Gradio.

If you're looking for a user client that uses Gradio, I suggest OobaBooga. Gradio is an open-source web UI front end, not an AI model service. OpenDevin should have its interface in Gradio.

huybery commented 5 months ago

@emangamer We're currently aiming for rapid prototyping (and won't consider using a complex framework for now), so feel free to discuss future architectural options with us on Slack.