All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

Mentat-bot #2821

Open rezzie-rich opened 2 months ago

rezzie-rich commented 2 months ago

Mentat-bot by AbanteAI has a very impressive SWE-bench Lite score of 38%.

https://www.swebench.com/

Like the adoption of SWE-agent into CodeActAgent and the hybrid CodeAct-SWE agent, Mentat could be a great addition to OD.

Mentat-bot is partially open source under the Apache-2.0 license.

neubig commented 2 months ago

Thanks @rezzie-rich ! Would you be able to poke around and figure out how we could replicate their swe-bench scores? If we can figure that out this sounds like an awesome addition.

rezzie-rich commented 2 months ago

> Thanks @rezzie-rich ! Would you be able to poke around and figure out how we could replicate their swe-bench scores? If we can figure that out this sounds like an awesome addition.

I will give it a try.

rezzie-rich commented 2 months ago

@neubig, Mentat-bot, which achieved 38% on SWE-bench Lite, is their proprietary agent. However, I did find a blog post explaining the basic structure: https://mentat.ai/blog/mentatbot-sota-coding-agent

They also have two open-source agents, Mentat (a CLI code assistant) and Rawdog (generates and auto-executes Python in the CLI): https://github.com/AbanteAI/mentat https://docs.mentat.ai/en/latest/user/context.html https://github.com/AbanteAI/rawdog

They also have another closed-source agent called 'Mentat auto-context'. I assume it's some MoA of these agents that makes up Mentat-bot: https://mentat.ai/blog/mentat-auto-context-q-a-with-large-codebases

The Mentat-bot blog's architecture diagram could be translated as follows: GitHub issue -> gather context (mentat auto-context) -> plan and edit (mentat) -> test and review (rawdog) -> submit

rezzie-rich commented 2 months ago

@xingyaoww, using CLI agents, especially Rawdog, could be their strength, as it can run the code to test it before submitting, making it more accurate. Even if not the whole Mentat-bot, at least incorporating Rawdog into CodeActAgent is very promising.

@enyst, their mentat auto-context agent could shed some light on memory management.
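For illustration, the "run the code before submitting" idea boils down to executing a generated snippet in a subprocess and checking the exit status. This is only a minimal sketch; `run_snippet` is a made-up helper, not Rawdog's actual API:

```python
import subprocess
import sys
import tempfile

def run_snippet(code: str, timeout: int = 30) -> tuple[bool, str]:
    """Execute a generated snippet and report whether it ran cleanly.

    `run_snippet` is an illustrative stand-in, not part of Rawdog or OD.
    """
    # Write the generated code to a temporary file...
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # ...then run it in a subprocess and capture stdout/stderr.
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

ok, output = run_snippet("print(2 + 2)")
```

An agent could feed a non-zero exit status (and the captured traceback) back into the next editing round instead of submitting.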

rezzie-rich commented 2 months ago

> @neubig, Mentat-bot, which achieved 38% on SWE-bench Lite, is their proprietary agent. However, I did find a blog post explaining the basic structure: https://mentat.ai/blog/mentatbot-sota-coding-agent
>
> They also have two open-source agents, Mentat (a CLI code assistant) and Rawdog (generates and auto-executes Python in the CLI): https://github.com/AbanteAI/mentat https://docs.mentat.ai/en/latest/user/context.html https://github.com/AbanteAI/rawdog
>
> They also have another closed-source agent called 'Mentat auto-context'. I assume it's some MoA of these agents that makes up Mentat-bot: https://mentat.ai/blog/mentat-auto-context-q-a-with-large-codebases
>
> The Mentat-bot blog's architecture diagram could be translated as follows: GitHub issue -> gather context (mentat auto-context) -> plan and edit (mentat) -> test and review (rawdog) -> submit

I think 'mentat auto-context' isn't a separate agent; instead, it's part of Mentat. In that case, both 'gather context' and 'plan & edit' are performed by Mentat using separate functionalities, and Rawdog is used for testing. The process is repeated until the result is correct, and only then is the patch submitted.

neubig commented 2 months ago

OK, awesome, thanks for the research! If someone is interested in implementing this we'd be happy to have any contributions.

Graham

rezzie-rich commented 2 months ago

@SmartManoj, I apologize in advance for my lack of technical knowledge, so please correct me if I'm wrong. In PR #2865, the problem seems to be loading the existing files when the existing project is big. I do agree with @tobitege, because if the project is massive, it will cost a lot of money just to load the project, and using a local LLM will take a lot of time (depending on the machine). However, I also agree with Manoj that loading only selected files will result in incomplete context, giving a similar experience to using ChatGPT to write code for a complex project.

Mentat seems to have a creative solution for this with their 'mentat auto-context', which is also under the Apache-2.0 license. It may be worth checking out and possibly incorporating, unless you guys have already figured out a better solution.

https://mentat.ai/blog/mentat-auto-context-q-a-with-large-codebases

rezzie-rich commented 2 months ago

This will most likely also help incorporate Mentat's Rawdog in the future, for testing selected code snippets as part of the workflow to ensure the final output is correct.

SmartManoj commented 2 months ago

Mentat Auto-Context also uses all files, right? "Generates embeddings for all files in your working directory, in chunks, and stores them in a local database."

rezzie-rich commented 2 months ago

> Mentat Auto-Context also uses all files. Right? Generates embeddings for all files in your working directory, in chunks, and stores them in a local database.

Yes, but instead of loading them all, it loads files selectively based on the use case.

SmartManoj commented 2 months ago

> In PR #2865, the problem seems to be loading the existing file in case the existing project is big. I

It gives only the file/folder names; it doesn't read the contents. I've updated the PR title. Sorry for the confusion. 🙏

rezzie-rich commented 2 months ago

> Mentat Auto-Context also uses all files. Right? Generates embeddings for all files in your working directory, in chunks, and stores them in a local database.
>
> Yes, but instead of loading it all, it selectively loads them based on the use case.

I believe it's a similar process to AnythingLLM, where you can load a whole GitHub repo into the knowledge base in under a minute.

rezzie-rich commented 2 months ago

> In PR #2865, the problem seems to be loading the existing file in case the existing project is big. I
>
> It just gives the file/folder names only. It's not reading the contents. Updated the PR title. Sorry for the confusion. 🙏

However, OD having complete knowledge of the project is very important. Then, OD can be used to improve OD faster than ever.

SmartManoj commented 2 months ago

> I believe it's a similar process as 'anythingLLM' where you can load a whole github repo to knowledge-base under a minute.

@rezzie-rich Like this RepoMap #2248?

tobitege commented 2 months ago

> I believe it's a similar process as 'anythingLLM' where you can load a whole github repo to knowledge-base under a minute.
>
> Like this RepoMap #2248?

Sounds similar, but it is quite different; let Sonnet explain why:

  1. Repository Analysis: This code analyzes the structure and content of a code repository by parsing source files, extracting information about defined and referenced identifiers (like function names, class names, etc.), and building a graph-based representation of the codebase.

  2. Tag Extraction: It uses tree-sitter and Pygments to parse source code files and extract "tags" - which are essentially identifiers and their locations in the code. These tags are cached for performance.

  3. Graph-based Ranking: The code builds a graph where nodes are files and edges represent relationships between files based on shared identifiers. It then uses the PageRank algorithm to rank the importance of files and identifiers in the codebase.

  4. Context-aware Mapping: The system generates a "map" of the repository that focuses on the most relevant parts of the codebase, taking into account which files are currently being discussed (chat_files) and which identifiers have been mentioned.

  5. Token-based Sizing: The code aims to create a repository map that fits within a specified token limit, which is likely for use with a language model like GPT.

In contrast, a vector database approach would typically work as follows:

  1. Embedding Generation: Each file or code snippet would be converted into a high-dimensional vector representation (embedding) using a model trained on code.

  2. Similarity Search: When querying the codebase, you would convert the query into a vector and find the most similar vectors in the database.

  3. Retrieval: The most relevant code snippets or files would be retrieved based on vector similarity.

Key differences:

  1. Structural Understanding: This code has a deeper structural understanding of the codebase, including relationships between files and identifiers. A vector database typically doesn't capture these relationships explicitly.

  2. Ranking Method: This code uses graph-based ranking, while vector databases use vector similarity.

  3. Context Awareness: This system can adjust its representation based on the current context of the conversation. Vector databases typically don't have this level of dynamic adjustment.

  4. Token Economy: This system is designed to work within token limits, which is not typically a concern for vector databases.

  5. Language-specific Analysis: This code uses language-specific parsers to understand code structure. Vector databases often use more general text-based approaches.

In summary, this code provides a more detailed, structure-aware, and context-sensitive analysis of a codebase, tailored for use with language models, compared to the simpler but potentially more scalable approach of vector databases.
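To make the graph-ranking idea concrete, here is a toy sketch of the approach Sonnet describes: files become nodes, shared identifiers become edges, and a PageRank-style power iteration ranks the files. The file names and identifiers are invented, and the real RepoMap (#2248) extracts tags with tree-sitter rather than hard-coding them:

```python
# Toy RepoMap-style ranking: an edge runs from a file that references an
# identifier to the file that defines it; power iteration then ranks files.
from collections import defaultdict

# Invented example data: which file defines / references which identifiers.
defines = {"utils.py": {"parse"}, "main.py": {"run"}, "cli.py": {"main"}}
references = {"main.py": {"parse"}, "cli.py": {"run", "parse"}}

# Build edges: referencing file -> defining file, via shared identifiers.
edges = defaultdict(set)
for src, idents in references.items():
    for dst, defs in defines.items():
        if src != dst and idents & defs:
            edges[src].add(dst)

# Simple PageRank-like power iteration with damping factor 0.85.
nodes = list(defines)
rank = {n: 1 / len(nodes) for n in nodes}
for _ in range(20):
    new = {n: 0.15 / len(nodes) for n in nodes}
    for src, dsts in edges.items():
        for dst in dsts:
            new[dst] += 0.85 * rank[src] / len(dsts)
    rank = new

top = max(rank, key=rank.get)  # the most central file in the toy graph
```

Here `utils.py` ends up ranked highest because both other files reference its `parse` identifier, which is exactly the "important files float up" behavior a repo map wants.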

rezzie-rich commented 2 months ago

> I believe it's a similar process as 'anythingLLM' where you can load a whole github repo to knowledge-base under a minute.
>
> Like this RepoMap #2248?

I guess, but AnythingLLM uses a vector database for storing documents.

1- Vectorization: Each document is converted into high-dimensional vectors using embedding models. This process transforms the textual content into numerical representations that can be efficiently searched and indexed.

2- Storage in Vector Database: The vectors are then stored in a vector database. This allows for efficient similarity searches and retrieval of relevant information based on user queries.

Mentat also uses a similar mechanism for their Auto-Context.
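As a rough illustration of that vectorize-store-retrieve flow, here is a self-contained toy version. A hashed bag-of-words stands in for a real trained embedding model, and a plain Python list stands in for the vector database; the chunk texts and dimension are all made up:

```python
# Toy vectorization -> storage -> similarity search, with no external deps.
import hashlib
import math
import re
from collections import Counter

DIM = 256  # embedding dimensionality (arbitrary for this toy)

def embed(text: str) -> list[float]:
    """Hash each token into a fixed-size vector, then L2-normalize.

    A real pipeline would call a trained code-embedding model here."""
    vec = [0.0] * DIM
    for tok, n in Counter(re.findall(r"\w+", text.lower())).items():
        slot = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[slot] += n
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# 1+2) Vectorization and storage: keep each chunk beside its vector
#      (a real setup would use an actual vector database).
chunks = ["def parse(path): ...", "class Database: ...", "def render(page): ..."]
store = [(chunk, embed(chunk)) for chunk in chunks]

# 3) Retrieval: embed the query and pick the most similar stored chunk.
query_vec = embed("parse a file at a path")
best_chunk = max(store, key=lambda item: cosine(query_vec, item[1]))[0]
```

The query about parsing a path retrieves the `parse` chunk, which is the whole point: only the relevant slice of the repo gets loaded into context.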

xingyaoww commented 2 months ago

Auto-Context sounds like a vector-based, more advanced version of "repo map" - I think it will be very useful! Would be excited to see them integrated!

FellowTraveler commented 1 month ago

Note: instead of making an embedding of a segment of code, it would be better to make an embedding of a summary of that segment. That way, the semantic meaning captured in the embedding would be more relevant to that segment than an embedding of the raw code itself. The summary should include information on the class hierarchy for that segment of code, as well as its namespace. K-nearest-neighbor matches on the embeddings should lead to nodes in the graph, and retrieval should then rely on subgraph traversal.
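That retrieval scheme might look roughly like this toy sketch, where simple word overlap with the summaries stands in for embedding k-NN, and the graph, node names, and summaries are all invented:

```python
# Toy summary-embedding retrieval: match the query against node *summaries*,
# then expand the code graph around the hits (subgraph traversal).
from collections import deque

# Invented graph: node -> (summary of the code segment, neighbouring nodes).
graph = {
    "Parser.parse":  ("Parses a config file into a dict", ["Config.load"]),
    "Config.load":   ("Loads and validates configuration", ["Config.save"]),
    "Config.save":   ("Writes configuration back to disk", []),
    "Renderer.draw": ("Draws widgets on screen", []),
}

def knn(query: str, k: int = 1) -> list[str]:
    """Toy nearest-neighbour: rank summaries by word overlap with the query.

    A real system would compare embedding vectors of the summaries instead."""
    q = set(query.lower().split())
    scored = sorted(
        graph, key=lambda n: -len(q & set(graph[n][0].lower().split()))
    )
    return scored[:k]

def retrieve(query: str, hops: int = 1) -> set[str]:
    """Start from the k-NN hits, then traverse the subgraph up to `hops` away."""
    seen, frontier = set(), deque((n, 0) for n in knn(query))
    while frontier:
        node, depth = frontier.popleft()
        if node in seen or depth > hops:
            continue
        seen.add(node)
        frontier.extend((nb, depth + 1) for nb in graph[node][1])
    return seen

hits = retrieve("parse a config file")
```

The query lands on `Parser.parse` via its summary and the traversal pulls in the related `Config.load`, so retrieval returns a connected slice of the codebase rather than isolated snippets.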

github-actions[bot] commented 1 day ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.