OpenBMB / RepoAgent

An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly.
Apache License 2.0
284 stars 44 forks

Questions: tree sitter, git, ollama #60

Open magaton opened 6 months ago

magaton commented 6 months ago

Hello, interesting project and architecture. I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?

Also, why did you decide to use pre-commit hooks instead of pulling the git repository on a schedule? The LlamaIndex GitHub reader could be leveraged in that case.

Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?

Thanks

magaton commented 5 months ago

Anyone?

Umpire2018 commented 5 months ago

I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?

Hi there! Thank you for your suggestion. I will look into it. It would be great if you could share any details with me.

Umpire2018 commented 5 months ago

Reference: here

We applied:

  1. Abstract Syntax Tree (AST) parsing to extract all classes and functions within a file, including their type, name, code snippet, etc. This is similar to tree-sitter ("Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.").

  2. Jedi to find all references to a single function (find_all_referencer in repo_agent/doc_meta_info.py, line 270).
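The two steps above can be roughly approximated with Python's standard-library ast module alone. This is a minimal sketch, not RepoAgent's actual code: CodeObject, extract_definitions and find_call_sites are hypothetical names, and find_call_sites is only a crude same-file stand-in for Jedi, which resolves scopes and imports.

```python
import ast
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CodeObject:
    kind: str        # "ClassDef", "FunctionDef" or "AsyncFunctionDef"
    name: str
    lineno: int
    end_lineno: int
    source: str      # exact source text of the definition


def extract_definitions(source: str) -> List[CodeObject]:
    """Step 1: collect every class and function definition in a file."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(CodeObject(
                kind=type(node).__name__,
                name=node.name,
                lineno=node.lineno,
                end_lineno=node.end_lineno,
                source=ast.get_source_segment(source, node) or "",
            ))
    # ast.walk is breadth-first; sort so results follow source order.
    return sorted(found, key=lambda obj: obj.lineno)


def find_call_sites(source: str, func_name: str) -> List[Tuple[int, int]]:
    """Crude stand-in for step 2: (line, column) of calls named func_name.

    Matches both f() and obj.f(). Unlike Jedi, nothing is resolved (no scopes,
    imports or other files), so same-named functions cannot be told apart.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        if (isinstance(callee, ast.Name) and callee.id == func_name) or (
            isinstance(callee, ast.Attribute) and callee.attr == func_name
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)
```

The gap between the two functions is the point of the discussion that follows: the AST part is easy to swap for a multi-language parser, while the reference-resolution part is language-specific.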

It seems tree-sitter would be better than ast because it supports multiple programming languages.

Correct me if I am wrong. @LOGIC-10

magaton commented 5 months ago

Thanks for the response, guys. Again, excellent work. I am in the same boat, and the lack of support for multiple programming languages is what keeps me from using your project.

As I can see, you use Python AST + Jedi for the function calls. Replacing Python AST with tree-sitter could bring you closer to multi-language support, but Jedi works only for Python.

AST is only one layer, and with Jedi you want to add function calls to the picture.

But there is a standard notion for extracting codebase semantics: the code property graph (CPG), with a reference implementation called Joern.

Have you maybe considered that?

Umpire2018 commented 5 months ago

AST is only one layer and here with Jedi you want to add function calls into the picture.

But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph) and a reference implementation called Joern:

I wonder whether there is a Python implementation of CPG. https://github.com/markgacoka/codepropertygraph may not be a good choice.

The goal is to replace AST + Jedi with one or more libraries in order to achieve multi-language support.
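One way to keep such a replacement incremental is to hide the parser behind a small per-language interface, so that a tree-sitter backend (or any other) can be registered per file extension without touching the callers. A minimal sketch of that design idea; register, definitions_for and the extension-keyed registry are hypothetical, and only a Python backend built on the stdlib ast module is included.

```python
import ast
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical plug-in point: each backend turns source text into a list of
# definition names. Backends for other languages (e.g. tree-sitter based)
# would be registered the same way; only Python is implemented here.
Extractor = Callable[[str], List[str]]
_BACKENDS: Dict[str, Extractor] = {}


def register(*suffixes: str) -> Callable[[Extractor], Extractor]:
    """Decorator that maps file suffixes such as ".py" to an extractor."""
    def decorator(fn: Extractor) -> Extractor:
        for suffix in suffixes:
            _BACKENDS[suffix] = fn
        return fn
    return decorator


@register(".py")
def python_definitions(source: str) -> List[str]:
    """Python backend: names of all classes and functions, via the stdlib ast."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree)
            if isinstance(n, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))]


def definitions_for(path: str, source: str) -> List[str]:
    """Dispatch to the backend registered for the file's suffix."""
    suffix = Path(path).suffix
    try:
        backend = _BACKENDS[suffix]
    except KeyError:
        raise ValueError(f"no parser backend registered for {suffix!r}") from None
    return backend(source)
```

Callers only ever use definitions_for, so adding a language means registering one more backend rather than rewriting the pipeline.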

magaton commented 5 months ago

I am using Joern for CPG -> Neo4j, but that is Scala. There is also https://pypi.org/project/cpggen/ in Python.

Umpire2018 commented 5 months ago

AppThreat/cpggen: This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

It seems that now is not a good time to introduce CPG, but we will definitely consider tree-sitter.

magaton commented 5 months ago

Understood. When you adopt tree-sitter, maybe you could take only its CST output and use a code chunker from LlamaIndex (see https://docs.sweep.dev/blogs/chunking-improvements).
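The idea behind syntax-aware chunking is to split code along syntax-tree node boundaries instead of at fixed character offsets. A simplified sketch, using Python's stdlib ast in place of a tree-sitter CST (so it only handles Python) and a hypothetical chunk_by_syntax helper: whole top-level statements are packed greedily up to a size limit, and every line of the file lands in exactly one chunk. The linked post describes a fuller recursive algorithm that also splits oversized nodes; this sketch does not.

```python
import ast
from typing import List


def chunk_by_syntax(source: str, max_chars: int = 1500) -> List[str]:
    """Greedily pack whole top-level statements into chunks of about max_chars.

    A statement longer than max_chars becomes its own (oversized) chunk rather
    than being cut mid-definition. Joining the chunks reproduces the source.
    """
    lines = source.splitlines(keepends=True)
    chunks: List[str] = []
    current = ""
    prev_end = 0  # number of source lines already assigned to a chunk
    for node in ast.parse(source).body:
        # Lines from the end of the previous statement up to the end of this
        # one (blank lines, comments, decorators) travel with this statement.
        piece = "".join(lines[prev_end:node.end_lineno])
        prev_end = node.end_lineno
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    current += "".join(lines[prev_end:])  # trailing comments / blank lines
    if current:
        chunks.append(current)
    return chunks
```

With a small limit, a module holding an import and two functions splits into three chunks, each function staying intact; with the default limit the whole file is a single chunk.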

Umpire2018 commented 5 months ago

Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?

It seems Ollama provides OpenAI compatibility, so I think supporting Ollama or other open-source LLMs is not a high priority.

Right now we only use the Chat Completions capability.

Similar projects for reference are as follows:

  1. vllm
  2. llama-cpp-python
  3. Ollama
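Because only Chat Completions is used, talking to Ollama mostly comes down to pointing an OpenAI-style request at Ollama's OpenAI-compatible endpoint. A minimal stdlib sketch, assuming Ollama is running locally on its default port (11434) and serving that API under /v1; the model name "llama3" and the helper names build_chat_request and chat are only examples, not part of RepoAgent.

```python
import json
import urllib.request

# Assumption: a local Ollama server exposing its OpenAI-compatible API at /v1.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def build_chat_request(messages, model="llama3", base_url=OLLAMA_BASE_URL):
    """Build (but do not send) an OpenAI-style Chat Completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Since the request shape is the standard Chat Completions one, switching to another OpenAI-compatible server, such as the ones listed above, should in principle only require changing base_url and model.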

Major-wagh commented 5 months ago

Hello, I would also like support for languages other than Python. Does anybody know the approach, or which changes need to be made to the existing codebase?

biandan commented 1 month ago

OpenAI can't be used in many places; I'm also looking forward to Ollama support.

sandeshchand commented 3 weeks ago

Is there any method or approach for supporting multiple programming languages in find_all_referencer (finding all references to a single function)?