Open magaton opened 8 months ago
Anyone?
I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?
Hi there! Thank you for your suggestion. I will look into it. It would be great if you can share any details with me.
Reference: here
We applied
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited).
repo_agent/doc_meta_info.py Line 270
.Seems like tree-sitter is better than ast because it provides multiple programming language support.
Correct me if i am wrong. @LOGIC-10
Thanks for the response guys. Again, excellent work. I am on the same boat and the support for the multiple programming languages is a stopper from using your project.
As I can see you do Python AST + Jedi for the function calls. Replacing python AST with tree-sitter could bring you closer to multi-lnaguage support , but Jedi is usable only for python.
AST is only one layer and here with Jedi you want to add function calls into the picture.
But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph) and a reference implementation called Joern:
Have you maybe considered that?
AST is only one layer and here with Jedi you want to add function calls into the picture.
But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph) and a reference implementation called Joern:
I wonder if CPG have a python implementation? https://github.com/markgacoka/codepropertygraph may not be a good choice.
And the goal is to replace AST + Jedi via one or multiple library in order to acheieve multi-language support.
I am using Joern for CPG -> Neo4j, but that is scala There is also https://pypi.org/project/cpggen/ in python
AppThreat/cpggen: This repository has been archived by the owner on Jan 8, 2024. It is now read-only.
It seems that now is not a good time to introduce CPG but we will definitely consider tree sitter.
Understood, but when you use tree-sitter, maybe you can only take its CST output and use a code chunker from llama index https://docs.sweep.dev/blogs/chunking-improvements
Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?
Seems like Ollama
have provided openai-compatibility so i think support Ollama or others open source llm is not high priority.
Right now we only used Chat completions
ablility.
Similar projects for reference are as follows:
Hello, I too wanted support for languages other then python. Does anybody know the approach or neccsessary changes to be done to the existing code repository?
openai很多地方无法使用,我也期待支持ollama
Is there any method/approach for supporting multiple programming language to find_all_referencer of single function ?
Hello, interesting project and architecture. I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?
Also, why did you decide to use pre-commit hooks instead of pullling git repository with a scheduler. Llama index github reader could be leveraged in that case.
Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?
Thanks