SilasMarvin / lsp-ai

LSP-AI is an open-source language server that serves as a backend for AI-powered functionality, designed to assist and empower software engineers, not replace them.

Idea: Crawl docs of dependencies + RAG #73

Open · Boscop opened this issue 2 days ago

Boscop commented 2 days ago

I'm not sure if any of the available AI coding assistants do this (if you know of one, please tell me), but this is the main feature I'm missing: ideally the assistant should crawl the docs of the project's dependencies and use that info when generating code. This would let it respond to user requests much better, because users phrase requests in natural language that mostly appears in the documentation of symbols, not in the symbol names themselves. Too often these coding assistants hallucinate fake symbols because they were trained on outdated docs (e.g. ChatGPT and Claude always use outdated crate versions if you ask them to generate Rust code). If they could read the current docs, that wouldn't happen :) This feature would open the door to really low-code development: just telling the LLM what to do and what to change!

In practice, here's how it could look: say I have a Rust project with many deps and transitive deps, and I want to build a complex component using multiple symbols from different deps. In the background, the assistant inspects Cargo.toml to identify the deps, crawls their docs.rs pages, and adds those as vector embeddings etc. It also detects from the docs (or from rust-analyzer) which transitive deps are re-exported, and crawls their docs too. (Similar for other languages that have conventions around docs.) Then user prompts to the LLM via the LSP adapter use this contextual info to generate code with the correct names/types/shapes/signatures.
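To make the first step concrete, here is a minimal sketch of how a crawler could derive docs.rs starting URLs from a manifest. It assumes the `toml` crate; the function name `docs_rs_roots` and the overall shape are invented for illustration, not anything lsp-ai ships today:

```rust
use std::fs;

/// Hypothetical helper: read Cargo.toml and build one docs.rs root URL
/// per direct dependency, as starting points for a docs crawler.
fn docs_rs_roots(manifest_path: &str) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let manifest: toml::Value = toml::from_str(&fs::read_to_string(manifest_path)?)?;
    let mut roots = Vec::new();
    if let Some(deps) = manifest.get("dependencies").and_then(|d| d.as_table()) {
        for (name, spec) in deps {
            // A dep is either `name = "1.2"` or a table like `{ version = "1.2", ... }`.
            // Path/git-only deps have no version; fall back to docs.rs's `latest`.
            let version = spec
                .as_str()
                .or_else(|| spec.get("version").and_then(|v| v.as_str()))
                .unwrap_or("latest");
            roots.push(format!("https://docs.rs/{name}/{version}"));
        }
    }
    Ok(roots)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    for url in docs_rs_roots("Cargo.toml")? {
        println!("{url}");
    }
    Ok(())
}
```

Note that Cargo.toml holds semver *requirements*, not resolved versions, so a real implementation would probably resolve exact versions via Cargo.lock or `cargo metadata` before crawling, and would then feed the fetched pages into the embedding step.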

SilasMarvin commented 1 day ago

This could be pretty cool and we actually already have most of the hard part written: https://github.com/SilasMarvin/lsp-ai/blob/main/crates/lsp-ai/src/memory_backends/vector_store.rs

We would need to write some configuration for which directories to crawl and maybe some way to watch for changes to them?
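For illustration, that configuration could deserialize into something like the struct below, with the `notify` crate handling change detection. This is just a sketch; all names here are invented, not lsp-ai's actual config format:

```rust
// Cargo.toml deps assumed: notify, serde (with the "derive" feature).
use notify::{recommended_watcher, Event, RecursiveMode, Result as NotifyResult, Watcher};
use serde::Deserialize;
use std::path::PathBuf;

#[derive(Debug, Deserialize)]
struct CrawlConfig {
    /// Directories whose docs get chunked and embedded into the vector store.
    crawl_dirs: Vec<PathBuf>,
    /// Whether to re-embed files when they change on disk.
    #[serde(default)]
    watch: bool,
}

fn main() -> NotifyResult<()> {
    // In the real server this would come from the client's config;
    // hard-coded here to keep the sketch self-contained.
    let config = CrawlConfig {
        crawl_dirs: vec![PathBuf::from("target/doc")],
        watch: true,
    };

    if config.watch {
        let mut watcher = recommended_watcher(|res: NotifyResult<Event>| {
            if let Ok(event) = res {
                // Hook point: re-chunk and re-embed the touched files.
                println!("changed: {:?}", event.paths);
            }
        })?;
        for dir in &config.crawl_dirs {
            watcher.watch(dir, RecursiveMode::Recursive)?;
        }
        std::thread::park(); // keep the watcher alive in this toy example
    }
    Ok(())
}
```

A real implementation would presumably debounce events and skip re-embedding files whose content hasn't actually changed.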

Feel free to take a swing at it!