Aider-AI / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

Feature Request: Learn from Documentation #69

Closed: jasoncmcg closed this issue 6 months ago

jasoncmcg commented 1 year ago

Feature Request: Learn from Documentation

Example implementation:

Use ChromaDB (or something along those lines) to create and store researched documentation on a newer library (or one that is unknown to GPT), like langchain or EbitenUI.

The user could say /research langchain, or maybe /research https://pkg.go.dev/github.com/ebitenui/ebitenui
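To make the idea concrete, here is a minimal sketch of what the storage and lookup side of such a /research command could look like, assuming ChromaDB's Python client; the research/lookup names are illustrative, not existing aider code, and the scraping of the docs themselves is left out:

```python
# Rough sketch of a hypothetical /research backend built on ChromaDB.
# How the documentation chunks are scraped or loaded is not shown here.
import chromadb

client = chromadb.PersistentClient(path=".aider_research")

def research(library: str, doc_chunks: list[str]) -> None:
    # Store researched documentation chunks in a per-library collection.
    collection = client.get_or_create_collection(f"docs-{library}")
    collection.add(
        documents=doc_chunks,
        ids=[f"{library}-{i}" for i in range(len(doc_chunks))],
    )

def lookup(library: str, question: str, n_results: int = 4) -> list[str]:
    # Retrieve the chunks most relevant to the current request so they
    # can be added to the chat context.
    collection = client.get_or_create_collection(f"docs-{library}")
    result = collection.query(query_texts=[question], n_results=n_results)
    return result["documents"][0]
```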

yozef commented 1 year ago

Exactly the use case I'm looking for. The LLM (OpenAI GPT-4) is not aware of some newer libraries and their usage.

rjohnson318 commented 1 year ago

Something like this would be nice to get that documentation into a place where aider can access it. https://github.com/StanGirard/quivr

jasoncmcg commented 1 year ago

> Something like this would be nice to get that documentation into a place where aider can access it. https://github.com/StanGirard/quivr

I liked that until I saw that it requires a Supabase account. ChromaDB can be run locally on business-class machines.

rjohnson318 commented 1 year ago

True, maybe we can replace Supabase with ChromaDB...?

c-p-b commented 1 year ago

Similar request, but for source code - I work a lot on a couple source code libraries that were released post 2021, and it would be interesting to just point aider at something derived from a git/github repository.

This can be somewhat accomplished right now with open source software by cloning the git repository and pointing aider at the files locally, but the ergonomics, especially around token limitations, leave a bit to be desired.

If that feels separate enough to be its own issue, I'm happy to create one for that.

TheSnowGuru commented 1 year ago

How about the new Gorilla? https://github.com/ShishirPatil/gorilla

fahmad91 commented 1 year ago

Are there any plans to implement this anytime soon? This feature would be extremely handy, especially as I've found aider tends to produce outdated or deprecated code when working with new libraries that aren't part of the base model. Telling it to pull from the latest developer docs would drastically increase its accuracy.

ryanfreckleton commented 1 year ago

Probably https://github.com/kagisearch/vectordb would be a better fit than Chroma or the other systems.

I've had some success with other experiments running HTML documentation through html2markdown and then summarizing it into a cheat sheet, but the primary issues here will be figuring out a reasonable RAG approach for Aider and extracting relevant docs.
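For reference, a rough sketch of that experiment, assuming the Python html2text package for the HTML-to-markdown step; the cheat-sheet summarization itself would just be a prompt sent to whatever model is in use:

```python
# Sketch: convert an HTML docs page to markdown so it can be summarized
# into a cheat sheet. html2text is an assumption for the converter.
import urllib.request
import html2text

def page_to_markdown(url: str) -> str:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    converter = html2text.HTML2Text()
    converter.ignore_images = True
    converter.body_width = 0  # don't hard-wrap the output
    return converter.handle(html)

CHEAT_SHEET_PROMPT = (
    "Summarize the following library documentation into a short cheat sheet "
    "of the most important classes, functions, and usage examples:\n\n{docs}"
)
```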

joshuavial commented 11 months ago

https://github.com/fleet-ai/context might give us an easy library to build on top of

joshuavial commented 11 months ago

I've done a bit more of a deep dive into fleet-context and am thinking about the following approach.

@paul-gauthier how would you feel about adding a langchain dependency to the project?

We could also have the document retrieval be a separate query, e.g. '@langchain how to use fleet-context to retrieve embeddings', with the outcome being only some documents added to our context window. Or it could be 'achieve goal xyz @pandas @langchain', where the document retrieval is done in the same step as directing Aider. Keen to hear people's thoughts on which approach they would prefer.
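To make the second option concrete, here is a small sketch of how @-mentions could be pulled out of the user's request and routed to a retrieval step; everything here, including retrieve_library_docs, is hypothetical rather than existing aider code:

```python
import re

LIBRARY_MENTION = re.compile(r"@([A-Za-z0-9_\-]+)")

def retrieve_library_docs(library: str, question: str) -> list[str]:
    # Placeholder for the actual lookup, e.g. against fleet-context
    # embeddings queried via the Fleet API or downloaded locally.
    raise NotImplementedError

def expand_context(user_message: str) -> tuple[str, list[str]]:
    # Find @langchain / @pandas style mentions, fetch docs for each,
    # and hand back the stripped request plus the retrieved chunks.
    libraries = LIBRARY_MENTION.findall(user_message)
    stripped = LIBRARY_MENTION.sub("", user_message).strip()
    docs = [chunk
            for lib in libraries
            for chunk in retrieve_library_docs(lib, stripped)]
    return stripped, docs
```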

In terms of options, I think we could:
a) give the user a lot of visibility/control over the context retrieved (think parallel to how files are currently tracked): /list-contexts, /clear-contexts, /rm-context 3, etc.
b) give them less control, with contexts more or less invisible in the UI
c) maybe b) by default, with an option to jump into a) for people who like complexity and control

Given the overall strategy of wanting people to think less about the context and just have aider figure it out, maybe we can think of the context-management commands more as internal functions that aider will start to call as it gets more capable, and keep the UI focused on what the user wants to achieve.

In terms of integrating with Fleet, I notice the author mentions the intention to add Rust and JS in the future, as well as moving beyond the OpenAI embedding model to some open ones. I think the main call we'd need to make is whether we leverage the Fleet API for querying contexts or try to query them locally.

arabold commented 11 months ago

I'm very new to aider (and relatively new to LLMs in general), so I might be overlooking something obvious here, but I think things could be as simple as allowing aider to fetch a specific website and add it to the context as documentation. Sure, indexing the docs for whole libraries would be awesome, but a first step could be to just point it to the right location manually, e.g. "Implement a Next.js modal using a parallel route as described in https://nextjs.org/docs/app/building-your-application/routing/parallel-routes".

Fetching the page and adding it to the context (MemoryVectorStore) would solve this quite elegantly without adding much complexity, wouldn't it?
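A minimal sketch of that flow, reusing the page_to_markdown idea from the earlier cheat-sheet comment: detect URLs in the message, fetch the pages, and inline them as reference material, with no vector store involved at all (names are illustrative, not aider code):

```python
import re
import urllib.request
import html2text

URL_PATTERN = re.compile(r"https?://\S+")

def page_to_markdown(url: str) -> str:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    converter = html2text.HTML2Text()
    converter.body_width = 0
    return converter.handle(html)

def inline_docs_from_urls(user_message: str) -> str:
    # Fetch every URL mentioned in the request and append the pages as
    # reference documentation for the LLM.
    pages = [page_to_markdown(url) for url in URL_PATTERN.findall(user_message)]
    if not pages:
        return user_message
    return user_message + "\n\nReference documentation:\n\n" + "\n\n".join(pages)
```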

joshuavial commented 11 months ago

@arabold yes, I think that could work for a whole bunch of use cases.

I like the approach of just detecting a URL in the input string. I think most of the work will be in making it resilient to a whole bunch of different websites, so it might become a game of maintenance whack-a-mole.

I wonder whether the additional context is loaded for that single query only or added to the conversation. If it's added to the conversation, I'm curious about the UI component: do we have a section called 'additional context' with its own add/remove features?

paul-gauthier commented 6 months ago

Given that aider has the /web command to download URLs and add them to the chat, I'm going to close this issue for now. But feel free to add a comment here and I will re-open or file a new issue any time.
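For anyone landing on this issue later, the workflow that replaces the manual case discussed above looks roughly like this inside an aider chat (the URL is just the Next.js example from an earlier comment):

```
/web https://nextjs.org/docs/app/building-your-application/routing/parallel-routes
Implement a Next.js modal using a parallel route as described in that page.
```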

Astlaan commented 2 months ago

Cursor allows one to add documentation from the web to a project. Can Aider do this?

With Cursor you can provide the entry-point URL for some web documentation. It explores the whole documentation website and indexes its contents (I guess it stores them in a vector database).