Aider-AI / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

Automatically summarize relevant library code into the repo map #1991

Open voxoid0 opened 1 month ago

voxoid0 commented 1 month ago

Issue

Hello, this is my fav coding tool, thank you!

When I use libraries that aren't as common, the LLM doesn't have enough knowledge to use their APIs correctly, and it ends up hallucinating many incorrect usages. For example, I'm using the midifile-ts library in my Expo/React Native project. At some point it seems essential that aider would "learn" the API (e.g. via RAG) from its code, e.g. in node_modules, or from its documentation, e.g. the README on GitHub.

Currently I've been either pasting GitHub raw file URLs that I want it to scrape, or using /read to add files manually, but if I understand correctly, this adds the entire contents to the prompt rather than a summary? Plus it's more manual work.

Version and model info

Aider 0.59.1, GPT-4

fry69 commented 1 month ago

Thank you for filing this issue.

This document may be helpful -> https://aider.chat/docs/usage/tips.html#providing-docs

See also here -> https://aider.chat/docs/faq.html#can-i-use-aider-with-multiple-git-repos-at-once

voxoid0 commented 1 month ago

Thanks, I've already read those, but /read and URL scraping don't summarize the API the way the repo map does, correct? So I believe just one library would quickly bloat the prompt...

fry69 commented 1 month ago

The repository map summarizes code in a ctags-style way for some languages -> https://aider.chat/docs/languages.html

You can generate such a repository map for another repository and include that map in your context to give the LLM knowledge about, e.g., an external library/framework.

As described here -> https://aider.chat/docs/faq.html#can-i-use-aider-with-multiple-git-repos-at-once
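To make the "ctags-style" idea concrete, here is a minimal sketch that reduces a module to one-line class and function signatures using Python's standard-library `ast` module. It assumes a Python library as a stand-in (midifile-ts from this thread is TypeScript), and it is not aider's implementation: aider builds its repo map with tree-sitter across many languages and also ranks what to include. `summarize_module`, `MidiReader` and `parse` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import ast


def _signature(node: ast.FunctionDef | ast.AsyncFunctionDef) -> str:
    # ast.unparse (Python 3.9+) turns the argument list and return
    # annotation back into source text, without the function body.
    keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    sig = f"{keyword} {node.name}({ast.unparse(node.args)})"
    if node.returns is not None:
        sig += f" -> {ast.unparse(node.returns)}"
    return sig


def summarize_module(source: str) -> list[str]:
    """Return one line per top-level function/class (and each class's methods)."""
    tree = ast.parse(source)
    lines: list[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + _signature(item))
    return lines


# Hypothetical library module used only to demonstrate the summary.
SAMPLE = '''
class MidiReader:
    def __init__(self, data: bytes):
        self._data = data

    def read_track(self, index: int) -> list:
        return []

def parse(data: bytes) -> MidiReader:
    return MidiReader(data)
'''

print("\n".join(summarize_module(SAMPLE)))
```

The bodies are dropped and only the API surface remains, which is what keeps such a map much smaller than the full files. For a library that lives outside your repo, you could write these lines to a file and `/read` it, which is the manual workaround described above.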

voxoid0 commented 1 month ago

Downloading the library source and generating a repo map to a file and then reading it, is a good work-around.

Since writing code against 3rd-party libraries is a common use case, it'd definitely be a useful enhancement to do this automatically or semi-automatically, maybe even simply by summarizing the code in the node_modules folder in the Node.js case. (In fact I meant this issue as a feature request, but I'm not sure how to change the "question" label. Thank you for the responses, though.)
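A semi-automatic version of the node_modules idea could start by finding which declared dependencies are actually installed. The sketch below assumes an npm-style layout (`node_modules/<name>`, with scoped packages under `node_modules/@scope/name`) and only reads `dependencies` and `devDependencies` from package.json; `installed_dependencies` and `declaration_files` are hypothetical helpers, not part of aider. For TypeScript packages, any shipped `.d.ts` declaration files are a natural input, since they describe the public API without implementation bodies.

```python
import json
from pathlib import Path


def installed_dependencies(project_root: Path) -> dict[str, Path]:
    """Map each dependency declared in package.json to its installed directory.

    Dependencies that are declared but not present in node_modules are skipped.
    """
    manifest = json.loads((project_root / "package.json").read_text(encoding="utf-8"))
    declared = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    found: dict[str, Path] = {}
    for name in sorted(declared):
        # A scoped name such as "@types/node" contains a "/", so the joined
        # path points at the nested folder npm uses for scoped packages.
        pkg_dir = project_root / "node_modules" / name
        if pkg_dir.is_dir():
            found[name] = pkg_dir
    return found


def declaration_files(pkg_dir: Path) -> list[Path]:
    """List the package's .d.ts files, ignoring any nested node_modules."""
    return sorted(
        p
        for p in pkg_dir.rglob("*.d.ts")
        if "node_modules" not in p.relative_to(pkg_dir).parts
    )
```

These files could then be fed into a summarizer instead of being added to the prompt verbatim.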

youyuanrsq commented 1 month ago


It is indeed a good idea to add the repo maps of the third-party libraries being used to the current context, but whether to do this automatically needs careful consideration: if the context becomes too long, it can degrade the model's performance. Therefore, I suggest adding a feature to manually specify which third-party packages' repo maps should be included in the context, letting the user decide. This would be similar to the /add-to-map command you just mentioned.

voxoid0 commented 1 month ago

Yeah, there would have to be some logic around the target context window size and around prioritizing which information gets included (like obviously needed libraries and code), and possibly around how compactly it is summarized.
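The budgeting logic discussed here could look roughly like this greedy sketch. It assumes each library already has a summarized map chunk with a relevance score (how to compute that score is left open), estimates tokens with a crude "about 4 characters per token" rule instead of the model's real tokenizer, and puts user-pinned packages first, combining youyuanrsq's manual-selection suggestion with the context-budget idea. `MapChunk`, `estimate_tokens` and `pack_context` are hypothetical names, not aider APIs.

```python
from dataclasses import dataclass


@dataclass
class MapChunk:
    package: str
    text: str          # summarized repo map for this package
    priority: float    # higher = more relevant; scoring method is assumed


def estimate_tokens(text: str) -> int:
    # Crude approximation (~4 characters per token); a real feature would
    # count tokens with the target model's tokenizer.
    return max(1, len(text) // 4)


def pack_context(chunks: list[MapChunk], budget: int, pinned: set[str]) -> list[MapChunk]:
    """Greedily select chunks that fit within `budget` tokens.

    Pinned packages are considered first, then the rest by descending priority.
    A chunk that would overflow the budget is skipped, even if pinned.
    """
    # False sorts before True, so pinned packages come first.
    ordered = sorted(chunks, key=lambda c: (c.package not in pinned, -c.priority))
    selected: list[MapChunk] = []
    used = 0
    for chunk in ordered:
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected
```

Greedy packing is simple but not optimal (choosing chunks under a budget is a knapsack problem). Also, a pinned package whose map alone exceeds the remaining budget is skipped here; a real feature would probably need to shrink that map instead, e.g. by summarizing it more compactly.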

paul-gauthier commented 1 week ago

I'm labeling this issue as stale because it has been open for 2 weeks with no activity. If there are no additional comments, it will be closed in 7 days.

voxoid0 commented 1 week ago

I think this feature is important enough to be considered for promotion to feature planning, no? This will be a core/typical feature of coding agents in the future, if not of aider.