continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.
https://docs.continue.dev/
Apache License 2.0

Poor performance in large repos with version 0.8.43-vscode #1774

Closed: spew closed this issue 2 months ago

spew commented 2 months ago

Relevant environment info

- OS: macOS (ARM) and Linux
- Continue: 0.8.43-vscode
- IDE: VS Code

Description

Version 0.8.43-vscode seems to have a lot of performance problems in larger repos. Some behavior I have noticed:

I've started looking at the code. The first thing I noticed is that the new CodebaseIndexer.ts appears to wait until it has built a list of every file in the repo. I suspect that changing this to an AsyncIterator (via yield) would fix much of the problem. I am willing to do some work to resolve this issue, so I am trying this out now.
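
To make the difference concrete, here is a minimal sketch of the two listing styles using Node's `fs.promises.readdir`. The names `listAllFiles` and `walkFiles` are illustrative and are not taken from the Continue codebase, and ignore-file handling is left out entirely.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Eager listing: callers get nothing until every directory has been visited.
async function listAllFiles(root: string): Promise<string[]> {
  const files: string[] = [];
  for (const entry of await fs.readdir(root, { withFileTypes: true })) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      for (const nested of await listAllFiles(fullPath)) {
        files.push(nested);
      }
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}

// Lazy listing: each path is handed to the caller as soon as it is found.
async function* walkFiles(root: string): AsyncGenerator<string> {
  for (const entry of await fs.readdir(root, { withFileTypes: true })) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      yield* walkFiles(fullPath); // delegate to the nested walk
    } else if (entry.isFile()) {
      yield fullPath;
    }
  }
}
```

A consumer of the lazy version can start work on the first path right away, for example with `for await (const file of walkFiles(repoRoot)) { ... }` (where `repoRoot` is a placeholder), instead of waiting for the full array.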

spew commented 2 months ago

This may be a duplicate of https://github.com/continuedev/continue/issues/1705

Patrick-Erichsen commented 2 months ago

^ Correct, and thanks for the confirmation regardless! Let me know if any of this is relevant to your setup: https://github.com/continuedev/continue/issues/1705#issuecomment-2237339934

spew commented 2 months ago

Left a comment -- I suspect it's pretty easy to reproduce in a large repo. Maybe try opening something like the Kubernetes repo.

Patrick-Erichsen commented 2 months ago

Your hunch feels like it could be accurate. I verified the issue by opening the Linux kernel repo (https://github.com/torvalds/linux).

Here is the diff between 0.8.42 <> 0.8.43: https://github.com/continuedev/continue/compare/v0.8.42-vscode...v0.8.43-vscode

In 0.8.42 this was the logic for directory traversal: https://github.com/continuedev/continue/blob/b2c4593b63782694b6cc8ac1296f72f2208a8fe7/extensions/vscode/src/util/traverseDirectory.ts#L26-L32

In 0.8.43, here is the traversal logic: https://github.com/continuedev/continue/blob/57f6e774fb84378b6d8ad9ab1e7efc9f8c0d9eb6/core/indexing/walkDir.ts#L334-L338

We did indeed switch away from an AsyncGenerator in 0.8.43.

spew commented 2 months ago

Yes, as currently implemented, I believe the listing process will block the main (and only) JavaScript thread until the array of every filename in the repo has been generated.

Another thing I noticed is that multiple context providers each want a view of "everything in the repo". A follow-up optimization would be to reduce this to a single listing that every context provider can share. I don't think this is needed to fix the current problem, but it would make Continue faster (it is limited to a single CPU) and use fewer system resources (less I/O).
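
One possible shape for that single shared listing is a small cache around whatever function produces the file list. This is only a sketch: `SharedFileListing` and `FileLister` are hypothetical names, not part of Continue, and real code would need an invalidation strategy tied to file-system changes.

```typescript
// Hypothetical helper: none of these names come from the Continue codebase.
type FileLister = () => Promise<string[]>;

class SharedFileListing {
  private pending: Promise<string[]> | undefined;

  constructor(private readonly lister: FileLister) {}

  // The first caller starts the walk; later callers reuse the same promise.
  get(): Promise<string[]> {
    let current = this.pending;
    if (current === undefined) {
      const attempt = this.lister();
      // If the walk fails, forget it so that a later call can retry.
      attempt.catch(() => {
        if (this.pending === attempt) {
          this.pending = undefined;
        }
      });
      this.pending = attempt;
      current = attempt;
    }
    return current;
  }

  // Drop the cached result, e.g. after files are added or removed.
  invalidate(): void {
    this.pending = undefined;
  }
}
```

With this in place, each context provider would call `get()` on one shared instance, so the repo is walked once no matter how many providers ask for it.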

Patrick-Erichsen commented 2 months ago

Seems like that would do it on a large enough repo - working on an AsyncGenerator refactor to walkDir.

You are correct on the context providers as well, in addition to a few other spots (e.g. https://github.com/continuedev/continue/blob/dev/extensions/vscode/src/quickEdit/QuickEditQuickPick.ts#L100-L104). We've talked about getting this consolidated in global state in the past but this feels like a good opportunity to actually move forward on that.

Appreciate the deep dive here!

spew commented 2 months ago

Thanks. Another thing I've noticed while profiling the indexing system is that it spends most of its time calling countTokens. While I have your ear, I created an issue for it here: https://github.com/continuedev/continue/issues/1775

Patrick-Erichsen commented 2 months ago

@spew still working on this, but it is proving a bit tricky to patch the iterator onto the existing event pattern. Did you make any progress on your end?
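
For readers wondering what bridging an async iterator onto an event-based consumer can look like in general, here is a rough sketch built on Node's `EventEmitter`. The thread does not describe Continue's actual event pattern, so the function name `pumpToEvents` and the event names `"files"`, `"done"`, and `"error"` are assumptions made purely for illustration.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical bridge: drains an async iterable and re-emits it as batched events.
// Event names ("files", "done", "error") are illustrative assumptions.
async function pumpToEvents(
  source: AsyncIterable<string>,
  emitter: EventEmitter,
  batchSize = 100,
): Promise<void> {
  let batch: string[] = [];
  try {
    for await (const file of source) {
      batch.push(file);
      if (batch.length >= batchSize) {
        emitter.emit("files", batch);
        batch = []; // start a fresh array so listeners keep their own batch
      }
    }
    if (batch.length > 0) {
      emitter.emit("files", batch); // flush the final partial batch
    }
    emitter.emit("done");
  } catch (err) {
    // EventEmitter throws if "error" has no listener, so attach one first.
    emitter.emit("error", err);
  }
}
```

Batching keeps the number of events manageable on very large repos while still letting listeners begin work before the walk finishes.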

spew commented 2 months ago

Created this follow-up PR: https://github.com/continuedev/continue/pull/1806

spew commented 2 months ago

Also related: https://github.com/continuedev/continue/issues/1705

spew commented 2 months ago

Another PR: https://github.com/continuedev/continue/pull/1834

spew commented 2 months ago

Closing this one in favor of the other open issues.