spew closed this issue 2 months ago
This may be a duplicate of https://github.com/continuedev/continue/issues/1705
^ Correct, thanks for the confirmation! Let me know if any of this is relevant to your setup: https://github.com/continuedev/continue/issues/1705#issuecomment-2237339934
Left a comment -- I suspect it's pretty easy to reproduce in a large repo. Maybe try opening something like the Kubernetes repo.
Your hunch looks accurate. I reproduced the issue by opening https://github.com/torvalds/linux
Here is the diff between 0.8.42 <> 0.8.43: https://github.com/continuedev/continue/compare/v0.8.42-vscode...v0.8.43-vscode
In 0.8.42 this was the logic for directory traversal: https://github.com/continuedev/continue/blob/b2c4593b63782694b6cc8ac1296f72f2208a8fe7/extensions/vscode/src/util/traverseDirectory.ts#L26-L32
In 0.8.43, here is the traversal logic: https://github.com/continuedev/continue/blob/57f6e774fb84378b6d8ad9ab1e7efc9f8c0d9eb6/core/indexing/walkDir.ts#L334-L338
We did indeed switch from an AsyncGenerator.
Yes, as currently implemented, I believe the listing process will block the main (and only) JavaScript thread until the array of every filename in the repo has been generated.
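To make the difference concrete, here is a minimal sketch of a streaming traversal. It is not Continue's actual walkDir: the name walkFiles is hypothetical, and it skips the ignore-file handling the real code needs. It only shows the shape of an AsyncGenerator that hands back one path at a time, so the consumer can start working (and the event loop can run other tasks) before the whole repo has been listed.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical helper for illustration only (not Continue's walkDir).
// Yields file paths one at a time instead of building a full array first.
export async function* walkFiles(root: string): AsyncGenerator<string> {
  // opendir streams directory entries; for-await closes the handle when done.
  const dir = await fs.opendir(root);
  for await (const entry of dir) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      // Recurse lazily: nothing below this directory is read until requested.
      yield* walkFiles(fullPath);
    } else if (entry.isFile()) {
      yield fullPath;
    }
  }
}

// Example consumer: each iteration awaits, so other work can interleave.
export async function countFiles(root: string): Promise<number> {
  let count = 0;
  for await (const _file of walkFiles(root)) {
    count++;
  }
  return count;
}
```

A consumer can also break out of the loop early (for example after the first N files), in which case the remaining directories are never read.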
Another thing I noticed is that multiple context providers each want a view of "everything in the repo". A follow-up optimization would be to reduce this to a single listing that all of the context providers share. I don't think it is needed to fix the current problem, but it would make Continue faster (it is limited to a single CPU) and use fewer system resources (less I/O).
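A rough sketch of what sharing a single listing could look like, assuming each provider currently calls some function that lists a root directory. The names shareListing and ListFiles are made up for illustration and do not exist in Continue. Cache invalidation when files change is deliberately left out.

```typescript
// Hypothetical types and names for illustration; not part of Continue's codebase.
type ListFiles = (root: string) => Promise<string[]>;

// Wraps a listing function so concurrent or repeated callers for the same root
// share one traversal instead of each walking the repo separately.
export function shareListing(listFiles: ListFiles): ListFiles {
  const listings = new Map<string, Promise<string[]>>();
  return (root: string) => {
    let pending = listings.get(root);
    if (!pending) {
      pending = listFiles(root);
      // If the traversal fails, forget it so a later call can retry.
      pending.catch(() => listings.delete(root));
      listings.set(root, pending);
    }
    return pending;
  };
}
```

Each context provider would then call the wrapped function rather than the raw one, so a large repo gets walked once no matter how many providers ask for it.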
Seems like that would do it on a large enough repo. Working on an AsyncGenerator refactor of walkDir.
You are correct on the context providers as well, in addition to a few other spots (e.g. https://github.com/continuedev/continue/blob/dev/extensions/vscode/src/quickEdit/QuickEditQuickPick.ts#L100-L104). We've talked about getting this consolidated in global state in the past but this feels like a good opportunity to actually move forward on that.
Appreciate the deep dive here!
Thanks, another thing I've noticed when profiling the indexing system is that it spends most of its time calling countTokens -- I created an issue for it here while I have your ear: https://github.com/continuedev/continue/issues/1775
@spew still working on this, but it's proving a bit tricky to patch the iterator onto the existing event pattern. Did you make any progress on your end?
Created this follow up PR: https://github.com/continuedev/continue/pull/1806
Also related: https://github.com/continuedev/continue/issues/1705
Closing this one in favor of the other open issues.
Description
Version 0.8.43-vscode seems to have a lot of performance problems in larger repos. Some behavior I have noticed:
I've started looking at the code. The first thing I noticed is that the new CodebaseIndexer.ts appears to wait until it has built up a list of all files in the repo. I suspect that changing this to an AsyncIterator (via yield) would fix a lot of the problem. I am willing to do some work to resolve this issue, so I am trying this out now.