Closed Yanpas closed 3 months ago
Downgrading to v0.8.42 fixes this for me. I'm also using the SSH extension.
I have indexing disabled in the config, as I could never get it to work properly.
I confirm the VS Code extension slows down everything in VS Code when using Remote SSH, and sometimes even makes the editor unresponsive (even with indexing disabled). Disabling the extension for now.
I have been experiencing the same issue here: massive performance issues with versions greater than 0.8.42. Windows laptop connected to a Linux machine via remote-ssh. I thought it was maybe the high CPU usage on the remote machine from the node process that VS Code spawns for file watching, but even with the downgrade that process still shows high usage in top (90+% CPU). But VS Code is now usable again. It was virtually unusable in some cases, especially when there were multiple workspaces open. One workspace would be fine, the others would come to a halt. Keystrokes in the terminal would take minutes to come back vs less than a second normally. So there is definitely something wrong with Continue starting with 0.8.43, as it's now completely fixed with the downgrade. I'm on the latest VS Code (1.91.1). I do not have indexing disabled. I'm on Windows 11 23H2.
@Qualzz @SaahilClaypool @Yanpas I have a few questions as we're trying to learn more about what's happening here. If you could share any of this it would be immensely helpful!
- Do you use git, or a different version control system like Perforce?
git
- If using git, is your VS Code workspace at the root directory of the git repo, or do you have a subdirectory opened?
root
- Is the problem at all resolved by removing the "folder" context provider from config.json?
have the 'modern' config with no context providers included
- If you install the Continue extension on the remote host rather than on your local machine does the same problem occur?
only use on local machine, no remote machine involved
- Is the remote connection closing, or are you exclusively seeing UI lag?
My first thing to investigate would be changing walkDir to not block on building up an array of every file name in the repo.
- Do you use git, or a different version control system like Perforce?
git
- If using git, is your VS Code workspace at the root directory of the git repo, or do you have a subdirectory opened?
yes, it's in the root. Also one of my projects has submodule
- Is the problem at all resolved by removing the "folder" context provider from config.json?
No
- If you install the Continue extension on the remote host rather than on your local machine does the same problem occur?
It didn't work for me,
- Is the remote connection closing, or are you exclusively seeing UI lag?
The remote connection isn't closing; it becomes bloated by the enormous amount of data the extension is transferring. Thus the remote terminal stops responding (JFYI, the terminal prints a typed symbol only after receiving it back). I don't observe UI freezes, but that's almost impossible anyway since the extension host is a separate process.
I'm almost sure that:
Actually seems it's impossible to disable Files provider:
"contextProviders": [
{"name":"diff"}
]
and I still see Files in "at" dropdown
@Qualzz @SaahilClaypool @Yanpas @spew
Making this the main thread to track issues with slow downs.
walkDir blocking the main thread. However, we'll keep this issue open until you can all confirm that the extension isn't slowing down your VS Code.
This seems like an improvement, when I'm testing locally. However, I think there is another easy win.
walkDir builds up an array of every file name in the repo, and we still wait for the entire list to be built before beginning the chunking / analyzing process. I think we can reduce this memory requirement and do more in parallel -- I'll send out a PR for this in a minute.
Taking a look at it some more, it isn't so easy because CodebaseIndexer expects to operate on the list of every file in the repo and not go file by file. Taking a look to see if this can be easily split up.
I would say, the current state of things is that VSCode is much more responsive but for large repos indexing is either non-functional or barely functional.
It seems like the reason we want the full list of files at the beginning of indexing is to display an accurate progress bar. Ironically, I think calculating the progress and keeping that working is what makes it hard to convert to more of an AsyncIterator / yield-based streaming approach, where the indexing process works file by file instead of building up a giant list of every file and then relying on codebaseIndex.update(...) to do the batching of work.
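As a rough illustration of the yield-based approach described above, here is a minimal TypeScript sketch. It is not Continue's code: `DirEntry`, `ReadDir`, `walkFiles`, `inBatches`, and `indexStreaming` are all hypothetical names, and the directory reader is injected so the sketch stays self-contained. Progress is reported as a running count, since the total is not known until the walk ends.

```typescript
// Hypothetical types: a real reader would wrap the actual directory-listing call.
interface DirEntry {
  name: string;
  isDirectory: boolean;
}
type ReadDir = (dir: string) => Promise<DirEntry[]>;

// Yields one file path at a time instead of returning an array of every file.
async function* walkFiles(root: string, readDir: ReadDir): AsyncGenerator<string> {
  for (const entry of await readDir(root)) {
    const full = `${root}/${entry.name}`;
    if (entry.isDirectory) {
      yield* walkFiles(full, readDir);
    } else {
      yield full;
    }
  }
}

// Groups a stream into fixed-size batches so work can start before the walk ends.
async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) {
    yield batch;
  }
}

// Stand-in for indexing: reports the number of files handled so far.
async function indexStreaming(
  root: string,
  readDir: ReadDir,
  batchSize: number,
  onProgress: (filesSoFar: number) => void,
): Promise<number> {
  let done = 0;
  for await (const batch of inBatches(walkFiles(root, readDir), batchSize)) {
    // Real chunking / embedding of `batch` would happen here.
    done += batch.length;
    onProgress(done);
  }
  return done;
}
```

The trade-off is the one noted above: a percentage-based progress bar needs the total up front, while a streaming walk can only say how many files it has processed so far.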
Thanks for the review! I had a similar train of thought when refactoring, #1783 was an okay quick win but the real issue is that we still don't actually begin any of the work until the file list has been built up.
Chatting with @sestinj today on this 👍
Yeah -- I'm also looking at walkDir(...) and I suspect it can be made faster, I'm playing around with it now.
One thing I noticed: it seems like ignore files only apply to their current level in the hierarchy. Is this true? Have you had complaints about Continue not respecting .gitignore? I would expect, from reading the code, that a .gitignore will only be respected for the folder it is in and not for all folders below it.
UPDATE: Never mind, I see that the current level's ignore files are passed downward in the walkerOpt(...) function.
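To make the "passed downward" behavior concrete, here is a small TypeScript sketch over an in-memory tree. It is an illustration only: `TreeNode` and `listFiles` are hypothetical, and ignore rules are simplified to exact name matches rather than real gitignore patterns.

```typescript
// Hypothetical in-memory tree; `ignore` stands in for an ignore file in that directory.
interface TreeNode {
  name: string;
  ignore?: string[]; // simplified: exact names, not gitignore globs
  children?: TreeNode[]; // present only for directories
}

// Each level combines the rules inherited from its ancestors with its own,
// so a rule declared at the root still applies in nested folders.
function listFiles(node: TreeNode, prefix = "", inherited: string[] = []): string[] {
  const rules = [...inherited, ...(node.ignore ?? [])];
  const here = prefix === "" ? node.name : `${prefix}/${node.name}`;
  if (!node.children) {
    return [here];
  }
  const out: string[] = [];
  for (const child of node.children) {
    if (rules.includes(child.name)) {
      continue; // ignored by this level or by an ancestor
    }
    out.push(...listFiles(child, here, rules));
  }
  return out;
}
```

With a root-level rule for `dist`, a `dist` folder nested under `src` is skipped as well, which is the behavior the UPDATE above describes.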
The progress bar is in fact the main reason for this. Even if we were to stream in parallel with chunking/embeddings, though, we'd have roughly the same problem in the case where none of the files need to be indexed: it would basically just complete the walkDir method in the same amount of time, and all of the .gitignore matching would still need to happen.
This makes me tend to think that worker threads are the way to go
Added a test here just so we can be more confident about the .gitignore behavior: https://github.com/continuedev/continue/commit/fa8eaa200b376ea5a46238b5d7ea8b1e773d9f08
One definite issue I noticed is in this new bit of code in walkDir(...):
for await (const walkedEntries of walker.start()) {
entries = [...walkedEntries];
}
Note that the line inside the for loop is actually setting entries EQUAL to the current value of the iterator. Thus, entries is always equal to the last value yielded by the AsyncIterator. This generally works because the values returned by the AsyncIterator are Sets, and I believe each one is the set of every file seen up to that point (probably also an issue).
I was able to get the tests to still pass (including a new test I added that iterates over a large directory and verifies the number of files returned) by changing the above code to this:
let lastValue = entries
for await (const walkedEntries of walker.start()) {
lastValue = walkedEntries;
}
entries = [...lastValue];
So no comments about correctness, but basically we are only returning the entries of the last Set (I think this may always end up being the right answer, because the constructor for Walker uses the same Set if the parent option is passed in).
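The following TypeScript sketch illustrates why both versions return the same result while doing different amounts of work. It assumes, as suspected above but not verified against the walker library, that each iteration yields the same cumulative Set. `cumulativeWalk`, `collectByOverwriting`, and `collectLast` are hypothetical stand-ins, not the real walker API.

```typescript
// Hypothetical stand-in for walker.start(): yields the SAME Set after each addition.
async function* cumulativeWalk(paths: string[]): AsyncGenerator<Set<string>> {
  const seen = new Set<string>();
  for (const p of paths) {
    seen.add(p);
    yield seen;
  }
}

// Mirrors the original loop: the whole Set is copied on every iteration.
async function collectByOverwriting(paths: string[]): Promise<{ result: string[]; copied: number }> {
  let entries: string[] = [];
  let copied = 0;
  for await (const walked of cumulativeWalk(paths)) {
    entries = [...walked];
    copied += walked.size; // elements copied in this iteration
  }
  return { result: entries, copied };
}

// Mirrors the proposed change: remember the last value, copy once at the end.
async function collectLast(paths: string[]): Promise<{ result: string[]; copied: number }> {
  let last = new Set<string>();
  for await (const walked of cumulativeWalk(paths)) {
    last = walked;
  }
  return { result: [...last], copied: last.size };
}
```

Under that assumption the repeated copy grows quadratically with the number of yields, so for four paths the first version copies 1 + 2 + 3 + 4 = 10 elements and the second copies 4.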
I'm testing out some changes to see if I can get walkDir to iterate faster over a large directory tree / source tree, I'll post an updated PR if I can improve the performance (and potentially resolve the issue I noted)
Will it be possible to disable indexing the whole project? Most of the times I'll need to attach only currently opened file as a context
You can already disable indexing for the whole workspace folder/project with a .continuerc.json file in the root workspace folder:
{
"disableIndexing": true
}
You can also disable indexing globally by putting the above property into the .continue/config.json file (Mac/Linux), but sadly you then can NOT re-enable indexing for a single workspace folder via .continuerc.json.
It is also possible to configure which files should be excluded via a .continueignore file in the root workspace folder (gitignore syntax), see here -> https://docs.continue.dev/walkthroughs/codebase-embeddings#ignore-files-during-indexing
Unfortunately, even with disableIndexing set to true, this issue still exists (I disabled indexing before raising this issue).
By the way, why the extension kind was chosen to be "UI" and not "Workspace"? (https://code.visualstudio.com/api/advanced-topics/remote-extensions) It seems to me that Workspace fits better the need to access files of the project. And VSCode also has the ability of port forwarding, thus access to LLM can be configured to work both ways (e.g. if SSH doesn't have internet / or ollama is hosted natively on Windows)
I am working on simplifying and improving the performance of walkDir(...) and will add worker threads for doing the regex matching. I expect to get this done today and for it to have a very positive impact.
By the way, why the extension kind was chosen to be "UI" and not "Workspace"? (https://code.visualstudio.com/api/advanced-topics/remote-extensions) It seems to me that Workspace fits better the need to access files of the project. And VSCode also has the ability of port forwarding, thus access to LLM can be configured to work both ways (e.g. if SSH doesn't have internet / or ollama is hosted natively on Windows)
If this was changed, it would need to have rock-solid connection to LLMs on the local machine. For instance we use local (on the laptop) Ollama models for certain types of highly secure codebases and we could not, for instance, have Ollama sitting on a port bound to the IP address of the machine (vs just localhost) - so the port forwarding between the remote host and the local machine would really need to be top-notch.
Created this PR which improves walkDir performance: https://github.com/continuedev/continue/pull/1806
It'd be great if @Yanpas could try it
@jonnyboynewton this is the main reason we plan to stick with "UI". I've tested this a few times and it's definitely not easy enough to set up the connection reliably
@spew I tested the latest pre-release with your walkDir fix and while it is still a bit slower interactively than 0.8.42, it's actually usable again on even extremely large workspaces (over 1M files)
Thanks @jonnyboynewton, I have two other potential changes in flight as well.
It'd be great if @Yanpas could try it
I've opened a problematic folder twice, didn't notice any issue (yet I had indexing disabled). Thanks, seems to work out! Also "Developer: Show Running Extensions" doesn't show any perf issue
@spew this seems to have also resolved an issue I was having with a ~5s slow Cmd+I popup (using pre-release right now). Around 500,000 files in my directory (monorepo) for reference.
I think this issue can be closed.
Description
I use VS Code Remote: the extension runs in the extension host on a Windows machine, and the code is located on a separate Linux machine accessed over SSH. I ran an extension host profile, and it looks like the "walkDir" task that runs on extension startup saturates the entire SSH channel between the Windows and Linux machines, which makes other parts of VS Code unresponsive. (In the profile, "onReaddir" is on top, then "filterEntry" down the stack with "match" below. Minimatch's match method seems to be pretty slow.)
I'd like to be able to completely disable the full project walk, since I don't need it. Alternatively, it may be better to run the extension on the SSH host.