continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.
https://docs.continue.dev/
Apache License 2.0
14.87k stars 1.09k forks

VS Code extension slows down entire vscode on startup with remote (walkDir) #1705

Closed Yanpas closed 2 weeks ago

Yanpas commented 1 month ago


Relevant environment info

- OS: Windows 10 / RHEL7 on remote
- Continue: 0.8.43
- IDE: VS Code 1.85
- Model:
- config.json:

"disableIndexing": "true"
// the rest is default: ollama providers

Description

I use VS Code Remote: the extension runs in the extension host on the Windows machine, while the code is located on a separate Linux machine reached over SSH. I ran an extension host profile, and it looks like the "walkDir" task that runs on extension startup saturates the entire SSH channel between the Windows and Linux machines, which makes other parts of VS Code unresponsive. (In the profile, "onReaddir" is at the top, with "filterEntry" further down the stack and "match" below it. Minimatch's match method seems to be pretty slow.)

I'd like to be able to completely disable the full project walk, since I don't need it. Alternatively, it may be better to run the extension on the SSH host.

To reproduce

  1. Connect over SSH to another machine with a project containing a large number of files. The SSH connection should not be especially fast.
  2. Run "Developer: Show Running Extensions" and start a profile.
  3. Open a window with Continue chat; the extension will start initializing.
  4. Now go to the Explorer tab and try navigating through the project tree (each directory opens very slowly).
  5. Or open a terminal: there will be significant lag when you type something.
  6. Save the profile. The "Continue" extension will also be marked as unresponsive.

Log output

Unfortunately I can't attach info from my working PC due to company policies.

SaahilClaypool commented 1 month ago

Downgrading to v0.8.42 fixes this for me. I'm also using the SSH extension.

I have indexing disabled in the config, as I could never get it to work properly.

Qualzz commented 1 month ago

I can confirm that the VS Code extension slows down everything in VS Code when using Remote-SSH, and sometimes even makes the editor unresponsive (even with indexing disabled). Disabling the extension for now.

jonnyboynewton commented 1 month ago

I have been experiencing the same issue: massive performance problems with versions after 0.8.42, on a Windows laptop connected to a Linux machine via Remote-SSH. I thought it might be the high CPU usage on the remote machine from the Node process that VS Code spawns for file watching, but even after the downgrade that process still shows high usage in top (90+% CPU), and yet VS Code is usable again. Before the downgrade, VS Code was virtually unusable in some cases, especially with multiple workspaces open: one workspace would be fine while the others ground to a halt, and keystrokes in the terminal took minutes to echo instead of less than a second. So something is definitely wrong with Continue starting with 0.8.43, since the downgrade fixes it completely. I'm on the latest VS Code (1.91.1) and Windows 11 23H2, and I do not have indexing disabled.

sestinj commented 1 month ago

@Qualzz @SaahilClaypool @Yanpas I have a few questions as we're trying to learn more about what's happening here. If you could share any of this it would be immensely helpful!

spew commented 1 month ago

@Qualzz @SaahilClaypool @Yanpas I have a few questions as we're trying to learn more about what's happening here. If you could share any of this it would be immensely helpful!

  • Do you use git, or a different version control system like Perforce?

git

  • If using git, is your VS Code workspace at the root directory of the git repo, or do you have a subdirectory opened?

root

  • Is the problem at all resolved by removing the "folder" context provider from config.json?

I have the 'modern' config with no context providers included.

  • If you install the Continue extension on the remote host rather than on your local machine does the same problem occur?

I only use it on my local machine; no remote machine is involved.

  • Is the remote connection closing, or are you exclusively seeing UI lag?

The first thing I would investigate is changing walkDir so that it does not block on building up an array of every file name in the repo.
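A minimal sketch of that idea, assuming Node's `fs.promises.opendir`; `walkFiles` and its `shouldSkip` callback are hypothetical names, not Continue's actual walkDir API, and `shouldSkip` merely stands in for the real .gitignore/.continueignore matching. Because the walk is an async generator, each path is yielded as soon as it is read, so a consumer can start working before the whole tree has been listed.

```typescript
import { promises as fs } from "fs";
import * as path from "path";

// Hypothetical streaming walker: yields each file path (relative to root) as soon as it is
// found, instead of first collecting every path into one large array.
export async function* walkFiles(
  root: string,
  // Stand-in for real ignore-file matching; return true to skip an entry.
  shouldSkip: (relPath: string, isDirectory: boolean) => boolean = () => false,
): AsyncGenerator<string> {
  // Explicit stack of directories (relative to root) that still need to be read.
  const pending: string[] = [""];
  while (pending.length > 0) {
    const relDir = pending.pop()!;
    // opendir reads entries incrementally; for-await closes the handle when iteration ends.
    const dir = await fs.opendir(path.join(root, relDir));
    for await (const entry of dir) {
      const relPath = relDir ? path.join(relDir, entry.name) : entry.name;
      const isDirectory = entry.isDirectory();
      if (shouldSkip(relPath, isDirectory)) continue;
      if (isDirectory) {
        pending.push(relPath);
      } else if (entry.isFile()) {
        // Symbolic links are neither followed nor yielded (isFile/isDirectory are false for them).
        yield relPath;
      }
    }
  }
}
```

Only one directory handle is open at a time here; skipping symlinks avoids cycles but may not match what an indexer actually wants.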

Yanpas commented 1 month ago

  • Do you use git, or a different version control system like Perforce?

git

  • If using git, is your VS Code workspace at the root directory of the git repo, or do you have a subdirectory opened?

Yes, it's at the root. Also, one of my projects has a submodule.

  • Is the problem at all resolved by removing the "folder" context provider from config.json?

No

  • If you install the Continue extension on the remote host rather than on your local machine does the same problem occur?

It didn't work for me.

  • Is the remote connection closing, or are you exclusively seeing UI lag?

The remote connection isn't closing; it becomes bloated by the enormous amount of data the extension is using. As a result, the remote terminal stops responding (FYI, the terminal prints a typed character only after receiving it back). I don't observe UI freezes, but that's almost impossible anyway, since the extension host is a separate process.

I'm almost sure that:

  1. the extension consumes far too much CPU in the extension host process.
  2. apart from that, it exchanges a lot of data over SSH (most likely VS Code streaming filesystem events/data).

Yanpas commented 1 month ago

Actually, it seems impossible to disable the Files provider:

"contextProviders": [
 {"name":"diff"}
]

and I still see Files in the "@" dropdown.

Patrick-Erichsen commented 1 month ago

@Qualzz @SaahilClaypool @Yanpas @spew

Making this the main thread for tracking slowdown issues.

#1783 hopefully solves the problem of walkDir blocking the main thread.

However, we'll keep this issue open until you all can confirm that the extension isn't slowing down your VS Code.

spew commented 1 month ago

This seems like an improvement when I test locally. However, I think there is another easy win.

walkDir builds up an array of every file name in the repo, and we still wait for the entire list to be built before starting the chunking/analysis process. I think we can reduce this memory requirement and do more in parallel; I'll send out a PR for this in a minute.
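Assuming the indexer could accept files in batches (a hypothetical change; the thread notes that CodebaseIndexer currently expects the full list), a small helper like the hypothetical `inBatches` below could sit between a streaming walk and the chunking step, so the first batch is processed while the walk is still running. This is a sketch, not Continue's code.

```typescript
// Hypothetical helper: group items from an async source into arrays of at most `size`,
// yielding each full batch immediately instead of waiting for the source to finish.
export async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  if (size < 1) throw new RangeError("size must be >= 1");
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Flush the final, possibly partial, batch.
  if (batch.length > 0) yield batch;
}
```

The batch size trades latency (smaller batches start sooner) against per-batch overhead such as embedding requests.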

spew commented 1 month ago

Looking at it some more, it isn't so easy, because CodebaseIndexer expects to operate on the list of every file in the repo rather than going file by file. I'm looking into whether this can be easily split up.

I would say the current state of things is that VS Code is much more responsive, but for large repos indexing is either non-functional or barely functional.

It seems like the reason we want the full list of files at the beginning of indexing is to display an accurate progress bar. Ironically, I think calculating the progress and keeping that working is what makes it hard to convert to an async-iterator/yield-based streaming approach, where indexing works file by file instead of building up a giant list of every file and then relying on codebaseIndex.update(...) to batch the work.
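One possible way to keep a progress indicator while streaming, sketched with a hypothetical `StreamingProgress` class (not part of Continue): count files as they are discovered and processed, and report a percentage only once the walk has finished and the total is known; until then, show a count instead of a percentage.

```typescript
// Hypothetical progress tracker for a streaming indexer.
export class StreamingProgress {
  private discovered = 0;
  private processed = 0;
  private walkDone = false;

  fileDiscovered(): void { this.discovered++; }
  fileProcessed(): void { this.processed++; }
  walkFinished(): void { this.walkDone = true; }

  // Fraction in [0, 1] once the total is known; undefined while the walk is still running.
  fraction(): number | undefined {
    if (!this.walkDone) return undefined;
    return this.discovered === 0 ? 1 : this.processed / this.discovered;
  }

  // Status text: a count while the total is unknown, a percentage afterwards.
  describe(): string {
    const f = this.fraction();
    return f === undefined
      ? `Indexed ${this.processed} of ${this.discovered}+ files`
      : `Indexed ${Math.round(f * 100)}%`;
  }
}
```

The design choice is to accept an indeterminate phase rather than block on the full file list just to compute a denominator.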

Patrick-Erichsen commented 1 month ago

Thanks for the review! I had a similar train of thought when refactoring: #1783 was an okay quick win, but the real issue is that we still don't actually begin any of the work until the file list has been built.

Chatting with @sestinj today on this 👍

spew commented 1 month ago

Yeah, I'm also looking at walkDir(...), and I suspect it can be made faster; I'm playing around with it now.

One thing I noticed: it seems like ignore files only apply to their own level in the hierarchy. Is this true? Have you had complaints about Continue not respecting .gitignore? From reading the code, I would expect a .gitignore to be respected only for the folder it is in, not for the folders below it.

UPDATE: Never mind, I see that the current level's ignore files are passed downward in the walkerOpt(...) function.
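The "passed downward" behavior can be illustrated with a simplified sketch over a hypothetical in-memory tree (`Tree` and `listFiles` are illustrative names, not Continue's walkerOpt). It assumes slash-free patterns with `*` as the only wildcard (no negation, anchoring, or trailing-slash rules): each directory's .gitignore rules are appended to the rules inherited from its ancestors and handed to its children, so they apply to that directory and everything below it.

```typescript
// Hypothetical in-memory tree: a directory maps names to subtrees; a file is its contents.
type Tree = { [name: string]: Tree | string };

// Convert a slash-free pattern such as "*.log" or "build" into a RegExp over a single name.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp("^" + escaped + "$");
}

// Rules from a directory's own .gitignore are combined with the inherited rules and
// passed down to every subdirectory.
export function listFiles(tree: Tree, inherited: RegExp[] = [], prefix = ""): string[] {
  const gitignore = tree[".gitignore"];
  const own = typeof gitignore === "string"
    ? gitignore
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0 && !line.startsWith("#"))
        .map(patternToRegExp)
    : [];
  const rules = [...inherited, ...own];
  const out: string[] = [];
  for (const [name, child] of Object.entries(tree)) {
    if (rules.some((re) => re.test(name))) continue;
    const rel = prefix + name;
    if (typeof child === "string") out.push(rel);
    else out.push(...listFiles(child, rules, rel + "/"));
  }
  return out;
}
```

In this sketch, a root rule like `*.log` also hides `src/trace.log`, while a rule defined in `src/.gitignore` does not affect siblings of `src`.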

sestinj commented 1 month ago

The progress bar is in fact the main reason for this. Even if we streamed files in parallel with chunking/embedding, though, we'd have roughly the same problem when none of the files need to be indexed: the work would basically just be completing the walkDir method in the same amount of time, and all of the .gitignore matching would still need to happen.

This makes me think that worker threads are the way to go.

Added a test here just so we can be more confident about the .gitignore behavior: https://github.com/continuedev/continue/commit/fa8eaa200b376ea5a46238b5d7ea8b1e773d9f08

spew commented 1 month ago

One thing I noticed that is definitely an issue is in this new bit of code in `walkDir(...)`:

    for await (const walkedEntries of walker.start()) {
      entries = [...walkedEntries];
    }

Note that the body of the for loop sets entries EQUAL to the current value of the iterator, so after the loop entries always equals the last value produced by the AsyncIterator. This generally works because the values returned by the AsyncIterator are Sets, and I believe each one is a set of every file seen up to that point (which is probably also an issue).

I was able to keep the tests passing (including a new test I added that iterates over a large directory and verifies the number of files returned) by changing the above code to this:

    let lastValue = entries
    for await (const walkedEntries of walker.start()) {
      lastValue = walkedEntries;
    }
    entries = [...lastValue];

So I'm not making any claims about correctness, but basically we are only returning the entries of the last Set (I think this may always end up being the right answer, because the Walker constructor reuses the same Set when the parent option is passed in).
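A defensive alternative, sketched with a hypothetical `collectEntries` helper (not the code from the PR): union every yielded Set into one accumulator. This returns the same paths whether each yielded Set is cumulative (as suspected above) or holds only newly walked entries, so correctness no longer depends on how the Walker shares its Set.

```typescript
// Hypothetical helper: collect every path from an async iterator that yields collections
// of paths, de-duplicating across yields.
export async function collectEntries(source: AsyncIterable<Iterable<string>>): Promise<string[]> {
  const all = new Set<string>();
  for await (const walked of source) {
    for (const entry of walked) all.add(entry);
  }
  return [...all];
}
```

The cost is one extra pass over each yielded Set; if the Sets really are cumulative, that repeats work, which is the trade-off for not relying on that assumption.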

spew commented 1 month ago

I'm testing out some changes to see if I can get walkDir to iterate faster over a large directory/source tree. I'll post an updated PR if I can improve the performance (and potentially resolve the issue I noted).

Yanpas commented 1 month ago

Will it be possible to disable indexing of the whole project? Most of the time I only need to attach the currently open file as context.

fry69 commented 1 month ago

Will it be possible to disable indexing of the whole project? Most of the time I only need to attach the currently open file as context.

You can already disable indexing for the whole workspace folder/project with a .continuerc.json file in the root workspace folder:

{
    "disableIndexing": true
}

You can also disable indexing globally by putting the above setting into the .continue/config.json file (Mac/Linux), but sadly you then can NOT re-enable indexing in a workspace folder via .continuerc.json.

It is also possible to configure which files should be excluded via a .continueignore file (gitignore syntax) in the root workspace folder; see https://docs.continue.dev/walkthroughs/codebase-embeddings#ignore-files-during-indexing

Yanpas commented 1 month ago

Unfortunately, even with disableIndexing set to true, this issue still exists (I had disabled indexing before raising this issue).

Yanpas commented 1 month ago

By the way, why was the extension kind chosen to be "UI" rather than "Workspace"? (https://code.visualstudio.com/api/advanced-topics/remote-extensions) It seems to me that "Workspace" better fits the need to access the project's files. VS Code can also forward ports, so access to the LLM could be configured to work in either direction (e.g. if the SSH host has no internet access, or Ollama is hosted natively on Windows).

spew commented 1 month ago

I am working on simplifying walkDir(...) and improving its performance, and will add worker threads for the regex matching. I expect to get this done today, and I expect it to have a very positive impact.
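A minimal sketch of moving ignore-pattern matching off the extension host's main thread with Node's `worker_threads`. `filterInWorker` is a hypothetical helper, not the code from the PR, and it assumes the ignore rules have already been compiled to regex source strings. The worker body is plain JavaScript passed with `eval: true`, so no separate worker file is needed.

```typescript
import { Worker } from "worker_threads";

// Worker body (plain JavaScript, run with eval: true): receives paths and regex sources
// via workerData and posts back the paths that match none of the patterns.
const WORKER_SOURCE = `
const { parentPort, workerData } = require("worker_threads");
const regexes = workerData.patterns.map((p) => new RegExp(p));
const kept = workerData.paths.filter((p) => !regexes.some((re) => re.test(p)));
parentPort.postMessage(kept);
`;

// Hypothetical helper: filter paths against ignore regexes on a worker thread so the
// main thread stays free while the matching runs.
export function filterInWorker(paths: string[], patterns: string[]): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: { paths, patterns } });
    worker.once("message", (kept: string[]) => resolve(kept));
    worker.once("error", reject);
    worker.once("exit", (code) => {
      // A non-zero exit before a message arrives means the worker failed.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Spawning a worker per call has a startup cost and copies the path list between threads; a real implementation would more likely reuse a small pool and send paths in chunks.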

jonnyboynewton commented 1 month ago

By the way, why was the extension kind chosen to be "UI" rather than "Workspace"? (https://code.visualstudio.com/api/advanced-topics/remote-extensions) It seems to me that "Workspace" better fits the need to access the project's files. VS Code can also forward ports, so access to the LLM could be configured to work in either direction (e.g. if the SSH host has no internet access, or Ollama is hosted natively on Windows).

If this were changed, it would need a rock-solid connection to LLMs on the local machine. For instance, we use local (on-laptop) Ollama models for certain types of highly secure codebases, and we could not have Ollama listening on a port bound to the machine's IP address (rather than just localhost). So the port forwarding between the remote host and the local machine would really need to be top-notch.

spew commented 1 month ago

Created this PR which improves walkDir performance: https://github.com/continuedev/continue/pull/1806

spew commented 1 month ago

It'd be great if @Yanpas could try it

sestinj commented 1 month ago

@jonnyboynewton this is the main reason we plan to stick with "UI". I've tested this a few times, and it's definitely not easy enough to set up the connection reliably.

jonnyboynewton commented 1 month ago

@spew I tested the latest pre-release with your walkDir fix, and while it is still a bit slower interactively than 0.8.42, it's usable again even on extremely large workspaces (over 1M files).

spew commented 1 month ago

Thanks @jonnyboynewton, I have two other potential changes in flight as well.

Yanpas commented 1 month ago

It'd be great if @Yanpas could try it

I've opened a problematic folder twice and didn't notice any issues (though I had indexing disabled). Thanks, it seems to work! Also, "Developer: Show Running Extensions" doesn't show any performance issues.

thomelane commented 3 weeks ago

@spew this also seems to have resolved an issue I was having with the Cmd+I popup taking ~5 s to appear (I'm using the pre-release right now). For reference, my directory (a monorepo) has around 500,000 files.

spew commented 2 weeks ago

I think this issue can be closed.