continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Indexing a Unity project results in thousands of "too many open file" errors. #1535

Open JohnSmithToYou opened 5 months ago

JohnSmithToYou commented 5 months ago

Relevant environment info

- OS: Windows 10 w/wsl2 running Ollama
- Computer: 64GB ram w/two 4090s
- Continue: v0.8.40
- IDE: VSCode 1.90.2

Description

Once Continue starts to index my Unity project, tens of thousands of error messages appear:

console.ts:137 [Extension Host] Error reading file Unknown (FileSystemError) (FileSystemError): Error: EMFILE: too many open files, open 'c:\src\UnityTestBed\Library\PackageCache\com.unity.textmeshpro@3.0.6\Documentation~\TextMeshPro.md.meta'
    at P.e (c:\Users\someone\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:152:6515)
    at Object.readFile (c:\Users\someone\AppData\Local\Programs\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:152:4465)
    at async _VsCodeIdeUtils.readFile (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:366778:25)
    at async VsCodeIde.readFile (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:367318:16)
    at async c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:78836:30
    at async Promise.all (index 27677)
    at async getAddRemoveForTag (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:78834:8)
    at async getComputeDeleteAddRemove (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:78938:41)
    at async CodebaseIndexer.refresh (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:358185:45)
    at async Core.refreshCodebaseIndex (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:358992:26)

After about 20K-30K errors I start getting this instead:

console.ts:137 [Extension Host] Failed to load parser for file c:\src\UnityTestBed\Assets\VRTemplateAssets\Scripts\XRKnob.cs: 
log.ts:439   ERR [Extension Host] Unable to load language for file c:\src\UnityTestBed\Assets\VRTemplateAssets\Scripts\XRPokeFollowAffordanceFill.cs RuntimeError: table index is out of bounds
    at wasm://wasm/000b54aa:wasm-function[237]:0x29e6a
    at _Parser.initialize (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:39356:19)
    at new _Parser (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:38279:16)
    at getParserForFile (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:40132:20)
    at async codeChunker (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:79467:18)
    at async chunkDocumentWithoutId (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:79510:24)
    at async chunkDocument (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:79521:20)
    at async _ChunkCodebaseIndex.update (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:114028:28)
    at async CodebaseIndexer.refresh (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:358192:47)
    at async Core.refreshCodebaseIndex (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:358992:26)
console.ts:137 [Extension Host] Unable to load language for file c:\src\UnityTestBed\Assets\VRTemplateAssets\Scripts\XRPokeFollowAffordanceFill.cs RuntimeError: table index is out of bounds
    at wasm://wasm/000b54aa:wasm-function[237]:0x29e6a
    at _Parser.initialize (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:39356:19)
    at new _Parser (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:38279:16)
    at getParserForFile (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:40132:20)
    at async codeChunker (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:79467:18)
    at async chunkDocumentWithoutId (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:79510:24)
    at async chunkDocument (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:79521:20)
    at async _ChunkCodebaseIndex.update (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:114028:28)
    at async CodebaseIndexer.refresh (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:358192:47)
    at async Core.refreshCodebaseIndex (c:\Users\someone\.vscode\extensions\continue.continue-0.8.40-win32-x64\out\extension.js:358992:26)

Shortly afterwards, indexing grinds to a halt and stops at 25%. My SSD is at 80% load, and Continue doesn't seem to respond to anything. I've tried a different embeddingsProvider and rebuilt the index five times, but that didn't help.

Note: I created a .continueignore file hoping to reduce the number of files it loads (I tested it with git check-ignore), but Continue seems to ignore it! Bear in mind, even a tiny Unity project like mine typically has no fewer than 50K files!

sample.continueignore.txt

My config:

{
  "models": [
    {
      "title": "Codestral 22b",
      "provider": "ollama",
      "model": "codestral-22b-4k:latest"
    },
    {
      "title": "Clive/Codestral 22b",
      "provider": "ollama",
      "model": "codestral-22b-4k-clive:latest"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral 22b",
    "provider": "ollama",
    "model": "codestral-22b-4k:latest"
  },
  "allowAnonymousTelemetry": false,

  "embeddingsProvider": {
    "provider": "transformers.js"
  }
}

To reproduce

1) Download a random Unity project from GitHub, like this one: https://github.com/SikPang/Unity_VampireSurvivors_Copy
2) Load it into VS Code + Continue
3) Observe

Note: Normally you need to install Unity in order to generate all of the extra files. Developers are supposed to strip out the generated folders using .gitignore, but the project I linked above forgot to do this, so it is a good snapshot of a typical Unity project under development. This means you don't need to install Unity to reproduce the problem; the project just won't compile. It will take a while to download.

Log output

See above.

JohnSmithToYou commented 5 months ago

I tested my .continueignore using ripgrep/ignore (the tool you are using, I think) and it processes my file correctly, so I think the problem is on your end.

One thing to keep in mind: traversing folders without using ripgrep/ignore is tricky, because ripgrep/ignore (and git) have special logic for negation patterns. Also, I see you're ignoring many types of extensions. I suggest you don't bother. There is no way to ever get that right! Who knows what kind of project will use your tool? Why not provide a default .ignore file that people can either use or customize? It will clean up your code too.
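
To illustrate why negation patterns make hand-rolled traversal tricky, here is a minimal TypeScript sketch of gitignore-style matching. It is not Continue's code and not the ripgrep/ignore implementation; the names `parseRules`, `isIgnored`, and `sampleRules` are hypothetical, and only a small subset of the syntax is handled (`#` comments, `!` negation, a trailing `/` for directories, and the `*` and `?` wildcards; `**` is not treated specially). It assumes two rules from git's documented behavior: the last matching pattern wins, and a file cannot be re-included once one of its parent directories is excluded.

```typescript
interface Rule {
  negated: boolean; // pattern started with "!"
  dirOnly: boolean; // pattern ended with "/"
  regex: RegExp;
}

// Translate the supported glob subset into a regular expression source.
function globToRegexSource(glob: string): string {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") out += "[^/]*";
    else if (c === "?") out += "[^/]";
    else out += c.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  }
  return out;
}

function parseRules(text: string): Rule[] {
  const rules: Rule[] = [];
  for (const raw of text.split(/\r?\n/)) {
    let line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const negated = line.startsWith("!");
    if (negated) line = line.slice(1);
    const dirOnly = line.endsWith("/");
    if (dirOnly) line = line.slice(0, -1);
    // A pattern containing "/" is anchored to the root; otherwise it may
    // match at any depth.
    const anchored = line.includes("/");
    if (line.startsWith("/")) line = line.slice(1);
    const prefix = anchored ? "^" : "^(?:.*/)?";
    rules.push({ negated, dirOnly, regex: new RegExp(prefix + globToRegexSource(line) + "$") });
  }
  return rules;
}

// Last matching rule wins for a single path entry.
function matchEntry(rules: Rule[], path: string, isDir: boolean): boolean {
  let ignored = false;
  for (const rule of rules) {
    if (rule.dirOnly && !isDir) continue;
    if (rule.regex.test(path)) ignored = !rule.negated;
  }
  return ignored;
}

// Paths use "/" separators and are relative to the project root.
function isIgnored(rules: Rule[], filePath: string): boolean {
  const parts = filePath.split("/");
  // If any ancestor directory is ignored, nothing beneath it can be re-included.
  for (let i = 1; i < parts.length; i++) {
    if (matchEntry(rules, parts.slice(0, i).join("/"), true)) return true;
  }
  return matchEntry(rules, filePath, false);
}

// Hypothetical ignore file for a Unity project.
const sampleRules = parseRules(
  ["# generated folders", "Library/", "Temp/", "*.meta", "!Assets/Keep.meta", "*.bin", "!Library/PackageCache/readme.md"].join("\n"),
);
```

With these sample rules, `Assets/Keep.meta` is re-included by its negation pattern, while `Library/PackageCache/readme.md` stays ignored because `Library/` is excluded as a whole. A traversal that skips excluded directories early gets the second case for free; one that lists every file first and filters afterwards has to reproduce the ancestor check explicitly.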

breynolds3 commented 2 months ago

I changed my batch size from 100 to 5 and got far fewer exceptions related to database contention and open files. It would be good if this were configurable.
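
For readers wondering why a smaller batch helps: the first stack trace shows a `Promise.all` with an index above 27,000, which suggests that many reads were started at once. Below is a minimal sketch of processing items in fixed-size batches so that only `batchSize` operations are pending at a time. The helper name `mapInBatches` is hypothetical and this is not the code in core/indexing/CodebaseIndexer.ts; it only illustrates the general technique.

```typescript
// Hypothetical helper, not part of Continue: run `fn` over `items` in
// fixed-size batches so at most `batchSize` promises are pending at once.
async function mapInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // The next batch starts only after every promise in this one has
    // resolved; a single rejection rejects the whole call.
    const batchResults = await Promise.all(batch.map((item) => fn(item)));
    results.push(...batchResults);
  }
  return results;
}
```

In an indexer, `fn` would be the per-file read, and `batchSize` could come from user configuration instead of a hard-coded constant. Results are returned in the same order as the input.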

sestinj commented 2 months ago

@breynolds3 was this filesPerBatch that you changed?

breynolds3 commented 2 months ago

@sestinj I changed getBatchSize() in core/indexing/CodebaseIndexer.ts

sestinj commented 2 months ago

@JohnSmithToYou I finally got around to cloning the repo you shared. As you said, right after downloading it started to freeze things up, and there were a ton of debug messages about having failed to chunk the .bin files. Because users don't always have the chance to set up a .continueignore, or even know about it, right when they download, we do want to at least exclude very common binary formats, and for some reason we didn't have .bin in our list! So I added that.
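
As a rough illustration of that kind of default exclusion, here is a minimal sketch of an extension-based check. The list contents and the names `DEFAULT_BINARY_EXTENSIONS` and `hasBinaryExtension` are assumptions made for this example; the extension's actual default list is not shown in this thread.

```typescript
// Illustrative list only, not Continue's real defaults.
const DEFAULT_BINARY_EXTENSIONS: ReadonlySet<string> = new Set([
  ".bin", ".dll", ".exe", ".png", ".jpg", ".zip",
]);

// Returns true when the file name ends in one of the listed extensions.
// Accepts both "/" and "\" separators, and compares case-insensitively.
function hasBinaryExtension(
  filePath: string,
  extensions: ReadonlySet<string> = DEFAULT_BINARY_EXTENSIONS,
): boolean {
  const lastSep = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const name = filePath.slice(lastSep + 1);
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".gitignore"
  return extensions.has(name.slice(dot).toLowerCase());
}
```

A check like this only catches the formats someone remembered to list, which is why it works best as a fallback alongside a user-editable ignore file.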

And then I tried adding a .continueignore at the root with *.bin on the first line, I removed ~/.continue/index, reloaded the window, and then indexing completed without any of the errors about .bin files.

Next I tried with your .continueignore sample and oddly I didn't get any errors regarding the .bin files in this case either (and indexing generally went much more quickly, so I believe that many files were ignored). Were you still getting the .bin debug warnings when using this .continueignore?