fynnfluegge / codeqai

Local first semantic code search and chat powered by vector embeddings and LLMs
Apache License 2.0

Various issues and fixes on Windows #22

Closed strawberrymelonpanda closed 7 months ago

strawberrymelonpanda commented 8 months ago

Hi, I tried to run your project using Windows 11 Powershell 7.4 and ran into various issues. I was able to debug some of them, so I thought I'd jot down the steps I took:

1) pipx run --spec codeqai codeqai configure

This didn't work for me: the setup launched, but codeqai was unavailable afterwards. (My understanding of pipx run is that it's a temporary, run-once sandbox venv only.)

pipx install codeqai, followed by codeqai configure worked instead.

2) UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 1137: character maps to <undefined>

Whenever you're using open(), I believe you should add encoding='utf-8'. This solves this issue. For example: with open(env_path, "w", encoding='utf-8') as env_f: in app.py.
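A minimal sketch of the suggestion above, not the project's actual code. By default (outside UTF-8 mode), open() without an encoding argument uses the locale's preferred encoding, which on Windows is often a legacy code page, so files containing non-ASCII characters can fail to decode. The path and contents here are hypothetical and only illustrate the round trip.

```python
import os
import tempfile

# Hypothetical file path and contents, used only for illustration.
env_path = os.path.join(tempfile.mkdtemp(), ".env")
text = "GREETING=héllo ✓\n"

# Passing encoding explicitly makes the write independent of the
# platform default (often a legacy code page on Windows).
with open(env_path, "w", encoding="utf-8") as env_f:
    env_f.write(text)

# The same applies when reading: name the encoding the file was written in.
with open(env_path, "r", encoding="utf-8") as env_f:
    round_tripped = env_f.read()
```

The same argument would apply to every open() call that reads source files, since those may contain arbitrary UTF-8 text.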

3) Command '['C:\\Users\\<USER>\\.local\\pipx\\venvs\\codeqai\\Scripts\\python.exe', '-m', 'pip', 'install', 'faiss-gpu (Only if your system supports CUDA))']' returned non-zero exit status 1.

Unless I'm mistaken, on line 170 of vector_store.py, you're passing the literal string faiss-gpu (Only if your system supports CUDA) to pip install. You'd want faiss-gpu instead. However, I still couldn't install faiss-gpu, as it returned a "no compatible packages" error. faiss-cpu worked fine.
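One way to avoid handing the whole menu label to pip, sketched under the assumption that the label has the form "package (note)". The helper name pip_install_command is hypothetical and is not taken from vector_store.py.

```python
import sys


def pip_install_command(choice_label: str) -> list:
    """Build a pip command from a menu label such as 'faiss-gpu (Only if ...)'."""
    # Keep only the text before the first opening parenthesis: the package name.
    package = choice_label.split("(", 1)[0].strip()
    # Use the running interpreter so the package lands in the same environment.
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("faiss-gpu (Only if your system supports CUDA)")
```

The resulting list can be passed to subprocess.run(cmd, check=True). Keeping the display label and the package name as separate values in the first place would make the string handling unnecessary.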

4) When I reran codeqai search/sync/etc., I got "IndexError: list index out of range" in "C:\Users\<USER>\.local\pipx\venvs\codeqai\lib\site-packages\codeqai\vector_store.py", line 34.

This seems to be because "documents" is empty. Going back to app.py, the files var after files = repo.load_files() has an array of docs, but documents does not after documents = codeparser.parse_code_files(files)

After some debugging, this seems to be because treesitterNodes in codeparser.py by line 36 is empty. However, programming_language has content (Language.JAVASCRIPT /n Language.JAVASCRIPT), TreesitterMethodNode has <codeqai.treesitter.treesitter_js.TreesitterJavascript object at .... (x2), and file_bytes also has the expected file data.

I'm unfamiliar with Treesitter to be able to debug any further as to why treesitter_parser.parse(file_bytes) is returning an empty array in this case.

Hope this can help.

P.S. Didn't include this in the list as it may be my local ENV, but for some reason I was unable to run codeqai via pipx in python 3.10.5. It repeatedly wanted to use pyenv-win 3.9.6, even though that was nowhere on my system. I had to install 3.9.6 to be able to continue. This may be a local env issue from an old installation however.

fynnfluegge commented 8 months ago

Hi @strawberrymelonpanda, thanks a lot for these valuable findings! Let me go through them. Unfortunately I am on Mac and do not own a Windows machine. But let's try to solve this!

  1. pipx install codeqai is also my preferred way, but on one device I saw some issues which got resolved with pipx run --spec codeqai codeqai configure. I have also seen pipx run as the preferred approach in some other open source projects. Maybe pipx install is the way to go. Unfortunately I never had an issue with either on my own device and cannot reproduce it. I also use pyenv, by the way. But I am on Mac.
  2. Good catch! This is also not popping up on Mac. If you like you can raise a PR to fix this :)
  3. Another good catch! I have no CUDA (since, what else? Mac haha) and never tried that installation; it seems to be broken. If you like, you can also fix this in a PR :)
  4. This is interesting, I will investigate this further. For me this has always worked, but I will try new scenarios!

Thanks a lot for your extensive feedback!

strawberrymelonpanda commented 8 months ago

No worries, I expected you were running Mac or Linux and some of these issues are certainly Windows specific. I expect that some of them would also be resolved by using WSL2, but I just thought I'd give it a try as-is to see how far I could get.

For the UTF-8 issue, that one is definitely a Windows thing (unfortunately!) as far as I know, but thankfully it should also be harmless to Linux and Mac to specify an encoding on open.

As for submitting PRs, I'll have to pass for now since I haven't actually been able to use Codeqai yet, aside from getting past the crash points. If I can figure out what's going on with the Treesitter parse call and get everything in a working state, I'll see about circling back if they're still issues.

For now I just figured it was best to have the information out there in case someone picks it up. 😄

FYI: If you're interested in taking a look at some point, Microsoft offers a free, ready to go Windows ISO for VMs over at https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/ for development. I understand if you don't want to get into that overhead of testing multiple systems though.

The project looks great by the way, nice to see Llama.cpp and Instructor embeddings being used locally for this. The sync update in particular is a great idea. Good luck!

fynnfluegge commented 8 months ago

Alright, but in any case thanks for your insights! ❤️ I have an idea regarding point 4. Currently my treesitter parser only parses methods with documentation. Simple data objects or class headers are skipped. Maybe you tried to sync a git diff and got some updates, but the treesitter parser did not find any, since your git diff only consists of changes outside of methods. This is the first idea I have in mind, but I will take a closer look. I guess it is an easy fix!

strawberrymelonpanda commented 8 months ago

I have an idea regarding point 4. Currently my treesitter parser only parses methods with documentation.

I think this may be the issue as you suspect. I saw the note in the Readme that it works better with documentation, but I thought I'd throw it at some code I had just to try it out. So if this is expected behavior that's fine, all that needs fixed is the "IndexError: list index out of range" in that case.
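A sketch of the kind of guard that would turn the IndexError into a readable message when parsing yields nothing. NoDocumentsError and require_documents are hypothetical names; how codeqai should actually handle this case is up to the maintainer.

```python
class NoDocumentsError(Exception):
    """Hypothetical error raised when parsing produced nothing to index."""


def require_documents(documents: list) -> list:
    """Return documents unchanged, or fail early with a clear message."""
    if not documents:
        # Fail before any code indexes into the empty list.
        raise NoDocumentsError(
            "No documents were produced by the parser; "
            "there is nothing to add to the vector store."
        )
    return documents
```

A caller could catch NoDocumentsError, print the message, and exit cleanly instead of surfacing a traceback from deep inside the vector store.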

For a sanity check and a common point of reference, I ran Codeqai search against the current main branch of Codeqai and it worked. 👍

However, after adding the "encoding" changes in a new branch and committing them, I ran codeqai sync and got a new error, which may still be related to what you were saying:

File "C:\Users\<USER>\.local\bin\codeqai.exe\__main__.py", line 7, in <module>
[...]
if document.metadata["filename"] in self.vector_cache:
TypeError: argument of type 'VectorCache' is not iterable
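For context, this kind of TypeError appears when the in operator is applied to an object that defines neither __contains__ nor __iter__ (nor __getitem__). The sketch below uses two hypothetical stand-in classes; it does not reproduce codeqai's VectorCache.

```python
class CacheWithoutContains:
    """Stand-in for a cache object that does not support the `in` operator."""

    def __init__(self):
        self._entries = {}


class CacheWithContains:
    """Same idea, but membership tests are delegated to the inner dict."""

    def __init__(self):
        self._entries = {}

    def add(self, filename, vector_ids):
        self._entries[filename] = vector_ids

    def __contains__(self, filename):
        return filename in self._entries


try:
    "app.py" in CacheWithoutContains()
    error_message = None
except TypeError as exc:
    # Python refuses the membership test on this object.
    error_message = str(exc)

cache = CacheWithContains()
cache.add("app.py", ["id-1"])
found = "app.py" in cache
```

An alternative to adding __contains__ is to test membership against the underlying mapping directly, if the cache exposes one.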

By the way, unless I comment out the spinners, when it errors out it keeps looping on ⠇ 💾 Syncing vector store... rather than quitting (and in PowerShell it won't respond to Ctrl+C, so the terminal has to be closed).

Not a big deal, but thought I'd mention.
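A common pattern for this is to stop the spinner in a finally block so it is shut down even when the wrapped work raises. The Spinner class below is a hypothetical stand-in running on a background thread, not the spinner library codeqai uses.

```python
import threading


class Spinner:
    """Hypothetical stand-in for a terminal spinner on a background thread."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self.running = False

    def _spin(self):
        # A real spinner would redraw a frame here on every tick.
        while not self._stop.wait(0.05):
            pass

    def start(self):
        self.running = True
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.running = False


def run_with_spinner(task, spinner):
    """Run task() while the spinner is active; always stop the spinner."""
    spinner.start()
    try:
        return task()
    finally:
        # Runs on success and on error, so the spinner never outlives the task.
        spinner.stop()
```

With this shape the exception still propagates to the caller, but the terminal is no longer held by a spinner thread.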

fynnfluegge commented 8 months ago

Thanks for reporting! I fixed the

File "C:\Users\<USER>\.local\bin\codeqai.exe\__main__.py", line 7, in <module>
[...]
if document.metadata["filename"] in self.vector_cache:
TypeError: argument of type 'VectorCache' is not iterable

error with https://github.com/fynnfluegge/codeqai/releases/tag/0.0.7 ✅

I will also take a closer look at point 4 from your first comment 🙌 I added this

vector_store.sync_documents([])

to the test but it passes for me 🤔

tyoc213 commented 8 months ago

On macOS 14.2.1, from the cloned repo, I got this:

pipx run --spec codeqai codeqai configure
Fatal error from pip prevented installation. Full pip output in file:
    /Users/tyoc213/Library/Logs/pipx/cmd_2024-01-15_18.12.19_pip_errors.log

pip failed to build package:
    multidict

Some possibly relevant errors from pip install:
    error: subprocess-exited-with-error
    multidict/_multidict.c:458:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:503:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:538:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:780:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:839:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:875:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:922:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    multidict/_multidict.c:970:37: error: incompatible pointer to integer conversion initializing 'int' with an expression of type 'void *' [-Wint-conversion]
    error: command '/opt/homebrew/bin/clang' failed with exit code 1
Error installing codeqai.

EasyTop commented 8 months ago

I had to modify codeparser.py, changing the open call to with open(code_file, "r", encoding='utf-8'), to get it to work on Windows.

faiss-gpu is not supported on Windows.

It's easier and faster to get working on WSL2; Ollama works well, llama.cpp is broken.

pretty handy lil program

fynnfluegge commented 8 months ago

@EasyTop Thank you, would you be so kind as to raise this as a PR? 🙂 All Windows users will appreciate this!

fynnfluegge commented 8 months ago

I would like to pick up all points here and will collect them into a troubleshooting section in Readme.md.

The encoding error should be fixable with a PR.

johanvts commented 7 months ago

I tried this on windows and pipx says it successfully installed codeqai, it also creates a shortcut to place codeqai in path, but the appointed location is empty. There isn't even a pipx folder in my AppData.

fynnfluegge commented 7 months ago

Updated the Installation section https://github.com/fynnfluegge/codeqai?tab=readme-ov-file#-installation and also added a Troubleshooting section. Hope that helps!

strawberrymelonpanda commented 7 months ago

Feels like enough of this has been addressed through PRs and README updates that it's time to close it. 👍

More specific issues should probably be created for anything outstanding.