LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Crashes on Windows when importing model #15

Closed · MKwareContributions closed this 1 year ago

MKwareContributions commented 1 year ago

I run koboldcpp.exe, wait until it asks to import a model, and after selecting the model it just crashes with these logs:

logs

I am running Windows 8.1 with 8 GB of RAM and 6014 MB of VRAM (according to dxdiag). What do I do?

LostRuins commented 1 year ago

Hmm, this looks like you may be out of RAM during the load attempt. It may be a struggle to run with only 8 GB of RAM. I have released a new version that should use slightly less RAM; can you please try it again?

MKwareContributions commented 1 year ago

Still crashes with the same error. Should I get more RAM?

LostRuins commented 1 year ago

@MKware00 you could, or how about trying a smaller model first? Here is a tiny one that should definitely work on most devices. https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin

You need at least version 1.0.9beta. Download the newest version of KoboldCPP here: https://github.com/LostRuins/koboldcpp/releases/latest

and let me know if this one also crashes; then it might be a separate issue...

MKwareContributions commented 1 year ago

Still crashes with the same error.

LostRuins commented 1 year ago

Then I'm not so sure; it could be some sort of problem reading the file directly. Unfortunately, this will be quite difficult to debug remotely. All I can suggest for now is to copy the file and the program to a different drive, or to a different device, and try again.

MKwareContributions commented 1 year ago

Tried redownloading both koboldcpp.exe and the model you provided to another drive, no luck.

wattsinaname commented 1 year ago

I am also running into this issue:

Traceback (most recent call last):
  File "koboldcpp.py", line 369, in <module>
  File "koboldcpp.py", line 326, in main
  File "koboldcpp.py", line 64, in load_model
OSError: [WinError -1073741795] Windows Error 0xc000001d
[8000] Failed to execute script 'koboldcpp' due to unhandled exception!

but with 32 GB of quad-channel RAM and 8 GB of VRAM, running on 7 threads. I've tried --noblas and requantizing the original weights, and I'm running the most recent version. I've also been able to run the OPT 2.7B, GPT4All, and AutoGPT models locally. I just really love your idea of ease of use for multiple ggml models and would honestly prefer to use Kobold.
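For context: the WinError in these tracebacks is just the signed 32-bit form of the NTSTATUS code 0xc000001d, which is STATUS_ILLEGAL_INSTRUCTION - the process executed an instruction the CPU doesn't support. The decoding can be verified in plain Python (nothing koboldcpp-specific):

```python
# The traceback prints the status as a signed 32-bit integer;
# masking back to unsigned recovers the NTSTATUS value.
code = -1073741795 & 0xFFFFFFFF
print(hex(code))  # 0xc000001d == STATUS_ILLEGAL_INSTRUCTION
```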

TheWandering514 commented 1 year ago

I also have the very same OSError (no models working here on the latest koboldcpp.exe v1.1, all throwing the same error), and I have a suspicion as to what might be causing the issue, although I can't say for sure whether the same applies to the other people running into this problem in this thread.

I have a very dated CPU that doesn't support all the instruction sets mentioned in the Makefile, such as AVX2 and FMA - maybe the illegal instructions are caused by these missing instruction sets. The comments in the Makefile did suggest that the flags might need to be tweaked on some architectures, and I get the feeling mine is one of those that requires such tweaks. Unfortunately I don't have the know-how to rebuild the exe to better fit my hardware and test whether my suspicion is correct.

LostRuins commented 1 year ago

Ah yes, so it does seem like very old CPUs without AVX2 fail to work, and it doesn't do runtime checks like I thought. I will see if I can somehow get it to work automatically.
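A launch-time check along these lines could route older CPUs to a compatible library automatically. A minimal sketch, assuming the third-party py-cpuinfo package; the koboldcpp_noavx2.dll filename is hypothetical, not the project's actual layout:

```python
# Sketch: pick a library variant based on what the CPU actually supports,
# instead of crashing on the first AVX2 instruction.
import ctypes
from cpuinfo import get_cpu_info  # pip install py-cpuinfo

def pick_library() -> str:
    flags = get_cpu_info().get("flags", [])
    if "avx2" in flags:
        return "koboldcpp.dll"        # regular build, compiled with -mavx2
    return "koboldcpp_noavx2.dll"     # hypothetical compatibility build

handle = ctypes.CDLL(pick_library())
```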

LostRuins commented 1 year ago

Hi @TheWandering514 @wattsinaname @MKware00 I have made a test compatibility build that does not use any AVX2 build flags. Can you try it?

https://github.com/LostRuins/koboldcpp/releases/download/v1.1/koboldcpp_noavx2.exe

wattsinaname commented 1 year ago

> Ah yes, so it does seem like very old CPUs without AVX2 fail to work, and it doesn't do runtime checks like I thought. I will see if I can somehow get it to work automatically.

Hmm, weird, because my CPU has AVX2: "System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |".

I also tried the new exe:

C:\Users\Tom\Desktop\koboldcpp-1.0.10\models>koboldcpp_noavx2.exe ggml-vicuna-7b-4bit-rev1.bin 5001 --noblas
Initializing dynamic library: koboldcpp.dll
Loading model: C:\Users\Tom\Desktop\koboldcpp-1.0.10\models\ggml-vicuna-7b-4bit-rev1.bin
[Parts: 1, Threads: 7]

Identified as LLAMA model: (ver 3)
Attempting to Load...

System Info: AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama_model_load: loading model from 'C:\Users\Tom\Desktop\koboldcpp-1.0.10\models\ggml-vicuna-7b-4bit-rev1.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx = 2048
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 5809.78 MB (+ 1026.00 MB per state)
Traceback (most recent call last):
  File "koboldcpp.py", line 379, in <module>
  File "koboldcpp.py", line 333, in main
  File "koboldcpp.py", line 64, in load_model
OSError: [WinError -1073741795] Windows Error 0xc000001d
[9020] Failed to execute script 'koboldcpp' due to unhandled exception!

I'm using an older Xeon server CPU, an E5-2689 - could that be the issue? I really appreciate you trying, though! The AI dev community is awesome.

TheWandering514 commented 1 year ago

> Hi @TheWandering514 @wattsinaname @MKware00 I have made a test compatibility build that does not use any AVX2 build flags. Can you try it?
>
> https://github.com/LostRuins/koboldcpp/releases/download/v1.1/koboldcpp_noavx2.exe

This build works on my architecture, thank you.

LostRuins commented 1 year ago

@wattsinaname That is very strange indeed - if you are already able to run llama.cpp and gpt4all.cpp locally, then this should not be an issue - after all, they use the same libraries!

MKwareContributions commented 1 year ago

> Hi @TheWandering514 @wattsinaname @MKware00 I have made a test compatibility build that does not use any AVX2 build flags. Can you try it?
>
> https://github.com/LostRuins/koboldcpp/releases/download/v1.1/koboldcpp_noavx2.exe

Worked for me too, sorry for the late reply.

P.S. It works in Kobold, but when trying to connect to it from Tavern, Tavern crashes with an error - but that's probably for another issue.

LostRuins commented 1 year ago

Okay, from the general response this seems to have solved most issues. In future releases I'll try to include a noavx build too, but it may not be in every single release (it is in the latest one now, though, as of version 1.2). Closing this issue for now - please do open a new one if there are new problems!

Concentum commented 1 year ago

> Okay, from the general response this seems to have solved most issues. In future releases I'll try to include a noavx build too, but it may not be in every single release (it is in the latest one now, though, as of version 1.2). Closing this issue for now - please do open a new one if there are new problems!

Thanks a lot for your great work! If it doesn't bother you too much, I would ask for an optional build without AVX and without AVX2. I understand this probably doesn't make sense due to the low speed, but I really want to try. Or tell me how I can build the executable myself without using AVX and AVX2.

LostRuins commented 1 year ago

If you want to try, simply edit the Makefile and remove all references to -mavx2 -mfma -mavx -mf16c -msse3; it should work, but it will be horribly slow.

The latest Windows release comes with a toggle to disable AVX2, as that one seems to have the most compatibility issues.
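For anyone attempting this, the edit amounts to something like the following - a sketch only, since the actual Makefile may organize its flag variables differently than shown here:

```makefile
# Sketch only: the real Makefile's layout may differ from this.
# Before - x86 SIMD switches on the compile-flag line, e.g.:
#   CFLAGS += -O3 -DNDEBUG -fPIC -mavx2 -mfma -mavx -mf16c -msse3
# After - SIMD switches removed (builds for very old CPUs, runs very slowly):
CFLAGS += -O3 -DNDEBUG -fPIC
```

After editing, the shared libraries need to be rebuilt with plain make before repackaging the exe; as the exchange below shows, make_pyinstaller.bat on its own only bundles the previously built .dll files.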

Concentum commented 1 year ago

> If you want to try, simply edit the Makefile and remove all references to -mavx2 -mfma -mavx -mf16c -msse3; it should work, but it will be horribly slow.
>
> The latest Windows release comes with a toggle to disable AVX2, as that one seems to have the most compatibility issues.

Thanks for the reply.

Of course, I tried removing all mentions of these flags from the Makefile. But it seemed to make no visible difference: I got the error again, and the output still said "System Info: AVX = 1 | ...". Maybe I'm not building the exe file correctly? I do it using make_pyinstaller.bat.

LostRuins commented 1 year ago

make_pyinstaller is not the Makefile. That one just packages the existing .dll files into one exe. You need to rebuild the .dll files, which you can do with the make command.