LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Weird Segmentation Fault when Ctrl+c #588

Closed: TheBill2001 closed this issue 9 months ago

TheBill2001 commented 9 months ago

Hi! I'm currently on EndeavourOS, and I'm running KoboldCpp CUDA version from the AUR.

Unless there is another way to quit KoboldCpp that I don't know of, I have always used Ctrl+C in the terminal. Every time, it takes a very long time to exit and always returns with a segmentation fault.

Here is the script that I used to launch KoboldCpp:

#!/bin/bash
koboldcpp \
    --model "${PWD}/models/psyonic-cetacean-20b.Q4_K_M.gguf" \
    --highpriority \
    --contextsize 4096 \
    --smartcontext \
    --blasbatchsize 512 \
    --skiplauncher \
    --usecublas lowvram \
    --gpulayers 18 \
    --threads 12

Here is the info from systemd-coredump if it helps: trace.txt. The actual coredump is over 5GB.

System:

LostRuins commented 9 months ago

This is a known issue, and a fix is being worked on.

scarygliders commented 9 months ago

Can confirm I've had this exact same bug for a long time. I use Arch Linux (btw ;) ).

To @TheBill2001: the reason it seems to take so long for koboldcpp to exit after segfaulting is that the core dump is being written to disk - it's a large file (5GB, as you point out) and takes time to write.

What I do is disable writing core dumps - there's a lot less delay. Then again, you wouldn't have been able to provide the core dump in your bug report ;)
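
Something like this wrapper should do it - ulimit -c 0 applies to the shell and everything it launches (a minimal sketch of my setup):

#!/bin/bash
# Disable core dump files for this shell and its children; a segfault
# then exits immediately instead of writing a multi-gigabyte core first.
ulimit -c 0
exec koboldcpp "$@"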

LostRuins commented 9 months ago

Can you check if the latest (v1.54) release fixes this?

TheBill2001 commented 9 months ago

It is no longer crashing, at least with psyonic-cetacean-20b.Q4_K_M and pygmalion-2-7b.Q4_K_M. I have only tested those two models.

LostRuins commented 9 months ago

awesome.

scarygliders commented 9 months ago

Yep your latest wizardry has fixed the problem! Thank you.

vahook commented 9 months ago

Hi! I also ran into this issue last week, so I decided to get to the bottom of it.

The actual problem is that kobold links together object files that were compiled with different definitions of gpt_params. This is because LLAMA_MAX_DEVICES is defined conditionally - it expands to the CUDA device limit in CUDA builds and to 1 otherwise - and gpt_params uses it as an array size, so the struct's layout changes here: https://github.com/LostRuins/koboldcpp/blob/71a5afaab5721fd756ed57867a5c3b96b0487890/common/common.h#L57

As a result, two different constructor/destructor definitions exist for gpt_params, and in a CUDA build the linker picks the wrong pair for the static variable in gpttype_adapter.cpp. This can (and in this case does) lead to a segfault in the destructor, because the non-trivially destructible members end up at the wrong offsets.
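
To make the mechanism concrete, here is a minimal sketch of that mismatch (hypothetical file names and a stripped-down gpt_params standing in for common.h and gpttype_adapter.cpp; exact behaviour depends on the toolchain, but with default GCC/binutils this typically segfaults at exit):

#!/bin/bash
# Reproduce the ODR violation: one header, two translation units compiled
# with different values of the macro, linked into one binary.

cat > params.h <<'EOF'
#pragma once
#include <string>
struct gpt_params {
    float tensor_split[LLAMA_MAX_DEVICES] = {0}; // size depends on the macro
    std::string model = "model.gguf";            // non-trivially destructible
};
EOF

cat > adapter.cpp <<'EOF'
#include "params.h"
static gpt_params params;  // like the static var in gpttype_adapter.cpp
int main() {
    // Member access compiled into THIS unit uses the 16-slot layout and
    // scribbles over the bytes where the 1-slot layout put the string.
    for (int i = 0; i < 16; i++) params.tensor_split[i] = 1.0f;
    return 0;  // whichever destructor the linker kept runs at exit
}
EOF

cat > common.cpp <<'EOF'
#include "params.h"
void force_emit() { gpt_params p; } // makes this unit emit its own ctor/dtor
EOF

# Same header, two different struct layouts:
g++ -O0 -c -DLLAMA_MAX_DEVICES=16 adapter.cpp
g++ -O0 -c -DLLAMA_MAX_DEVICES=1  common.cpp

# The implicit constructor/destructor are weak symbols and the linker keeps
# the first copy it sees; listing common.o first keeps the 1-slot versions,
# so the string is constructed and destroyed at an offset that adapter.o's
# own code has already overwritten -> segfault (or abort) on exit.
g++ common.o adapter.o -o odr-repro && ./odr-repro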

Although this commit might have fixed this particular crash, there is definitely something wrong with the Makefile, and thus the issue should be addressed there instead IMO.

LostRuins commented 9 months ago

Thank you for your in-depth analysis @vahook . That must be why it never happens on my Windows builds - the CI I use to make koboldcpp_cublas.dll rebuilds everything, including all intermediates, from scratch. It only happens on Linux when targeting multiple backends in one tree, because the common object files are reused between them.

Solving this in the Makefile would mean compiling llama.o into multiple per-backend object files (e.g. llama_cuda.o and llama_nocuda.o), or clearing existing objects every time a new target is built.
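
As a stopgap, clearing objects by hand before switching backends avoids the reuse (a sketch, assuming the usual build flags from the README):

# Never reuse objects across backends: rebuild from a clean tree
# whenever switching between CUDA and non-CUDA targets.
make clean
make LLAMA_CUBLAS=1 -j"$(nproc)"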

Alternatively, I can standardize the gpt_params struct to be the same size in both the CUDA and non-CUDA cases, which seems to be the better option. Thanks for spotting this root cause!