LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

SDUI width/height parameters having no effect #919

Closed · CorentinWicht closed 2 weeks ago

CorentinWicht commented 2 weeks ago

Dear @LostRuins,

You have made fantastic work integrating the Stable UI, many thanks for that!

I have been playing around lately, trying out different models (some working ✔, some not ✘).

Strangely enough, whichever model I load, the width & height parameters seem to have no effect.

Here is a sample image generated with width & height set to 1024: [screenshot]

While in reality the picture is still 512x512: [image]

Do you have any idea why this is happening?

Best,

C.

LostRuins commented 2 weeks ago

Are you by any chance running with `--sdclamped`? (Or previously, the clamped flag with the deprecated `--sdconfig`.) That prevents images larger than 512x512 from being generated, so that a shared server can't be crashed by someone requesting a massive image.
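
For reference, a minimal launch sketch (assuming you run from source; the model path is a placeholder):

```
# With clamping: requests above 512x512 are capped (safer on shared servers)
python koboldcpp.py --sdmodel /path/to/model.safetensors --sdclamped

# Without clamping: the requested width/height are honored as-is
python koboldcpp.py --sdmodel /path/to/model.safetensors
```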

LostRuins commented 2 weeks ago

Also, for those broken models, have you tried enabling the new "Fix Bad VAE" option? In the GUI it's this option: [screenshot], while from the command line you can use `--sdvaeauto`.

Remember that `--sdconfig` is deprecated - make sure you're using the new flags if running from the CLI (run with `--help` to see them).
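
For example, the deprecated form and a rough new-style equivalent (a sketch; the model path is a placeholder, so check `--help` for the exact flags on your build):

```
# Old (deprecated): options packed into a single --sdconfig
python koboldcpp.py --sdconfig /path/to/model.safetensors clamped 4 quant

# New: one dedicated flag per option
python koboldcpp.py --sdmodel /path/to/model.safetensors --sdclamped --sdquant --sdvaeauto
```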

CorentinWicht commented 2 weeks ago

> Are you by any chance running with `--sdclamped`? (Or previously, the clamped flag with the deprecated `--sdconfig`.) That prevents images larger than 512x512 from being generated, so that a shared server can't be crashed by someone requesting a massive image.

Indeed, you are correct: I was running with `--sdconfig /opt/A1111/RealisticVisionV60B1_v51HyperVAE.safetensors clamped 4 quant`.

I have thus replaced my config with `--sdmodel /opt/A1111/RealisticVisionV60B1_v51HyperVAE.safetensors --sdquant`.

Unfortunately, I am now facing a critical issue whenever I try to generate a 1024x1024 image (while it works smoothly for 512x512 ones):

```
Jun 13 13:40:02 frillm.XXX.ch python[46695]: Generating Image (30 steps)
Jun 13 13:40:02 frillm.XXX.ch python[46695]: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 8360.93 MiB on device 0: cudaMalloc failed: out of memory
Jun 13 13:40:02 frillm.XXX.ch python[46695]: ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 8767072896
Jun 13 13:40:02 frillm.XXX.ch systemd-coredump[46750]: [🡕] Process 46695 (python) of user 0 dumped core.
Jun 13 13:40:04 frillm.XXX.ch systemd[1]: koboldai.service: Main process exited, code=dumped, status=11/SEGV
Jun 13 13:40:04 frillm.XXX.ch systemd[1]: koboldai.service: Failed with result 'core-dump'.
Jun 13 13:40:04 frillm.XXX.ch systemd[1]: koboldai.service: Consumed 26.136s CPU time.
```

I have made sure that the GPU (an NVIDIA RTX A6000) has enough free memory by reducing `--gpulayers` to 60: [screenshot]

Could this be a failure linked to the model I am using (RealisticVisionV60B1_v51HyperVAE.safetensors)?

> Also, for those broken models, have you tried enabling the new "Fix Bad VAE" option? In the GUI it's this option: [screenshot], while from the command line you can use `--sdvaeauto`.
>
> Remember that `--sdconfig` is deprecated - make sure you're using the new flags if running from the CLI (run with `--help` to see them).

Thanks for the suggestion, I haven't tried it yet but will give it a try ASAP.

Best,

C.

LostRuins commented 2 weeks ago

Nope, it's definitely just running out of memory:

```
Jun 13 13:40:02 frillm.XXX.ch python[46695]: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 8360.93 MiB on device 0: cudaMalloc failed: out of memory
Jun 13 13:40:02 frillm.XXX.ch python[46695]: ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 8767072896
```

It tried to allocate the 8.7 GB (8,767,072,896 bytes) required to process that 1024x1024 image, and it looks like it was not able to.

Perhaps you could try this: don't offload the text model to VRAM first, just generate a 1024x1024 image and observe how much VRAM gets used by SD alone. Then adjust the GPU layers so that both models fit at the same time. Alternatively, try a slightly smaller resolution; 768x768 is a good middle ground and should work well.
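
A rough way to do that measurement (a sketch, assuming you run from source and have `nvidia-smi` available; the model path is a placeholder and numbers will vary by model):

```
# 1) Start with no text-model layers offloaded, so only SD sits on the GPU
python koboldcpp.py --sdmodel /path/to/model.safetensors --sdquant --gpulayers 0

# 2) Generate one 1024x1024 image and watch how much VRAM SD alone consumes
watch -n 1 nvidia-smi

# 3) Raise --gpulayers until text model + SD both fit in the remaining VRAM
```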

CorentinWicht commented 2 weeks ago

> Nope, it's definitely just running out of memory:
>
> ```
> Jun 13 13:40:02 frillm.XXX.ch python[46695]: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 8360.93 MiB on device 0: cudaMalloc failed: out of memory
> Jun 13 13:40:02 frillm.XXX.ch python[46695]: ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 8767072896
> ```
>
> It tried to allocate the 8.7 GB (8,767,072,896 bytes) required to process that 1024x1024 image, and it looks like it was not able to.
>
> Perhaps you could try this: don't offload the text model to VRAM first, just generate a 1024x1024 image and observe how much VRAM gets used by SD alone. Then adjust the GPU layers so that both models fit at the same time. Alternatively, try a slightly smaller resolution; 768x768 is a good middle ground and should work well.

Thanks for the prompt reply.

I have adjusted my running parameters accordingly (`--gpulayers 0`) so that most of the GPU memory is free: [screenshot]

Nevertheless, when I try again to generate a 1024x1024 image, koboldcpp crashes with another error:

```
Jun 13 15:48:45 frillm.XXX.ch python[51120]: Generating Image (30 steps)
Jun 13 15:48:45 frillm.XXX.ch python[51120]: ggml_cuda_compute_forward: SCALE failed
Jun 13 15:48:45 frillm.XXX.ch python[51120]: CUDA error: invalid configuration argument
Jun 13 15:48:45 frillm.XXX.ch python[51120]:   current device: 0, in function ggml_cuda_compute_forward at ggml-cuda.cu:2366
Jun 13 15:48:45 frillm.XXX.ch python[51120]:   err
Jun 13 15:48:45 frillm.XXX.ch python[51120]: GGML_ASSERT: ggml-cuda.cu:102: !"CUDA error"
Jun 13 15:48:45 frillm.XXX.ch python[51161]: gdb: warning: Couldn't determine a path for the index cache directory.
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51122]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51123]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51124]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51140]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51141]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51142]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51143]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51144]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51145]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51146]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51147]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51148]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51149]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51150]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51151]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51152]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51153]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51154]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51155]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51156]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51157]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51158]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [New LWP 51159]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [Thread debugging using libthread_db enabled]
Jun 13 15:48:45 frillm.XXX.ch python[51161]: Using host libthread_db library "/lib64/libthread_db.so.1".
Jun 13 15:48:45 frillm.XXX.ch python[51161]: 0x00007f69cbb0422d in select () from /lib64/libc.so.6
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #0  0x00007f69cbb0422d in select () from /lib64/libc.so.6
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #1  0x00007f69cbffa61d in time_sleep () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #2  0x00007f69cbf1bd87 in cfunction_vectorcall_O () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #3  0x00007f69cbf14a74 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #4  0x00007f69cbf0e99d in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #5  0x00007f69cbf1c235 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #6  0x00007f69cbf0fdd9 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #7  0x00007f69cbf0e99d in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #8  0x00007f69cbf1c235 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #9  0x00007f69cbf10d9e in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #10 0x00007f69cbf0e99d in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #11 0x00007f69cbf88d65 in _PyEval_EvalCodeWithName () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #12 0x00007f69cbf88cfd in PyEval_EvalCodeEx () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #13 0x00007f69cbf88caf in PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #14 0x00007f69cbfb92b4 in run_eval_code_obj () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #15 0x00007f69cbfb5106 in run_mod () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #16 0x00007f69cbe8a86a in pyrun_file.cold () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #17 0x00007f69cbfaee53 in PyRun_SimpleFileExFlags () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #18 0x00007f69cbfab6b8 in Py_RunMain () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #19 0x00007f69cbf7b69d in Py_BytesMain () from /lib64/libpython3.9.so.1.0
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #20 0x00007f69cba29590 in __libc_start_call_main () from /lib64/libc.so.6
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #21 0x00007f69cba29640 in __libc_start_main_impl () from /lib64/libc.so.6
Jun 13 15:48:45 frillm.XXX.ch python[51161]: #22 0x0000561623e33095 in _start ()
Jun 13 15:48:45 frillm.XXX.ch python[51161]: [Inferior 1 (process 51120) detached]
Jun 13 15:48:45 frillm.XXX.ch systemd-coredump[51199]: [🡕] Process 51120 (python) of user 0 dumped core.
Jun 13 15:48:52 frillm.XXX.ch systemd[1]: koboldai.service: Main process exited, code=dumped, status=6/ABRT
Jun 13 15:48:52 frillm.XXX.ch systemd[1]: koboldai.service: Failed with result 'core-dump'.
Jun 13 15:48:52 frillm.XXX.ch systemd[1]: koboldai.service: Consumed 57.203s CPU time.
```

Even though there is plenty of GPU memory left: [screenshot]

For your information, 768x768 works nicely even with `--gpulayers 70`.

Best,

C.

LostRuins commented 2 weeks ago

Hmm, that's a different error; it's not OOM. Does this only happen at 1024x1024? Does it happen every time?

Unfortunately, I can't test at that resolution, as I don't have enough VRAM to generate at 1024x1024.

CorentinWicht commented 2 weeks ago

> Hmm, that's a different error; it's not OOM. Does this only happen at 1024x1024? Does it happen every time?
>
> Unfortunately, I can't test at that resolution, as I don't have enough VRAM to generate at 1024x1024.

Indeed, that's quite weird, and it only happens when I set the 1024x1024 size. It works nicely up to 960x960.

In the meantime, is there a way to limit generation to 960x960, to avoid one of our users crashing the system? Also, what happens when two people generate images simultaneously? Do the requests get queued (as in the text UI) so as not to crash the system?

Best,

C.

LostRuins commented 2 weeks ago

Yes, the requests will get queued. It won't crash, but it is possible for one user to spam many requests and hog the system, causing others to wait a long time until the queue clears. You can reduce that impact with a lower value for `--multiuser`.
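
For example, a launch sketch (assuming `--multiuser` accepts a numeric limit on your build; check `--help` for the exact semantics):

```
# A lower limit reduces how many requests any one user can stack up in the queue
python koboldcpp.py --sdmodel /path/to/model.safetensors --multiuser 1
```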

As for the resolution limit, there's currently no way to set it. But in future, I'll probably add an optional parameter to `--sdclamped` so you can specify a maximum image size to clamp to.