LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Cannot load model with weird error (access violation reading 0x0000000000000070) #714

Closed: zuGROB closed this issue 5 months ago

zuGROB commented 5 months ago

So, nothing, absolutely nothing is using any of my GPUs, yet it still fails with "allocating 18390.70 MiB on device 0: cudaMalloc failed: out of memory".

I have two cards in my system, a 3090 and a Tesla P40, and this error happens on both. Interestingly enough, it won't happen right after a reboot, but it's kinda infuriating when it comes back.

Launch parameters in either case: multiuser=1, noavx2=False, noblas=False, nocertify=False, nommap=True, noshift=False, onready='', port=5001, port_param=5001, preloadstory=None, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=11, useclblast=None, usecublas=['normal', '1', 'mmq'], usemlock=True, usevulkan=None
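For readers trying to reproduce this, the Namespace dump above corresponds roughly to a command-line launch along these lines. The model path is a placeholder and the flag names are inferred from the parameter names, so treat this as a sketch rather than the exact command used:

```python
# Rough reconstruction of the launch implied by the parameter dump above.
# "model.gguf" is a placeholder; flag names are inferred from the Namespace fields
# and may differ slightly between koboldcpp versions.
import subprocess

cmd = [
    "python", "koboldcpp.py", "model.gguf",
    "--threads", "11",
    "--port", "5001",
    "--multiuser", "1",
    "--usecublas", "normal", "1", "mmq",   # CUDA backend, device index 1, MMQ kernels
    "--nommap",                            # nommap=True: load fully instead of memory-mapping
    "--usemlock",                          # usemlock=True: lock model memory to avoid swapping
    "--ropeconfig", "0.0", "10000.0",
]
subprocess.run(cmd, check=True)
```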

zuGROB commented 5 months ago

Partially fixed. When the Tesla IS IN the system, the 3090 refuses to work with a BLAS batch size higher than 128, but if the Tesla is unplugged, 512 works just fine with my model. Stuff's weird. And BLAS batch 512 works fine on the Tesla card, even though it has the same 24 gigs of VRAM... like, whut
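For anyone wanting to try the same workaround: the batch size in question is set from the launcher GUI or, I believe, via a --blasbatchsize flag. The flag name is not quoted anywhere in this thread, so treat it as an assumption:

```python
# Flag name assumed (not quoted in this thread); these would be appended to a launch
# command like the sketch above.
# With both cards installed, the 3090 only tolerated 128; with the P40 unplugged, 512 worked.
both_cards_installed = ["--blasbatchsize", "128"]
p40_unplugged = ["--blasbatchsize", "512"]
```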

LostRuins commented 5 months ago

Next time, you can adjust the memory split between the cards with --tensor_split.
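A minimal sketch of what that might look like, extending the launch command sketched earlier. The ratio values here are placeholders and should be tuned to how much of the model each card can hold:

```python
# Illustrative only: weight the layer split toward device 0 so device 1 carries less.
# Ratios are placeholders; koboldcpp divides the model between the GPUs in proportion to them.
extra_args = ["--tensor_split", "6", "4"]   # ~60% of the model on GPU 0, ~40% on GPU 1
```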