BlackWyvern opened this issue 1 year ago
1) Why are you monitoring GPU memory when it's your RAM that's being drained? 2) How much RAM do you have? 3) Which two models are you trying to merge? I can barely merge hassanblend with something normal. Merging pfg with another pfg-based model is impossible for me because of its 7 GB size.
OK, so you checked RAM usage too. It wouldn't show up on the monitor, though, because the memory was never allocated. The merger tried to allocate the space needed to load the second model, and when that turned out to be too much, it threw the exception. If usage stayed flat from the very beginning to the end, that's very suspicious. At least one model should have fit.
It's entirely possible that something broke. First, though, we need to confirm that the merge you're attempting now worked before, and that you have enough free RAM to perform it.
I've got 32 GB of RAM, and usually only a little over 11 GB of it is in use. I've been trying to merge models like AnalogDiffusion (~2 GB), ArtofMTG (~2 GB), Anything3 (~2 GB), SD1.5 (~4 GB) and FurryE15/18 (~4 GB). I've merged all of these successfully ever since merging was added.
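Here is a rough sanity check on those figures. This minimal sketch assumes that both input models are fully loaded into RAM at the same time, and that each one takes up roughly its file size once loaded. The function names and the 1 GB `overhead_gb` allowance for temporaries and the rest of the webui process are hypothetical. They are not values taken from the webui.

```python
def estimate_merge_ram_gb(model_sizes_gb: list[float], overhead_gb: float = 1.0) -> float:
    """Rough RAM needed for a merge: all inputs resident at once, plus an allowance.

    Assumption: each model occupies about its file size in RAM once loaded.
    overhead_gb is a hypothetical allowance, not a measured value.
    """
    return sum(model_sizes_gb) + overhead_gb


def merge_fits(total_ram_gb: float, in_use_gb: float,
               model_sizes_gb: list[float], overhead_gb: float = 1.0) -> bool:
    """True if the estimated merge footprint fits in the RAM that is currently free."""
    free_gb = total_ram_gb - in_use_gb
    return estimate_merge_ram_gb(model_sizes_gb, overhead_gb) <= free_gb


if __name__ == "__main__":
    # Figures from this thread: 32 GB total, about 11 GB in use, two ~4 GB models.
    print(estimate_merge_ram_gb([4, 4]))  # 9.0
    print(merge_fits(32, 11, [4, 4]))     # True
```

By this estimate, the merges listed above need well under the ~21 GB that should be free. That is why 32 GB looks like more than enough.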
Yep, something is definitely wrong then; 32 GB should be more than enough. Sadly, I can't reproduce it yet. As I said, I merged hassanblend (~6 GB) with SD1.5 (~4 GB) on the last commit and it was fine. Can you send your RAM usage monitor readings? Or just tell me whether they stay completely flat through the entire process, from start to exception. Oh, by the way, I just noticed it happens in the middle of the process, which is weird.
```
Merging... 53%|████████████████████████████████████████████████ | 966/1831 [00:10<00:09, 88.45it/s]
Error loading/saving model file:
Traceback (most recent call last):
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\ui.py", line 1758, in modelmerger
    results = modules.extras.run_modelmerger(*args)
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\extras.py", line 321, in run_modelmerger
    theta_0[key] = theta_func2(a, b, multiplier)
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\extras.py", line 256, in weighted_sum
    return ((1 - alpha) * theta0) + (alpha * theta1)
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 26214400 bytes.
```
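Note how small the failing request is. A quick conversion shows that 26214400 bytes is exactly 25 MiB, a tiny fraction of 32 GiB (the helper name below is hypothetical). One plausible reading is that the process had run out of memory Windows was willing to commit overall. That would mean the failure was not caused by a single allocation too large to satisfy. This reading is consistent with the pagefile fix found later in the thread.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def describe_allocation(n_bytes: int, total_ram_gib: float) -> str:
    """Express a failed allocation size in MiB and as a share of total RAM."""
    mib = n_bytes / MIB
    share_pct = n_bytes / (total_ram_gib * GIB) * 100
    return f"{mib:g} MiB ({share_pct:.3f}% of {total_ram_gib:g} GiB)"


if __name__ == "__main__":
    # Allocation size from the traceback above, against the 32 GB mentioned earlier.
    print(describe_allocation(26214400, 32))  # 25 MiB (0.076% of 32 GiB)
```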
It merged two models correctly at first. After a couple of txt2img and img2img runs, though, it went straight back to failing.
```
Error loading/saving model file:
Traceback (most recent call last):
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\ui.py", line 1689, in modelmerger
    results = modules.extras.run_modelmerger(*args)
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\extras.py", line 379, in run_modelmerger
    safetensors.torch.save_file(theta_0, output_modelname, metadata={"format": "pt"})
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\venv\lib\site-packages\safetensors\torch.py", line 71, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\venv\lib\site-packages\safetensors\torch.py", line 236, in _flatten
    return {
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\venv\lib\site-packages\safetensors\torch.py", line 240, in <dictcomp>
    "data": _tobytes(v, k),
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\venv\lib\site-packages\safetensors\torch.py", line 210, in _tobytes
    return data.tobytes()
MemoryError
```
I'm getting this new error on commit 45a8b758a7bcb144242aee710dfcd1aedcf30b7f.
When it does merge, which is still a big if, it then throws this error and halts. This only seems to affect saving as safetensors. When it feels like merging, saving as a checkpoint (.ckpt) works fine.
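The second traceback hints at why only safetensors saving fails. In the safetensors version shown there, `save_file` calls `_flatten(tensors)` before `serialize_file` writes anything. `_flatten` builds a dict that holds `_tobytes(v, k)`, which ultimately calls `data.tobytes()`, for every tensor. As a result, the merged model briefly exists twice in RAM: once as tensors and once as raw bytes. I haven't checked whether later safetensors releases still work this way. The sketch below imitates that pattern, using the standard library's `array` module as a stand-in for tensors. `flatten_like` and `peak_bytes` are hypothetical names, not safetensors APIs.

```python
from array import array


def flatten_like(tensors: dict[str, array]) -> dict[str, dict]:
    """Imitate the traceback's _flatten: convert every tensor to bytes up front.

    All the bytes copies are held in the returned dict at the same time.
    """
    return {name: {"dtype": t.typecode, "data": t.tobytes()} for name, t in tensors.items()}


def peak_bytes(tensors: dict[str, array]) -> int:
    """Bytes alive at once: the original buffers plus their flattened copies."""
    resident = sum(len(t) * t.itemsize for t in tensors.values())
    flat = flatten_like(tensors)
    copies = sum(len(entry["data"]) for entry in flat.values())
    return resident + copies


if __name__ == "__main__":
    # Two small float32 stand-ins ('f' is a C float, 4 bytes on common platforms).
    model = {"w1": array("f", [0.0] * 1000), "w2": array("f", [0.0] * 500)}
    print(peak_bytes(model))  # 12000 where 'f' is 4 bytes: 6000 resident + 6000 copied
```

For a merged model of several gigabytes, that second copy alone can also run to several gigabytes. This fits the observation that .ckpt saves succeed while safetensors saves hit `MemoryError`.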
I've done a clean reinstall of my NVIDIA drivers, rebuilt the repo, and run both memtest and the Windows memory check.
I have a tentative fix so far: apparently my pagefile was to blame. Windows did that thing where it creates a pagefile and promptly forgets to use it. I deleted it and refreshed the settings. So far I've been able to merge several models in both ckpt and safetensors formats.
Will keep an eye on it for the moment.
@BlackWyvern I'm having the same issue. Where is the pagefile, so I can delete it? And does deleting it refresh the settings, or how would I do that afterward?
(I'm on Windows 10, so YMMV.) On the Start menu, right-click This PC, choose Properties, and scroll down to Advanced system settings. On the Advanced tab, go to Performance > Settings > Advanced > Virtual memory.
Uncheck "Automatically manage paging file size for all drives" and set every drive to "No paging file". Restart, then go back in and set it to "System managed size" (or give it a few GB manually), and restart again.
It's a Windows setting and doesn't affect AUTOMATIC1111 itself.
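After the restarts, you can check whether Windows is actually using the pagefile by comparing the commit limit against physical RAM. Windows refuses an allocation once the commit charge reaches the commit limit (roughly physical RAM plus pagefile), even if Task Manager still shows physical memory free. That may be why RAM usage looked flat here. This is a sketch using the Win32 `GlobalMemoryStatusEx` call through `ctypes`. In that call, `ullTotalPageFile` reports the current commit limit. If the value is close to physical RAM, the pagefile probably isn't being used. The helper names are hypothetical, and the pagefile figure is only an approximation.

```python
import ctypes
import sys
from typing import Optional

GIB = 1024 ** 3


class MEMORYSTATUSEX(ctypes.Structure):
    # Layout of the Win32 MEMORYSTATUSEX structure (DWORD is a 32-bit unsigned int).
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),   # current commit limit
        ("ullAvailPageFile", ctypes.c_ulonglong),   # how much this process can still commit
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]


def commit_status() -> Optional[dict]:
    """Return physical RAM and commit figures in GiB on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return {
        "physical_total": status.ullTotalPhys / GIB,
        "commit_limit": status.ullTotalPageFile / GIB,
        "commit_available": status.ullAvailPageFile / GIB,
    }


def pagefile_estimate_gib(status: dict) -> float:
    """Approximate pagefile size: commit limit minus physical RAM."""
    return status["commit_limit"] - status["physical_total"]


if __name__ == "__main__":
    status = commit_status()
    if status is None:
        print("Not on Windows; nothing to check.")
    else:
        for name, value in status.items():
            print(f"{name}: {value:.1f} GiB")
        print(f"approx. pagefile: {pagefile_estimate_gib(status):.1f} GiB")
```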
I'm having the same issue.
Same issue here.
I upgraded today and now get the same error. Until yesterday it worked perfectly!
Is there an existing issue for this?
What happened?
Since the updates two days ago, I've been completely unable to merge models. Before, I could run off and test stuff all day without an issue. It appears to be an OOM error, so I've been monitoring my VRAM, and the memory usage doesn't even move. It's as if nothing gets allocated at all. Judging by the error, it seems to be a CPU/RAM issue, but RAM usage doesn't move either.
```
Merging... 18%|████████████████▎ | 331/1831 [00:01<00:08, 175.52it/s]
Error loading/saving model file:
Traceback (most recent call last):
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\ui.py", line 1731, in modelmerger
    results = modules.extras.run_modelmerger(*args)
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\extras.py", line 321, in run_modelmerger
    theta_0[key] = theta_func2(a, b, multiplier)
  File "F:\Downloads\Stable Diffusion\A1 SD\stable-diffusion-webui\modules\extras.py", line 256, in weighted_sum
    return ((1 - alpha) * theta0) + (alpha * theta1)
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 4718592 bytes.
```
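The failing line in `weighted_sum` evaluates `((1 - alpha) * theta0) + (alpha * theta1)`. With real tensors, each of those operations allocates a new tensor, so every key briefly needs two temporaries plus the result. Below is a minimal sketch of the same per-key weighted-sum merge written in place, with plain Python lists standing in for tensors. The function names are hypothetical, and the sketch assumes matching shapes for shared keys. In PyTorch, `theta0.lerp_(theta1, alpha)` computes the same `(1 - alpha) * theta0 + alpha * theta1` in place. This trims transient allocations, but it would not by itself cure a pagefile or commit-limit problem like the one found in this thread.

```python
def weighted_sum(theta0: list[float], theta1: list[float], alpha: float) -> list[float]:
    """Same formula as the traceback; builds a new list (with tensors: new allocations)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta0, theta1)]


def weighted_sum_(theta0: list[float], theta1: list[float], alpha: float) -> list[float]:
    """In-place variant: overwrites theta0 element by element, no new list.

    Assumes theta0 and theta1 have the same length (same tensor shape).
    """
    for i, b in enumerate(theta1):
        theta0[i] = (1 - alpha) * theta0[i] + alpha * b
    return theta0


def merge_state_dicts(state0: dict, state1: dict, alpha: float) -> dict:
    """Merge state1 into state0 key by key, writing results back into state0.

    Keys missing from state1 are left unchanged in this sketch.
    """
    for key, value in state0.items():
        if key in state1:
            weighted_sum_(value, state1[key], alpha)
    return state0


if __name__ == "__main__":
    a = {"w": [0.0, 2.0], "only_in_a": [1.0]}
    b = {"w": [4.0, 6.0]}
    print(merge_state_dicts(a, b, 0.5))  # {'w': [2.0, 4.0], 'only_in_a': [1.0]}
```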
I'm running an RTX 3080 on Windows 10 and had no issues until the last couple of days of commits.
Steps to reproduce the problem
What should have happened?
Received bacon.
Commit where the problem happens
874b975bf8438b2b5ee6d8540d63b2e2da6b8dbd
What platforms do you use to access UI ?
Windows
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
Additional information, context and logs
No response