Juqowel / GPU_For_T5


Problem with RAM usage #3

Open Arnaud3013 opened 1 month ago

Arnaud3013 commented 1 month ago

Really useful extension. On dev nf4 (RTX 4070, max VRAM for model set to 9500 MB) there is a great speedup if the model does not change between generations. Previously:

```
100%|████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.60s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ...
Current free memory is 4472.06 MB ... Unload model KModel
Current free memory is 10860.76 MB ... Unload model IntegratedAutoencoderKL
Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.64 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.34 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 3.62 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ...
Current free memory is 3613.65 MB ... Unload model JointTextEncoder
Done.
[Memory Management] Target: KModel, Free GPU: 11015.63 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1987.83 MB, All loaded to GPU.
Moving model(s) has taken 3.24 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]
```

Now:

```
100%|████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00, 1.76s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ...
Current free memory is 10352.01 MB ...
Current free memory is 10352.01 MB ... Unload model IntegratedAutoencoderKL
Done.
[Memory Management] Target: KModel, Free GPU: 10511.88 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1484.08 MB, All loaded to GPU.
Moving model(s) has taken 1.41 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]
```

On dev Q8 GGUF (same GPU). Previously:

```
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ...
Current free memory is 10861.01 MB ... Unload model IntegratedAutoencoderKL
Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.89 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.09 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 1.75 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ...
Current free memory is 3613.90 MB ... Unload model JointTextEncoder
Done.
[Memory Management] Target: KModel, Free GPU: 11015.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -3884.63 MB, CPU Swap Loaded (blocked method): 5202.00 MB, GPU Loaded: 6917.51 MB
Moving model(s) has taken 4.48 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]
```

After:

```
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ...
Current free memory is 10354.01 MB ... Unload model IntegratedAutoencoderKL
Current free memory is 10513.88 MB ...
Done.
[Memory Management] Target: KModel, Free GPU: 10513.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -4386.63 MB, CPU Swap Loaded (blocked method): 5680.12 MB, GPU Loaded: 6439.38 MB
Moving model(s) has taken 2.62 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]
```
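As an aside, the `[Memory Management]` lines in the logs appear to follow a simple budget rule. This is a minimal sketch of that arithmetic, assuming the rule is `Remaining = Free GPU - Model Require - Inference Require` (field names are taken from the log; the rule itself is inferred from the numbers, not from Forge's source):

```python
# Sketch of the budget arithmetic the [Memory Management] log lines appear
# to show. The rule is inferred from the reported numbers, not from source.

def remaining_mb(free_gpu: float, model_require: float,
                 inference_require: float) -> float:
    """Remaining = Free GPU - Model Require - Inference Require.
    A negative result triggers the 'CPU Swap Loaded (blocked method)' path."""
    return round(free_gpu - model_require - inference_require, 2)

# Q8 KModel line from the "After" log: part of the model is swapped to CPU.
print(remaining_mb(10513.88, 12119.51, 2781.00))  # -4386.63

# nf4 KModel line from the "Now" log: "All loaded to GPU".
print(remaining_mb(10511.88, 6246.80, 2781.00))   # 1484.08
```

This matches the `Remaining` values in every log above, which is why a negative `Remaining` (and therefore the blocked-method swap) shows up exactly when `Model Require` plus `Inference Require` exceeds free VRAM.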

But it seems to have some issues with memory management. When I change models, like in an X/Y/Z plot, and run some generations/tests, my memory explodes. VRAM is constant, no issue there, capped at 9000 MB on my 4070, but RAM is another story. I've set up some virtual memory to be sure I can handle the model in Q8_0: I have 32 GB of physical RAM and 60 GB of virtual memory on an NVMe SSD. Without the extension there are no issues; sometimes RAM usage goes up to 55 GB but no more. With the extension it very often goes up to 90 GB and crashes Forge. It seems that when changing models, something is loaded again and again at each model load and is never cleaned up.
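The reported numbers are at least consistent with a leak of roughly one model copy per switch. A back-of-the-envelope sketch (all figures come from the report above; the "one leaked copy per switch" model is only a guess, not a diagnosis):

```python
# Rough consistency check of the reported crash. Figures are from the
# issue report; the "one leaked model copy per switch" assumption is a guess.

physical_gb = 32
virtual_gb = 60
commit_limit_gb = physical_gb + virtual_gb      # ~92 GB total commit space

baseline_gb = 55                                # peak usage without the extension
q8_model_gb = 12119.51 / 1024                   # Q8 KModel "Model Require", ~11.8 GB

# If each model switch leaked one extra copy, only about three switches
# would push usage from the baseline past the commit limit:
switches_to_crash = (commit_limit_gb - baseline_gb) / q8_model_gb
print(round(switches_to_crash, 1))              # 3.1
```

That would explain why usage climbs to ~90 GB, right at the 92 GB physical-plus-pagefile ceiling, before Forge crashes.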

Juqowel commented 1 month ago

Unfortunately I can't control Forge's internal memory management. It loads each checkpoint's integrated T5 individually if you don't specify an external one. Use a separate CLIP/T5.

4356

I don't have any memory problems with this. If you already use it, let me know which checkpoints you use so I can reproduce. I also recommend avoiding the "CPU Swap Loaded (blocked method)" message. Q4 is best for 12 GB. NF4 is trash because of its too-noticeable square pattern and other quality issues.

Arnaud3013 commented 1 month ago

I'm using the same VAE/Text encoder as you, except t5xxl is fp16. I'll try to use Async/shared.

Juqowel commented 1 month ago

Then maybe it's a problem with models that are too big for your GPU. I don't use models bigger than 75% of VRAM, usually below 50%.
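That rule of thumb can be checked against the numbers in this thread. A small hypothetical helper (the 75% threshold is the commenter's; the function name and the 12 GB VRAM figure for the RTX 4070 are my assumptions):

```python
# Hypothetical helper for the "model should stay under 75% of VRAM" rule
# of thumb from the comment above. Threshold is the commenter's; the
# function name and VRAM figure are assumptions for illustration.

def fits_comfortably(model_mb: float, vram_mb: float,
                     max_ratio: float = 0.75) -> bool:
    """True if the model's resident size stays within max_ratio of VRAM."""
    return model_mb / vram_mb <= max_ratio

rtx_4070_vram_mb = 12 * 1024          # 12 GB card

q8_kmodel_mb = 12119.51               # "Model Require" from the Q8 log
print(fits_comfortably(q8_kmodel_mb, rtx_4070_vram_mb))   # False (~99% of VRAM)

nf4_kmodel_mb = 6246.80               # "Model Require" from the nf4 log
print(fits_comfortably(nf4_kmodel_mb, rtx_4070_vram_mb))  # True (~51% of VRAM)
```

By this measure the Q8 checkpoint in the logs needs nearly all of a 4070's VRAM, which is exactly the case where the blocked-method CPU swap kicks in, while the nf4 variant sits comfortably around half.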