I'm running ComfyUI as a server. I'm using the exact same workflow every time, and the only thing that changes is the model. Even though I'm launching with --highvram, the model still partially reloads when I swap checkpoints.
This adds roughly 2-4 seconds of overhead every time I switch between models.
Is there a way to keep all of my models 100% in VRAM so there's no overhead when switching between them?
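For context, here's roughly how I drive the server: I re-POST the same API-format workflow JSON to the /prompt endpoint and only swap the checkpoint name in the loader node. The node id, filenames, and server URL below are placeholders rather than my exact setup:

import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default ComfyUI address; adjust if yours differs

def queue_prompt(workflow: dict) -> dict:
    # POST the workflow to ComfyUI's /prompt endpoint and return the queue response.
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# The workflow is identical between requests; only ckpt_name in the
# CheckpointLoaderSimple node changes. "4" is a placeholder node id.
with open("workflow_api.json") as f:
    workflow = json.load(f)

for ckpt in ("modelA.safetensors", "modelB.safetensors"):
    workflow["4"]["inputs"]["ckpt_name"] = ckpt
    queue_prompt(workflow)  # each swap currently pays the 2-4 s reload described above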
Here's an example generation without switching the model:
When I change the model, I expect it to swap in from VRAM almost instantly, but instead I get:
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
lora key not loaded: lora_te_text_model_encoder_layers_0_mlp_fc1.alpha
...
Requested to load SD1ClipModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
Requested to load BaseModel
Loading 1 new model
100%|█████████████████████████████████████████████████████████| 5/5 [00:02<00:00, 2.44it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 4.69 seconds