I'm running ComfyUI as a server. I'm using the exact same workflow every time, and the only thing that changes is the model. Even though I'm launching with --highvram, the model still partially reloads when I swap checkpoints.
This adds roughly 2-4 seconds of overhead every time I switch between models.
Is there a way to keep all of my models 100% in VRAM so there's no overhead when switching between them?
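For context, here's roughly how I drive the server: I re-POST the same API-format workflow JSON to the /prompt endpoint and only swap the checkpoint name in the loader node. The node id, filenames, and server URL below are placeholders rather than my exact setup:

import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default ComfyUI address; adjust if yours differs

def queue_prompt(workflow: dict) -> dict:
    # POST the workflow to ComfyUI's /prompt endpoint and return the queue response.
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# The workflow is identical between requests; only ckpt_name in the
# CheckpointLoaderSimple node changes. "4" is a placeholder node id.
with open("workflow_api.json") as f:
    workflow = json.load(f)

for ckpt in ("modelA.safetensors", "modelB.safetensors"):
    workflow["4"]["inputs"]["ckpt_name"] = ckpt
    queue_prompt(workflow)  # each swap currently pays the 2-4 s reload described above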
Here's an example generation without switching the model:
When I change the model, I expect it to swap in from VRAM almost instantly, but instead I get:
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
lora key not loaded: lora_te_text_model_encoder_layers_0_mlp_fc1.alpha
...
Requested to load SD1ClipModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
Requested to load BaseModel
Loading 1 new model
100%|█████████████████████████████████████████████████████████| 5/5 [00:02<00:00, 2.44it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 4.69 seconds