cubiq / ComfyUI_IPAdapter_plus

GNU General Public License v3.0
3.12k stars 235 forks

it takes IPA 4s to load a new image input #579

Open AlexFeYang opened 1 month ago

AlexFeYang commented 1 month ago

I use the --only-gpu mode on a 4090. (screenshot)

Why does it request to load SDXL every time I load a new image as input?

AlexFeYang commented 1 month ago

Alright, I am brainless here. I tested with the basic workflow, and it's not because of loading SDXL, but I still don't know why IPAdapter takes 4s to load a new image. I used to use diffusers directly to load images with IPAdapter, and it didn't take this much time to load a new image as input.

cubiq commented 1 month ago

does it do the same without --only-gpu ?

AlexFeYang commented 1 month ago

(screenshot) I switched to --highvram this time; nothing changed, it still spends another 4s on the IPAdapter node.

cubiq commented 1 month ago

try with standard vram management

AlexFeYang commented 1 month ago

(screenshot) Still the same; I am using normal VRAM management this time. When I change the image at the image loader, it spends another 4s.

cubiq commented 1 month ago

can you post the image and the workflow?

AlexFeYang commented 1 month ago

(screenshot) I am using Animagine.

cubiq commented 1 month ago

can you try to pass the reference image through the PrepImageForClipVision node?

cubiq commented 1 month ago

NOTE: if you use the unified loader DO NOT use the load clip vision node!

AlexFeYang commented 1 month ago

(screenshot) No help.

AlexFeYang commented 1 month ago

(screenshot)

AlexFeYang commented 1 month ago

Is there anything to load inside the IPAdapter nodes? I am using a cloud pod now. My own computer, a 4060 16 GB, takes 2s to load a new image.

cubiq commented 1 month ago

I don't know why it takes so long. on my 4090 loading one image is almost instantaneous

AlexFeYang commented 1 month ago

Yes, I used to work with diffusers, and it loads an image within a second; that's why I am so confused now.

AlexFeYang commented 1 month ago

This case is reproducible. Every pod I start is like this, even in Docker. The only common factor is that they all need to link to the model storage, so is there anything I should prepare? I thought the IPA model, the CLIP vision model, and the image had all been loaded ahead of time, so what's missing?

AlexFeYang commented 1 month ago

(screenshots) I turned the weight from 0.8 to 0.7, and it started to load SDXL and a new model.

AlexFeYang commented 1 month ago

(screenshot) I added logging to your code. (screenshot) It seems this line takes most of the time, any thoughts? @cubiq `self.ip_layers = To_KV(ipadapter_model["ip_adapter"])`

cubiq commented 1 month ago

are you in fp32?

AlexFeYang commented 1 month ago

Nope, I am using bf16 by default, and I also tested with fp16; nothing helps. (screenshot)

```python
for key, value in state_dict.items():
    self.to_kvs[key.replace(".weight", "").replace(".", "_")] = nn.Linear(value.shape[1], value.shape[0], bias=False)
    self.to_kvs[key.replace(".weight", "").replace(".", "_")].weight.data = value
```

This is the code costing so much time; I don't know what to do with it.
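A likely explanation (my assumption, not confirmed in the thread): `nn.Linear` runs its default weight initialization on the CPU at construction time, and that work is then thrown away when `.weight.data` is overwritten. A minimal sketch of the loop above next to a variant that skips the wasted init by constructing on the `meta` device (PyTorch ≥ 2.0); `build_naive`, `build_skip_init`, and the key names are illustrative, not code from the repo:

```python
import torch
import torch.nn as nn

def build_naive(state_dict):
    # mirrors the loop above: nn.Linear initializes its weight on the CPU,
    # and that freshly initialized tensor is immediately discarded
    to_kvs = {}
    for key, value in state_dict.items():
        name = key.replace(".weight", "").replace(".", "_")
        layer = nn.Linear(value.shape[1], value.shape[0], bias=False)
        layer.weight.data = value
        to_kvs[name] = layer
    return to_kvs

def build_skip_init(state_dict):
    # construct on the "meta" device so no initialization work happens,
    # then attach the checkpoint weight directly
    to_kvs = {}
    for key, value in state_dict.items():
        name = key.replace(".weight", "").replace(".", "_")
        with torch.device("meta"):
            layer = nn.Linear(value.shape[1], value.shape[0], bias=False)
        layer.weight = nn.Parameter(value, requires_grad=False)
        to_kvs[name] = layer
    return to_kvs
```

Both variants end up with identical weights; only the wasted initialization work differs.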

AlexFeYang commented 1 month ago

Still need help, sir! Much appreciated!! @cubiq

cubiq commented 1 month ago

I still don't understand if this happens locally or on a cloud service

AlexFeYang commented 1 month ago

I changed the logging; it seems this is running on the CPU. @cubiq (screenshot)

(screenshot)
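The observation matches PyTorch's default behavior: modules are created on the CPU unless told otherwise. A quick, generic way to confirm where a freshly built layer's weights live (not code from the repo):

```python
import torch
import torch.nn as nn

# a freshly constructed module defaults to the CPU, regardless of
# what device the rest of the pipeline runs on
layer = nn.Linear(2048, 1280, bias=False)
print(layer.weight.device)  # cpu
```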

AlexFeYang commented 1 month ago

I solved the problem by moving the To_KV code into the loader nodes; it works fine for now. @cubiq Hope everything goes well! Thank you for your patient help.

cubiq commented 1 month ago

what did you do exactly?

AlexFeYang commented 1 month ago

(screenshot) I moved the To_KV code from here into IPAdapterUnifiedLoader and IPAdapterModelLoader. (screenshot) This code runs on the CPU, which is why my 4090 wasn't helping; maybe your PC has a better CPU. I tested the original code on servers from the US to Europe, and the European 4090 ran faster, probably because its CPU is better. It has worked well so far and there seems to be no problem — do you see any issue? @cubiq Maybe some other init work should also be moved outside the core nodes to improve performance.
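A hedged sketch of the workaround described above, assuming the intent is to build the To_KV layers once per model in the loader node and memoize them, so that changing only the input image no longer triggers the expensive rebuild (`build_to_kv`, `get_to_kv`, and the cache key are illustrative names, not the repo's API):

```python
import torch
import torch.nn as nn

_to_kv_cache = {}  # keyed by model path; lives for the process lifetime

def build_to_kv(state_dict):
    # the expensive construction from the thread, now run once per model
    layers = {}
    for key, value in state_dict.items():
        name = key.replace(".weight", "").replace(".", "_")
        layer = nn.Linear(value.shape[1], value.shape[0], bias=False)
        layer.weight.data = value
        layers[name] = layer
    return layers

def get_to_kv(model_path, state_dict):
    # loader-side memoization: subsequent runs with a new input image
    # reuse the cached layers instead of rebuilding them
    if model_path not in _to_kv_cache:
        _to_kv_cache[model_path] = build_to_kv(state_dict)
    return _to_kv_cache[model_path]
```

The trade-off is memory: the cached layers stay resident until the model entry is evicted or the process exits.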

cubiq commented 1 month ago

it's just a matter of moving it to the GPU before performing the operation. What I'm wondering is whether it should be done automatically or not; if one has little VRAM it might be better to keep it on the CPU.
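This trade-off could be handled with a small heuristic: use the GPU for the build only when enough VRAM is free, otherwise stay on the CPU. A sketch under my own assumptions (the 2 GB threshold and the function name are illustrative, not from the repo):

```python
import torch

def pick_device(min_free_gb: float = 2.0) -> torch.device:
    # prefer the GPU for the To_KV build only if enough VRAM is free,
    # otherwise fall back to the CPU to avoid out-of-memory on small cards
    if torch.cuda.is_available():
        free_bytes, _total = torch.cuda.mem_get_info()
        if free_bytes / 1e9 >= min_free_gb:
            return torch.device("cuda")
    return torch.device("cpu")

# the state_dict would then be moved once, before the layers are built:
# state_dict = {k: v.to(pick_device()) for k, v in state_dict.items()}
```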

AlexFeYang commented 1 month ago

You are right, but I still wish to solve the underlying problem: I still don't get why both my local machine and the cloud pod run the To_KV function on the CPU, when everywhere outside this function the device is correctly set to the GPU.

AlexFeYang commented 1 month ago

It would be better if you could tell me where to put the related models on the GPU directly; affecting just the To_KV function would be enough. I am a student and lack experience with PyTorch.

cubiq commented 1 month ago

I'll run some more tests on this, thanks for bringing it up