chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

Issue of too much recompilation after modification of cross attention processor #131

Open xziayro opened 4 months ago

xziayro commented 4 months ago

Hello, I made a fairly large change to the cross attention processor in order to support regional prompts. However, after this change I observe a lot of recompilation (roughly 1 render out of 10). I investigated: it is linked to a constantly changing number of inputs, caused by the varying number of "regional prompts".

If I skip JIT compilation of the unet, the issue disappears, but then I no longer get the good speed-ups.

This is very annoying. Do you have any idea how to prevent this?

I was wondering whether it is possible to apply lazy_trace only to the self attention layers and leave the cross attention untraced?

chengzeyi commented 1 month ago

@xziayro You can use a single tensor to contain your regional prompts. That should not trigger recompilation.
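
The suggestion above could look something like this: instead of passing a Python list whose length varies with the number of regions (which changes the traced graph's input signature), pad the per-region prompt embeddings into one fixed-shape tensor plus a boolean mask. This is only a minimal sketch; `pack_regional_prompts` and `max_regions` are hypothetical names, not part of stable-fast's API.

```python
import torch

def pack_regional_prompts(embeds, max_regions):
    """Pack a variable-length list of [seq_len, dim] prompt embeddings
    into one fixed-shape [max_regions, seq_len, dim] tensor.

    The traced graph then always sees the same number of inputs with the
    same shapes, so a changing region count no longer forces a retrace.
    """
    seq_len, dim = embeds[0].shape
    packed = torch.zeros(max_regions, seq_len, dim, dtype=embeds[0].dtype)
    mask = torch.zeros(max_regions, dtype=torch.bool)
    for i, e in enumerate(embeds):
        packed[i] = e   # real region embedding
        mask[i] = True  # mark slot as active; padded slots stay False
    return packed, mask

# Example: 2 active regions, padded up to a fixed budget of 4.
embeds = [torch.ones(4, 8), torch.full((4, 8), 2.0)]
packed, mask = pack_regional_prompts(embeds, max_regions=4)
```

Inside the attention processor, the mask can then be used to ignore the padded slots, so the math is unaffected by the padding.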