Hello,
I made some fairly large changes to the cross-attention processor in order to support regional prompts. However, after this change I observe a lot of recompilation (roughly 1 in 10 renders). I investigated: it is linked to a constant change in the number of inputs, caused by the varying number of "regional prompts".
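Roughly what happens, with illustrative shapes (77-token CLIP-style embeddings; this is not my exact code, just a sketch of the cause):

```python
import torch

tokens, dim = 77, 768  # illustrative CLIP-style prompt embedding shape

# Two renders with a different number of regional prompts: the number of
# tensors handed to the cross-attention processor (and the stacked shape)
# differs, so the traced input signature changes and the JIT recompiles.
for n_regions in (2, 3):
    region_embeds = [torch.randn(1, tokens, dim) for _ in range(n_regions)]
    stacked = torch.cat(region_embeds, dim=1)
    print(len(region_embeds), stacked.shape)  # 2 (1, 154, 768) / 3 (1, 231, 768)
```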
If I do not JIT-compile the UNet I don't have this issue, but then I don't get the good speed-ups anymore.
This is very annoying. Do you have any idea how to prevent this?
I was wondering: is it possible to apply lazy_trace only to the self-attention layers and leave the cross-attention untraced?
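Something along these lines is what I had in mind. This is only a rough sketch: I am assuming stable-fast is the library providing `lazy_trace` and that `sfast.jit.trace_helper` is the right import, I am also assuming `lazy_trace` can wrap an `nn.Module` directly, and `LazyTracedAttention` / `trace_self_attention_only` are names I made up for illustration. The `attn1`/`attn2` naming is the diffusers convention (`attn1` = self-attention, `attn2` = cross-attention).

```python
import torch
# Assumption: stable-fast exposes lazy_trace here; adjust if yours differs.
from sfast.jit.trace_helper import lazy_trace


class LazyTracedAttention(torch.nn.Module):
    """Wraps an attention module so the lazily traced callable still looks
    like an nn.Module to diffusers. Assumes lazy_trace accepts a module."""

    def __init__(self, attn: torch.nn.Module):
        super().__init__()
        self.attn = attn                 # keep parameters registered
        self._traced = lazy_trace(attn)  # traces on first call

    def forward(self, *args, **kwargs):
        return self._traced(*args, **kwargs)


def trace_self_attention_only(unet: torch.nn.Module) -> None:
    """Replace only the self-attention blocks with lazily traced wrappers,
    leaving cross-attention eager so the varying number of regional prompts
    never enters a traced graph."""
    # In diffusers' BasicTransformerBlock, 'attn1' is self-attention and
    # 'attn2' is cross-attention, so only modules named 'attn1' are touched.
    for name, module in list(unet.named_modules()):
        if name.endswith(".attn1"):
            parent_name, child_name = name.rsplit(".", 1)
            parent = unet.get_submodule(parent_name)
            setattr(parent, child_name, LazyTracedAttention(module))
```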