chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

Issue of too much recompilation after modification of cross attention processor #131

Open xziayro opened 4 months ago

xziayro commented 4 months ago

Hello, I made a fairly large change to the cross attention processor in order to support regional prompts. However, after this change I observe a lot of recompilation (roughly 1 render out of 10). I investigated: it is linked to a constantly changing number of inputs, caused by the varying number of "regional prompts".

If I skip JIT compilation of the unet, the issue disappears, but then I no longer get the good speed-ups.

This is very annoying. Do you have any idea how to prevent this?

I was wondering whether it is possible to apply lazy_trace only to the self attention layers and leave the cross attention untraced?

chengzeyi commented 1 month ago

@xziayro You can use a single tensor to contain your regional prompts. That should not trigger recompilation.
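
The suggestion above could look something like this: instead of passing a Python list whose length varies with the number of regions (which changes the traced graph's input signature), pad the per-region prompt embeddings into one fixed-shape tensor plus a boolean mask. This is only a minimal sketch; `pack_regional_prompts` and `max_regions` are hypothetical names, not part of stable-fast's API.

```python
import torch

def pack_regional_prompts(embeds, max_regions):
    """Pack a variable-length list of [seq_len, dim] prompt embeddings
    into one fixed-shape [max_regions, seq_len, dim] tensor.

    The traced graph then always sees the same number of inputs with the
    same shapes, so a changing region count no longer forces a retrace.
    """
    seq_len, dim = embeds[0].shape
    packed = torch.zeros(max_regions, seq_len, dim, dtype=embeds[0].dtype)
    mask = torch.zeros(max_regions, dtype=torch.bool)
    for i, e in enumerate(embeds):
        packed[i] = e   # real region embedding
        mask[i] = True  # mark slot as active; padded slots stay False
    return packed, mask

# Example: 2 active regions, padded up to a fixed budget of 4.
embeds = [torch.ones(4, 8), torch.full((4, 8), 2.0)]
packed, mask = pack_regional_prompts(embeds, max_regions=4)
```

Inside the attention processor, the mask can then be used to ignore the padded slots, so the math is unaffected by the padding.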