Hi!
In the code, it seems that the reference latents are concatenated in parallel with the target's own latent and denoised simultaneously. Is this equivalent to first computing the self-attention keys and values of each reference individually, and then concatenating them during denoising?
Yes, you can compute the self-attention keys and values of each reference individually first, and then concatenate them during denoising. This will save GPU memory.
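Here is a minimal NumPy sketch of why the two approaches match. It assumes a single self-attention layer with shared projection weights, and that each reference attends only to its own tokens (never to the target's), so a reference's keys and values do not depend on the target. All function names (`joint_forward`, `cache_ref_kv`, `cached_forward`) are hypothetical and do not come from the repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: q is (Nq, d), k and v are (Nk, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def project(x, w_q, w_k, w_v):
    # Shared Q/K/V projections (assumption: the same weights for target and refs).
    return x @ w_q, x @ w_k, x @ w_v

def joint_forward(target, refs, w_q, w_k, w_v):
    # "Parallel" version: target and refs pass through the layer together.
    # Each ref attends only to itself; the target attends to itself plus every ref.
    q_t, k_t, v_t = project(target, w_q, w_k, w_v)
    ref_qkv = [project(r, w_q, w_k, w_v) for r in refs]
    ref_outs = [attention(q, k, v) for q, k, v in ref_qkv]
    k_all = np.concatenate([k_t] + [k for _, k, _ in ref_qkv], axis=0)
    v_all = np.concatenate([v_t] + [v for _, _, v in ref_qkv], axis=0)
    return attention(q_t, k_all, v_all), ref_outs

def cache_ref_kv(refs, w_q, w_k, w_v):
    # Pass 1: compute K/V for each reference on its own, one at a time.
    cache = []
    for r in refs:
        _, k, v = project(r, w_q, w_k, w_v)
        cache.append((k, v))
    return cache

def cached_forward(target, cache, w_q, w_k, w_v):
    # Pass 2: process only the target, concatenating the cached reference K/V.
    q_t, k_t, v_t = project(target, w_q, w_k, w_v)
    k_all = np.concatenate([k_t] + [k for k, _ in cache], axis=0)
    v_all = np.concatenate([v_t] + [v for _, v in cache], axis=0)
    return attention(q_t, k_all, v_all)

# Usage: the target's output is the same either way.
rng = np.random.default_rng(0)
d = 8
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
target = rng.standard_normal((16, d))
refs = [rng.standard_normal((16, d)) for _ in range(3)]

joint_out, _ = joint_forward(target, refs, w_q, w_k, w_v)
cached_out = cached_forward(target, cache_ref_kv(refs, w_q, w_k, w_v), w_q, w_k, w_v)
print(np.allclose(joint_out, cached_out))  # True
```

In a full network the cache would hold one (K, V) pair per reference per attention layer. If the reference branch is run at a fixed timestep, the cache can plausibly be reused across denoising steps; if the references are noised along with the target, it would need to be recomputed at each step.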