Hi @cubiq, I recently came across StoryDiffusion (https://github.com/HVision-NKU/StoryDiffusion/tree/main): Consistent Self-Attention for Long-Range Image and Video Generation. Its attention processor is based on IP-Adapter, so maybe you could make a native version of it or integrate it into IPAdapter.
```python
#################################################
########Consistent Self-Attention################
#################################################
class SpatialAttnProcessor2_0(torch.nn.Module):
    r"""
    Attention processor for IP-Adapter for PyTorch 2.0.

    Args:
        hidden_size (`int`):
            The hidden size of the attention layer.
        cross_attention_dim (`int`):
            The number of channels in the `encoder_hidden_states`.
        text_context_len (`int`, defaults to 77):
            The context length of the text features.
        scale (`float`, defaults to 1.0):
            the weight scale of image prompt.
    """
```
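For context, the core idea of Consistent Self-Attention (as I understand it from the repo) is to sample tokens from the other images in a batch and append them to each image's self-attention keys/values, so every frame also attends to the others and stays consistent. A rough, hypothetical sketch of that mechanism (function name, `sample_ratio`, and the omission of the usual Q/K/V projections are my simplifications, not the actual StoryDiffusion implementation):

```python
import torch
import torch.nn.functional as F

def consistent_self_attention(hidden_states: torch.Tensor,
                              sample_ratio: float = 0.5) -> torch.Tensor:
    # hidden_states: (batch, tokens, channels); each batch entry is one image/frame.
    b, n, c = hidden_states.shape
    # Randomly sample a subset of tokens from every image in the batch...
    k = int(n * sample_ratio)
    idx = torch.randperm(n)[:k]
    sampled = hidden_states[:, idx, :]                      # (b, k, c)
    # ...and share the pooled tokens across the whole batch.
    shared = sampled.reshape(1, b * k, c).expand(b, -1, -1)  # (b, b*k, c)
    # Keys/values = an image's own tokens plus the shared cross-image tokens,
    # so self-attention also "sees" the other images (the consistency signal).
    kv = torch.cat([hidden_states, shared], dim=1)           # (b, n + b*k, c)
    out = F.scaled_dot_product_attention(hidden_states, kv, kv)
    return out  # same shape as the input: (b, n, c)
```

The real processor additionally runs the usual attention projections and stores/reuses reference features across denoising steps; this sketch only shows the key/value concatenation trick.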