JackAILab / ConsistentID

Customized ID Consistent for human
MIT License

Does the checkpoint on huggingface use decoupled cross attention? #41

Closed ngsitrong26 closed 5 months ago

ngsitrong26 commented 5 months ago

```python
if cross_attention_dim is None:
    attn_procs[name] = Consistent_AttProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=cross_attention_dim,
        rank=self.lora_rank,
    ).to(self.device, dtype=self.torch_dtype)
else:
    attn_procs[name] = Consistent_IPAttProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=cross_attention_dim,
        scale=1.0,
        rank=self.lora_rank,
        num_tokens=self.num_tokens,
    ).to(self.device, dtype=self.torch_dtype)
```
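For context: in the usual diffusers / IP-Adapter registration loop that a snippet like this typically sits in, `cross_attention_dim` is `None` exactly for the self-attention (`attn1`) layers, so `Consistent_AttProcessor` lands on self-attention and `Consistent_IPAttProcessor` on the text/image cross-attention (`attn2`) layers. A quick illustrative way to see which layer would get which processor (this is a sketch, not ConsistentID's code; the SD1.5 model id is an assumption, swap in the base model you actually use):

```python
import torch
from diffusers import UNet2DConditionModel

# Load a plain SD1.5 UNet just to inspect the attention-layer names.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)

for name in unet.attn_processors.keys():
    # attn1.* are self-attention layers  -> cross_attention_dim is None
    # attn2.* are cross-attention layers -> would get the decoupled processor
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    kind = (
        "Consistent_AttProcessor (self-attention)"
        if cross_attention_dim is None
        else "Consistent_IPAttProcessor (decoupled cross-attention)"
    )
    print(f"{name}: {kind}")
```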

vuongminh1907 commented 5 months ago

@JackAILab I am also concerned about this issue. You have both a text embedding and an image embedding. Do you use a single cross-attention or decoupled cross-attention?

gaoyixuan111 commented 5 months ago

First, the hidden states after self-attention are used with the text for cross-attention. Then, the same hidden states are used with image embeds for cross-attention calculation. Finally, the results of the two attentions are added together, so there are two decoupled cross-attention operations.
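Concretely, the idea looks roughly like this. Below is a minimal PyTorch sketch of decoupled cross-attention, not ConsistentID's actual code; the class name, projection names, and single-head layout are simplifications/assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    """Sketch: one query projection, separate K/V for text and image tokens."""
    def __init__(self, hidden_size, cross_dim, scale=1.0):
        super().__init__()
        self.scale = scale
        self.to_q = nn.Linear(hidden_size, hidden_size, bias=False)
        # K/V for the (multimodal) text embeddings
        self.to_k = nn.Linear(cross_dim, hidden_size, bias=False)
        self.to_v = nn.Linear(cross_dim, hidden_size, bias=False)
        # Separate K/V for the image (ID) embeddings, as in IP-Adapter
        self.to_k_ip = nn.Linear(cross_dim, hidden_size, bias=False)
        self.to_v_ip = nn.Linear(cross_dim, hidden_size, bias=False)

    def forward(self, hidden_states, text_embeds, image_embeds):
        q = self.to_q(hidden_states)
        # First cross-attention: hidden states attend to the text tokens
        text_out = F.scaled_dot_product_attention(
            q, self.to_k(text_embeds), self.to_v(text_embeds)
        )
        # Second cross-attention: the same queries attend to the image/ID tokens
        image_out = F.scaled_dot_product_attention(
            q, self.to_k_ip(image_embeds), self.to_v_ip(image_embeds)
        )
        # The two results are added together ("decoupled" cross-attention)
        return text_out + self.scale * image_out

# Tiny usage example with made-up shapes:
attn = DecoupledCrossAttention(hidden_size=320, cross_dim=768)
out = attn(
    torch.randn(1, 64, 320),   # hidden states after self-attention
    torch.randn(1, 77, 768),   # text (multimodal prompt) embeddings
    torch.randn(1, 4, 768),    # image / ID embeddings
)
```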

vuongminh1907 commented 5 months ago

So it may be the same as IP-Adapter, right?

JackAILab commented 5 months ago

Hi @vuongminh1907, yes, ConsistentID uses the same attention-decoupling structure as IP-Adapter.

@ngsitrong26, the inference demo also uses decoupled cross-attention. Consistent_AttProcessor computes the attention scores for the multimodal text, and that part is important: the multimodal-text attention score matrix needs to be returned from Consistent_AttProcessor (see attention.py, L157).
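To illustrate what "returning the attention score matrix" means, here is a rough sketch (not the repo's code at attention.py L157; the function name and shapes are assumptions):

```python
import torch

def attention_with_scores(q, k, v):
    # q: (B, heads, Nq, d); k, v: (B, heads, Nk, d)
    scores = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
    out = scores @ v
    # Returning the score matrix alongside the output is what allows other
    # parts of the pipeline to consume the multimodal-text attention map.
    return out, scores
```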

gaoyixuan111 commented 5 months ago

@JackAILab If I'm not using the ID-Preservation network functionality, can I directly use the fine-grained multimodal text prompts in the IP-Adapter model without redefining its attention mechanism?

ngsitrong26 commented 5 months ago

@JackAILab I went deeper into the source code and found that only Consistent_IPAttProcessor handles decoupled cross-attention, while Consistent_AttProcessor does not. So, do you use Consistent_IPAttProcessor or Consistent_AttProcessor? I also see the decoupled cross-attention weights in the checkpoint.

gaoyixuan111 commented 5 months ago

@JackAILab In class Consistent_AttProcessor, the LoRA parameters are updated. However, in class Consistent_IPAttProcessor, the LoRA weights are frozen. Could you explain the reason and purpose behind this decision?

```python
for module in [self.to_q_lora, self.to_k_lora, self.to_v_lora, self.to_out_lora, self.to_k_ip, self.to_v_ip]:
    for param in module.parameters():
        param.requires_grad = False
```

JackAILab commented 5 months ago

@ngsitrong26 Hi, sorry for the late reply; I've been busy with some other projects recently.

Both the Consistent_AttProcessor and Consistent_IPAttProcessor attention processors are used. You can check the model weights with convert_weights.py: the weights whose keys start with odd numbers come from Consistent_AttProcessor, and the weights whose keys start with even numbers come from Consistent_IPAttProcessor.
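For anyone who wants to verify this, here is a rough way to group the checkpoint keys by module index (the checkpoint path and the flat `<index>.<param>` key layout are assumptions; adjust to what convert_weights.py actually produces):

```python
import torch
from collections import defaultdict

# Path is a placeholder; point it at the converted ConsistentID checkpoint.
state_dict = torch.load("path/to/consistentid_checkpoint.bin", map_location="cpu")

groups = defaultdict(list)
for key in state_dict.keys():
    prefix = key.split(".")[0]          # e.g. "1" in "1.to_k_ip.weight"
    if prefix.isdigit():
        groups["odd" if int(prefix) % 2 else "even"].append(key)

# Per the comment above: odd-indexed modules come from Consistent_AttProcessor,
# even-indexed modules from Consistent_IPAttProcessor.
print(len(groups["odd"]), "odd-indexed keys;", len(groups["even"]), "even-indexed keys")
```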

This is determined by how training is set up; refer to train.py. If you have any questions or ideas, please feel free to raise them or open a PR.

JackAILab commented 5 months ago

@gaoyixuan111 Great observation, this was just for debugging purposes. We have updated attention.py.