Closed: sherrydoge closed this issue 4 months ago
Nice work! I tried your demo and got impressive results, but I'm confused about why the text cross-attention can stay frozen. Since you fuse image features into the text embeddings, the original text cross-attention should not be able to recognize them anymore. I wonder why training the face encoder alone is enough to handle this, and whether you have tried making the text cross-attention trainable?

Hi @sherrydoge, thank you for your careful observation. Cross-attention is indeed an important set of parameters; in fact, our model partially unfreezes the cross-attention during training. You can refer to the discussion in issue #41.

The current version of the paper still has some flaws. We plan to release an updated version, along with more extended features, after the paper is officially accepted. If you have any further ideas or questions, please feel free to open a PR or start a discussion~
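For readers wondering what "partially unfreezing the cross-attention" could look like in practice, below is a minimal sketch using a diffusers-style Stable Diffusion UNet. This is an illustrative assumption, not the repository's actual training code: the model ID is a placeholder, and the module-name suffixes `attn2.to_k` / `attn2.to_v` follow the standard diffusers UNet layout, where `attn2` is the text cross-attention block.

```python
# Minimal sketch (assumed, not the authors' training code): freeze the whole
# UNet, then unfreeze only the cross-attention key/value projections, which
# are the layers that consume the (text + fused image) conditioning embedding.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base model
    subfolder="unet",
)

# Freeze everything first.
unet.requires_grad_(False)

# Partially unfreeze: only the cross-attention K/V projections are trained.
trainable_params = []
for name, module in unet.named_modules():
    if name.endswith("attn2.to_k") or name.endswith("attn2.to_v"):
        for p in module.parameters():
            p.requires_grad = True
            trainable_params.append(p)

optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)
```

Training only the `to_k`/`to_v` projections is a common way to adapt cross-attention to a modified conditioning embedding while keeping the rest of the UNet frozen; which layers this repository actually unfreezes is described in issue #41.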