JackAILab / ConsistentID

Customized ID Consistent for human
MIT License
845 stars 76 forks

</image/> #48

Closed: gaoyixuan111 closed this issue 4 months ago

gaoyixuan111 commented 4 months ago

@JackAILab Why is the trigger word </image/> set in the face encoder? What is the purpose of </image/>? Is it necessary?

JackAILab commented 4 months ago

Hi @gaoyixuan111, this is preparation for the multi-ID version we plan to release later. The trigger word </image/> lets users supply multiple input images. For the underlying idea, you can refer to PhotoMaker.

Once our paper is officially accepted, we will update it to the new version and release more of the features it mentions. If you have any questions, please feel free to ask or open a PR.
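For readers unfamiliar with the trigger-word idea, here is a minimal sketch of how a placeholder token such as </image/> could mark where each input image's features enter the prompt. It is not the ConsistentID or PhotoMaker implementation: the function name `fuse_id_embeddings`, the toy two-dimensional embeddings, and the plain replacement step are all assumptions made for illustration. A real system would typically fuse text and image features with a learned module rather than overwrite them.

```python
# Hypothetical sketch: names and shapes are illustrative only,
# not the API of ConsistentID or PhotoMaker.
TRIGGER = "</image/>"


def fuse_id_embeddings(tokens, text_embeds, image_embeds, trigger=TRIGGER):
    """Put one image embedding at each trigger-token position.

    tokens       -- list of prompt tokens (strings)
    text_embeds  -- list of per-token embedding vectors (lists of floats)
    image_embeds -- list of per-image embedding vectors, in prompt order
    """
    # Each occurrence of the trigger word reserves a slot for one image.
    positions = [i for i, tok in enumerate(tokens) if tok == trigger]
    if len(positions) != len(image_embeds):
        raise ValueError(
            f"prompt has {len(positions)} trigger tokens "
            f"but {len(image_embeds)} images were given"
        )

    # Copy so the caller's text embeddings are left untouched.
    fused = [list(vec) for vec in text_embeds]
    for pos, img_vec in zip(positions, image_embeds):
        # Plain replacement stands in for a learned fusion step (assumption).
        fused[pos] = list(img_vec)
    return fused, positions


# Two identities in one prompt, one trigger token per image.
tokens = ["a", "man", TRIGGER, "and", "a", "woman", TRIGGER]
text_embeds = [[0.0, 0.0] for _ in tokens]
image_embeds = [[1.0, 1.0], [2.0, 2.0]]
fused, positions = fuse_id_embeddings(tokens, text_embeds, image_embeds)
```

With several trigger tokens in the prompt, each image is tied to its own position, which is what makes multi-image (multi-ID) input possible in this kind of scheme.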

gaoyixuan111 commented 4 months ago

@JackAILab Why is the text cross-attention frozen? Since image features are integrated into the text embeddings, the original text cross-attention cannot recognize them. Why is training the face encoder alone sufficient to solve this problem? Have you tried making the text cross-attention trainable?

JackAILab commented 4 months ago

@gaoyixuan111 Hi, please refer to issue #45.