Dear Sir or Madam,
I am writing to ask why you use a reference net to preserve the identity features. I think that directly injecting the face embedding into the denoising net through cross-attention should also work.
May I ask why you chose a reference net? I am guessing it follows EMO, but is there a principled reason for it?
Looking forward to your reply. Best regards.