Thanks for your open-source code. In the paper, I believe you train the speaker and face embedding networks separately and then use them to clean the data. But in the speaker-face folder, you train the speaker and face embedding networks together. May I ask why you do this? Thank you.
I just put them into the same code for training and evaluation, but the two models are independent. The motivation is to check the multi-modal speaker recognition results.
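To illustrate the idea (a toy sketch, not the repository's actual code): the two encoders below are independent, but one script holds both and fuses their similarity scores for multi-modal recognition. The `Encoder` class, dimensions, and fusion weight are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class Encoder:
    """Toy embedding network: a single random linear map (stand-in for the real model)."""
    def __init__(self, in_dim, emb_dim):
        self.W = rng.normal(scale=0.1, size=(in_dim, emb_dim))

    def embed(self, x):
        e = x @ self.W
        return e / np.linalg.norm(e)  # L2-normalised embedding

# Two independent models living in the same script; neither shares parameters.
speaker_net = Encoder(in_dim=40, emb_dim=16)    # e.g. filterbank features
face_net = Encoder(in_dim=512, emb_dim=16)      # e.g. face-crop features

def fused_score(utt_a, face_a, utt_b, face_b, w=0.5):
    """Multi-modal score: weighted sum of the two independent cosine similarities."""
    s_spk = float(speaker_net.embed(utt_a) @ speaker_net.embed(utt_b))
    s_face = float(face_net.embed(face_a) @ face_net.embed(face_b))
    return w * s_spk + (1 - w) * s_face

score = fused_score(rng.normal(size=40), rng.normal(size=512),
                    rng.normal(size=40), rng.normal(size=512))
```

Because the embeddings are L2-normalised, each modality's score is a cosine similarity in [-1, 1], so the fused score stays in that range as well.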