Did you try training this model with the "glint360k_cosface_r100_fp16_0.1" ID encoder?

Thanks for your interest.

In fact, I have tried, but the training results were not as good as those of ArcFace, possibly due to hyperparameter issues.
I'm not quite sure which one you're referring to. If you mean proxy pair data generation, you can refer to the directory structure in proxy_data: for each image y_proxy, a source and a target driving image are sampled randomly, and source_proxy and target_proxy are generated from the corresponding driving image with the surrogate model. Additionally, the face shape may be modified with the code in https://github.com/jankrepl/pychubby. Releasing the generation code may take more time, so you may want to try reproducing it for now.
If you need my generation code, I can provide it later.
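In the meantime, here is a minimal sketch of the sampling step described above, for anyone trying to reproduce it. It is not the released generation code. The function name `build_proxy_pairs` and the two callables `make_source_proxy` and `make_target_proxy` are hypothetical placeholders for whatever surrogate model you use; how the surrogate combines y_proxy with a driving image is not spelled out here and is left to those callables. Drawing two distinct driving images per y_proxy is also an assumption.

```python
import random
from typing import Any, Callable, Dict, List, Sequence


def build_proxy_pairs(
    y_proxies: Sequence[Any],
    driving_pool: Sequence[Any],
    make_source_proxy: Callable[[Any, Any], Any],  # hypothetical surrogate wrapper
    make_target_proxy: Callable[[Any, Any], Any],  # hypothetical surrogate wrapper
    seed: int = 0,
) -> List[Dict[str, Any]]:
    """For each y_proxy, sample a source and a target driving image at random
    and build source_proxy / target_proxy with the supplied surrogate callables."""
    if len(driving_pool) < 2:
        raise ValueError("need at least two driving images to sample a pair")

    rng = random.Random(seed)  # fixed seed so the sampled pairs are reproducible
    records = []
    for y in y_proxies:
        # Assumption: the two driving images for one y_proxy are distinct.
        source_driving, target_driving = rng.sample(driving_pool, 2)
        records.append(
            {
                "y_proxy": y,
                "source_driving": source_driving,
                "target_driving": target_driving,
                "source_proxy": make_source_proxy(y, source_driving),
                "target_proxy": make_target_proxy(y, target_driving),
            }
        )
    return records
```

Each returned record can then be written out following the directory layout in proxy_data. Keeping the driving images in the record makes it easy to check later which inputs produced each proxy.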
I trained for 33 epochs with a batch size of 8. In my experiments, training longer did not yield better results. This setting is empirical and was likely chosen with reference to the training configurations used in methods like FaceShifter and SimSwap (I'm not sure, since it was too long ago).
I have already provided the trained checkpoint and the data used for training. Which specific checkpoint are you referring to here?
