XiaohangZhan / cdp

Code for our ECCV 2018 work.
MIT License

Joint Training #5

Closed guangdongliang closed 5 years ago

guangdongliang commented 5 years ago

Thank you for your source code and paper! As for "Joint Training": in my opinion, it combines the losses of two datasets, which would make the combined model perform worse than the model trained on the larger dataset alone, but better than the one trained on the smaller dataset. Perhaps a model trained only on the unlabeled dataset would beat the combined one, so do you have results for training on the unlabeled dataset alone?

XiaohangZhan commented 5 years ago

Actually, the model trained with both datasets is better than a model trained on either one alone. This is easy to understand: combining the two datasets increases the total number of images. Hence, with more datasets the performance gains are additive rather than a compromise.

guangdongliang commented 5 years ago

> Actually, the model trained with both datasets is better than a model trained on either one alone. This is easy to understand: combining the two datasets increases the total number of images. Hence, with more datasets the performance gains are additive rather than a compromise.

Thank you for your reply! With more high-quality training data, the new model's performance should improve. However, when I trained on ms1m + vgg2 with a combined softmax (or other) loss, the performance was not as good as the model trained on ms1m alone. Do you have any idea why? Thank you very much!

XiaohangZhan commented 5 years ago

Here are some tips:

  1. Use multi-task learning. Do not directly merge the two datasets into one larger dataset, since the identity overlap between them is unknown. The multi-task architecture roughly looks like: backbone --> face ID (e.g., a 512-dim vector) --> split into two branches, where each branch is a linear classifier followed by a softmax loss. Images from dataset 1 go to branch 1 and images from dataset 2 go to branch 2, while both share the same backbone (except for BN, see 3.b).
  2. Adjust the two loss weights and the batch sizes.
  3. There may be domain gaps between the datasets. Some ways to address them: a. Images are aligned consistently within a dataset but may not be aligned consistently across datasets; re-align all images with the same alignment method. b. Use batch normalization separately for each dataset; do not batch-normalize images from different datasets together.
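The two-branch setup in tip 1, together with the loss weights from tip 2, can be sketched in PyTorch as follows. This is only an illustrative sketch: the backbone stand-in, the identity counts, the input size, and the `w1`/`w2` weights are assumptions, not values from this thread.

```python
import torch
import torch.nn as nn

class MultiTaskFaceModel(nn.Module):
    """Shared backbone producing a face ID vector, plus one
    linear classifier (softmax branch) per dataset."""
    def __init__(self, feat_dim=512, n_ids_1=1000, n_ids_2=1000):
        super().__init__()
        # Stand-in for a real backbone (e.g. a ResNet) that maps an
        # aligned 112x112 RGB face to a 512-dim face ID vector.
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(112 * 112 * 3, feat_dim),
        )
        self.head1 = nn.Linear(feat_dim, n_ids_1)  # classifier for dataset 1
        self.head2 = nn.Linear(feat_dim, n_ids_2)  # classifier for dataset 2

    def forward(self, x, branch):
        feat = self.backbone(x)
        return self.head1(feat) if branch == 1 else self.head2(feat)

model = MultiTaskFaceModel()
criterion = nn.CrossEntropyLoss()
w1, w2 = 1.0, 0.5  # hypothetical per-dataset loss weights to tune (tip 2)

# One training step: each dataset contributes its own batch, routed to
# its own branch; gradients from both losses update the shared backbone.
x1, y1 = torch.randn(8, 3, 112, 112), torch.randint(0, 1000, (8,))
x2, y2 = torch.randn(8, 3, 112, 112), torch.randint(0, 1000, (8,))
loss = w1 * criterion(model(x1, branch=1), y1) \
     + w2 * criterion(model(x2, branch=2), y2)
loss.backward()
```

Because the identity overlap between the datasets is unknown, the two classifiers stay separate; only the backbone (the part kept at inference time) is shared.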
guangdongliang commented 5 years ago

> Here are some tips:
>
> 1. Use multi-task learning. Do not directly merge the two datasets into one larger dataset, since the identity overlap between them is unknown. The multi-task architecture roughly looks like: backbone --> face ID (e.g., a 512-dim vector) --> split into two branches, where each branch is a linear classifier followed by a softmax loss. Images from dataset 1 go to branch 1 and images from dataset 2 go to branch 2, while both share the same backbone (except for BN, see 3.b).
> 2. Adjust the two loss weights and the batch sizes.
> 3. There may be domain gaps between the datasets. Some ways to address them: a. Images are aligned consistently within a dataset but may not be aligned consistently across datasets; re-align all images with the same alignment method. b. Use batch normalization separately for each dataset; do not batch-normalize images from different datasets together.

Thank you! But if BN is not shared, which BN statistics should I use at test and inference time?

XiaohangZhan commented 5 years ago

Sorry, I was wrong. Just use the same BN, but do not put images from different datasets in the same batch. It helped in my experiments, but I'm not sure about your circumstances.
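The "same BN, but no mixed batches" advice amounts to a batching scheme in which every batch is drawn from exactly one dataset, so BN statistics are never computed over a cross-dataset mixture. A minimal pure-Python sketch of such a sampler (the helper name, dataset sizes, and batch size are illustrative assumptions):

```python
import random

def homogeneous_batches(dataset_sizes, batch_size, seed=0):
    """Yield batches of (dataset_id, sample_index) pairs such that no
    batch ever mixes samples from different datasets. BN parameters and
    running statistics are shared, but each batch is dataset-pure."""
    rng = random.Random(seed)
    batches = []
    for ds_id, size in enumerate(dataset_sizes):
        idx = list(range(size))
        rng.shuffle(idx)
        # Chunk each dataset's shuffled indices into its own full batches.
        for i in range(0, size - batch_size + 1, batch_size):
            batches.append([(ds_id, j) for j in idx[i:i + batch_size]])
    rng.shuffle(batches)  # interleave the datasets at batch granularity
    return batches

batches = homogeneous_batches([100, 60], batch_size=16)
# Every batch comes from exactly one dataset.
assert all(len({ds for ds, _ in b}) == 1 for b in batches)
```

In a PyTorch pipeline the same idea would typically be wired in as a custom batch sampler passed to the `DataLoader`, so the training loop itself stays unchanged.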

guangdongliang commented 5 years ago

> Sorry, I was wrong. Just use the same BN, but do not put images from different datasets in the same batch. It helped in my experiments, but I'm not sure about your circumstances.

Since my model trained with multi-task learning was not as good as the models trained on each dataset separately, I wonder if you have results for different loss weights, or for a model that does share BN across datasets? I am disappointed by the poor results of my experiments.

XiaohangZhan commented 5 years ago

Solving the domain gap is out of the scope of CDP, so I cannot provide such results.

guangdongliang commented 5 years ago

Thank you for your reply! In the paper, CDP improves the label quality of the rest of MS-Celeb-1M. However, why did you get 78.18% on MegaFace, which is lower than the 78.52% obtained when all labels were employed?

XiaohangZhan commented 5 years ago

CDP can indeed pinpoint wrongly annotated samples and low-quality samples, regard them as noise, and discard them. However, some of them may not be noise but hard examples, e.g., occluded or atypical faces; see Figure 10 (failure cases) in the arXiv paper. Hence, it's hard to say CDP does better than manual annotation. The main contribution of CDP is leveraging unlabeled data to boost performance close to the fully supervised level.

guangdongliang commented 5 years ago

> CDP can indeed pinpoint wrongly annotated samples and low-quality samples, regard them as noise, and discard them. However, some of them may not be noise but hard examples, e.g., occluded or atypical faces; see Figure 10 (failure cases) in the arXiv paper. Hence, it's hard to say CDP does better than manual annotation. The main contribution of CDP is leveraging unlabeled data to boost performance close to the fully supervised level.

Thanks for your patience! I agree with you, but in my opinion the performance may be hurt both by the hard examples and by multi-task learning itself. I will run some experiments on that.