Open avisekiit opened 6 years ago
@avisekiit Yes, I have the same understanding and the same question as you. Could this be a trick in the paper?
The comparison of the last layers is only done in aggregate -- usually with batch sizes over 100 for each of the source and target. It's more of a comparison of the distribution shape than a comparison of individual images against each other. Thus, the target (and source) labels are not used during ADDA training.
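To make the "in aggregate" point concrete, here is a minimal sketch of a domain discriminator loss, written in plain Python rather than taken from the repository. Everything in it is an assumption for illustration: the linear discriminator, the names `discriminator_logit` and `domain_loss`, the batch size of 128, and the random vectors standing in for the 10-way encoder outputs. It only shows that the loss takes domain labels (source = 1, target = 0) and never class labels, and that it does not depend on which source image happens to sit next to which target image.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_logit(features, weights, bias):
    """Hypothetical linear domain discriminator: one logit per output vector."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def domain_loss(source_batch, target_batch, weights, bias):
    """Binary cross-entropy over domain labels only (source=1, target=0).

    Class labels do not appear in the signature or in the body.
    """
    eps = 1e-12
    total = 0.0
    for feats in source_batch:
        p = sigmoid(discriminator_logit(feats, weights, bias))
        total += -math.log(p + eps)          # domain label 1
    for feats in target_batch:
        p = sigmoid(discriminator_logit(feats, weights, bias))
        total += -math.log(1.0 - p + eps)    # domain label 0
    return total / (len(source_batch) + len(target_batch))


random.seed(0)
# Random stand-ins for the 10-way outputs of the source and target encoders.
source_batch = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(128)]
target_batch = [[random.gauss(0.5, 1.0) for _ in range(10)] for _ in range(128)]
weights = [random.gauss(0.0, 0.1) for _ in range(10)]
bias = 0.0

loss = domain_loss(source_batch, target_batch, weights, bias)

# Shuffling the target batch changes which target sample "lines up" with
# which source sample; the loss stays the same up to float rounding.
shuffled_target = target_batch[:]
random.shuffle(shuffled_target)
loss_shuffled = domain_loss(source_batch, shuffled_target, weights, bias)

print(abs(loss - loss_shuffled) < 1e-9)
```

Because the loss is a sum over each batch separately, a source "5" is never compared directly with a target "7"; the discriminator only sees two sets of outputs and tries to tell the sets apart.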
@jhoffman Thanks for your reply. I just have a thought: why do you think it is better to adapt the very last layer, which mainly outputs classification logits, instead of some intermediate layer that captures high-level features? At least from the flow diagram in your paper, I got the impression that the source encoder and the classifier were two separate modular components, and that you adapt the features extracted by the encoder for both target and source. Thanks again for your time.
Hi, please correct me if I am wrong. From the code, it seems that we adapt the very last fully connected layer (the 10-way classification output). However, this layer gives the distribution of class probabilities, so I think we cannot adapt it if we don't know the labels of the target dataset during ADDA. For example, if we feed a digit 5 from the source domain and a digit 7 from the target domain, the outputs of the last layer will obviously be different. It only makes sense to align the last layer if we are sure that we feed images of the same class, which in turn means that we have to know the class labels of the target domain during the ADDA adaptation phase. Is my understanding wrong?