Closed marcociccone closed 4 years ago
Hi @xcpeng, I'm not sure you'd want to merge my PR into your master yet, since I'm still debugging it. I've found a couple of bugs; the mutual information (MI) minimization, for sure, was wrong. Do you have time to check the correctness of the implementation with me?
A few more details: I believe the code was minimizing the MI between (di, ci) and (ds, ci) instead of (di, ci) and (di, ds), as described in the paper. After fixing this I no longer get the NaN loss, but training is still very unstable.
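To make the pairing change concrete, here is a minimal sketch of the two variants. It is not the repository's code: `mi_loss`, `estimate_mi`, and the dictionary keys are hypothetical names, and the MI estimator is passed in as a plain callable because the actual estimator is not specified in this thread.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Estimator = Callable[[Features, Features], float]
Pairs = Sequence[Tuple[str, str]]

# Feature pairs the code appeared to use before the fix (assumption from reading it).
PAIRS_BEFORE_FIX: Pairs = [("di", "ci"), ("ds", "ci")]
# Feature pairs as described in the paper, per the comment above.
PAIRS_AFTER_FIX: Pairs = [("di", "ci"), ("di", "ds")]


def mi_loss(
    features: Dict[str, Features],
    estimate_mi: Estimator,
    pairs: Pairs = PAIRS_AFTER_FIX,
) -> float:
    """Sum the estimated MI over the given feature pairs.

    `estimate_mi` is a placeholder for whatever MI estimator the model uses.
    """
    return sum(estimate_mi(features[a], features[b]) for a, b in pairs)
```

Swapping `pairs` between the two constants is enough to compare the old and new behaviour with the same estimator.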
I've integrated tensorboard so you can easily visualize the loss functions. Let me know what you think!
@marcociccone Thank you very much! I don't think that paper is qualified work, since the author cannot explain the results or the misleading code. But I appreciate your efforts.
Hi, first of all congrats on your work and thanks for sharing the code :) I've read the paper and checked the code, and I found a few inconsistencies, so I thought it would be good to refactor your codebase to get a better understanding of all the pieces. I am going to ask you a few questions that I hope will help me and other researchers build upon your great work.
I've used the DigitFive dataset you provided. From my understanding, the code requires resizing the images from the `svhn` and `syn` domains with the `resize_from32x32to28x28.m` script. Unfortunately, I couldn't find the train and test splits for the `syn` domain in the zip file, so I wasn't able to generate the data. I could use your help here. To be more specific, I'm referring to the files `synth_train28x28.mat` and `synth_test28x28.mat`.

There was some confusion about the role of `C_0`, `C_1`, and `C_2` in the code. Indeed, the paper only mentions two classifiers. I still don't understand the need for three classifiers; could you please clarify this point? Reading the paper, it seems that the enumeration has the following meaning. Could you confirm this?

- `C_0, D_0` --> domain-specific (ds)
- `C_1, D_1` --> domain-independent (di)
- `C_2, D_2` --> class-independent (ci)

Also, I can't find any mention of the discrepancy losses in the paper. Could you clarify this point too?

Apart from the implementation questions, the loss becomes NaN pretty soon after training starts. I will play around with the hyper-parameters. I had to remove the `syn` dataset for now; maybe the hyper-params you used need to be adjusted to take this change into account.

Thank you for your help!
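
For anyone hitting the same NaN loss, a small guard like the one below can help narrow down which term diverges first. This is only a sketch: `first_non_finite` and the loss names in the usage note are hypothetical, and it assumes each loss has already been converted to a Python float (for example with `loss.item()` in PyTorch).

```python
import math
from typing import Dict, Optional


def first_non_finite(losses: Dict[str, float]) -> Optional[str]:
    """Return the name of the first loss that is NaN or infinite, or None."""
    for name, value in losses.items():
        if not math.isfinite(value):
            return name
    return None
```

Calling it once per training step, e.g. `first_non_finite({"cls": cls_value, "mi": mi_value})`, and stopping when it returns a name gives the step and the loss term at which training breaks.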