AltschulerWu-Lab / MuLANN

Code and data of the "Multi-domain adversarial learning" paper, Schoenauer-Sebag et al., accepted at ICLR 2019
https://openreview.net/forum?id=Sklv5iRqYX
GNU General Public License v3.0

Questions about Domain Predictor #1

Open zw615 opened 3 years ago

zw615 commented 3 years ago

Hi! I've read the code and find the idea very interesting. However, I find the domain_predictor part a bit confusing.

It seems that the domainSourceLabels and domainTargetLabels assignment in feval_MuLANN is exactly the same as in DANN (0 for target and 1 for source). However, according to the paper, if I'm not mistaken, the domain discrimination loss aligns per-class distributions as in MADA, not DANN?

Moreover, how does MuLANN scale to more than 2 domains? I have noticed that the domain_predictor in large_domain_predict_model has a multi-domain option (3 domains), but I cannot find any related training code in train_all.lua or train_asym.lua.

By the way, I'm not familiar with Lua/Torch, but I'm much more familiar with Python/PyTorch, so maybe I'm missing something here. I wonder if you would be so kind as to translate the code to PyTorch.

Thanks a lot!

aschoenauer-sebag commented 3 years ago

Hi,

Thanks for your questions:

It seems that the domainSourceLabels and domainTargetLabels assignment in feval_MuLANN is exactly the same as in DANN (0 for target and 1 for source). However, according to the paper, if I'm not mistaken, the domain discrimination loss aligns per-class distributions as in MADA, not DANN?

MuLANN has the same domain discriminator as DANN: a single global one (rather than one per class, as in MADA).
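
For the 2-domain case, a minimal PyTorch sketch of such a single global binary discriminator, with 0/1 labels assigned as in feval_MuLANN, could look like the following (layer sizes and names such as feature_dim are illustrative only, not the actual MuLANN architecture):

```python
import torch
import torch.nn as nn

# Illustrative sizes -- not the actual MuLANN architecture.
feature_dim, hidden_dim = 256, 100

# Single global domain discriminator, shared across all classes (as in DANN).
domain_discriminator = nn.Sequential(
    nn.Linear(feature_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 1),
    nn.Sigmoid(),
)

bce = nn.BCELoss()

def domain_loss(source_features, target_features):
    # 1 for source, 0 for target, as in feval_MuLANN.
    source_labels = torch.ones(source_features.size(0), 1)
    target_labels = torch.zeros(target_features.size(0), 1)
    preds = domain_discriminator(torch.cat([source_features, target_features], dim=0))
    labels = torch.cat([source_labels, target_labels], dim=0)
    return bce(preds, labels)
```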

how does MuLANN scale to more than 2 domains?

In that case, the domain discriminator has a softmax as its last activation function, and the criterion is the multi-class cross-entropy criterion.
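
In PyTorch terms, this would amount to replacing the sigmoid head above with an n-way output and a multi-class cross-entropy; a hypothetical sketch (note that nn.CrossEntropyLoss applies log-softmax internally, so the head outputs raw logits):

```python
import torch
import torch.nn as nn

feature_dim, hidden_dim, num_domains = 256, 100, 3  # illustrative sizes

# n-way domain discriminator: one logit per domain.
multi_domain_discriminator = nn.Sequential(
    nn.Linear(feature_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, num_domains),
)

# Multi-class cross-entropy; applies log-softmax to the logits internally.
ce = nn.CrossEntropyLoss()

def multi_domain_loss(features, domain_ids):
    # domain_ids: LongTensor of shape (batch,), values in {0, ..., num_domains - 1}.
    return ce(multi_domain_discriminator(features), domain_ids)
```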

I wonder if you would be so kind to translate the code to PyTorch.

This is what I'm currently doing :+1: :) I'll let you know when MuLANN is up and running in PyTorch (within a few days max I think).

Cheers!

zw615 commented 3 years ago

Thanks!

That's where I'm confused. So if I'm not mistaken, the domain discriminator has an output of length num_domains * num_classes? Otherwise I can't think of any other way to align the multi-domain, multi-class distributions. And in backward propagation, the gradient passed to the backbone is reversed just as in DANN?

aschoenauer-sebag commented 3 years ago

Hi, thank you for your interest in our work.

The domain discriminator has an output of length num_domains, as in the DANN paper. I think the idea is that one does not need to force the alignment between the distributions on a class-by-class basis, because there are 2 "pressures" on the latent feature space:

- the classification loss, which pushes the features to stay informative about the classes;
- the (reversed) domain discrimination loss, which pushes the features to become domain-invariant.

The idea is that since these two pressures act together to learn the right latent feature space, you will still have some information about the classes in this space, coming from the classification loss. Thus, you can spare yourself a per-class domain discriminator. Does this make it clearer?

In backward propagation, the gradient from the domain discrimination is indeed reversed as in DANN: we would like to somehow "unlearn" how to discriminate the n domains.
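
For completeness, a gradient reversal layer in PyTorch can be written as a custom autograd Function. This is a generic DANN-style sketch, not the MuLANN training code itself, and the combined loss at the end is schematic:

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradientReversal.apply(x, lambda_)

# Schematic use of the two "pressures" on the shared features:
#   classification loss keeps class information,
#   reversed domain loss removes domain information.
# total_loss = classification_loss + domain_loss(grad_reverse(features, lambda_))
```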