easezyc / deep-transfer-learning

A collection of implementations of deep domain adaptation algorithms

Question about RevGrad #17

Closed · deepai-lab closed this issue 3 years ago

deepai-lab commented 3 years ago

Hi @easezyc ,

You provided two versions of the RevGrad implementation, one using PyTorch 0.3 and one using PyTorch 1.0.

In PyTorch 0.3 the code looks like this:

class RevGrad(nn.Module):

    def __init__(self, num_classes=31):
        super(RevGrad, self).__init__()
        self.sharedNet = resnet50(False)
        self.cls_fc = nn.Linear(2048, num_classes)
        self.domain_fc = nn.Linear(2048, 2)

    def forward(self, data):
        data = self.sharedNet(data)
        clabel_pred = self.cls_fc(data)
        dlabel_pred = self.domain_fc(data)

        return clabel_pred, dlabel_pred

and in PyTorch 1.0 the code is like this:

class RevGrad(nn.Module):

    def __init__(self, num_classes=31):
        super(RevGrad, self).__init__()
        self.sharedNet = resnet50(True)
        self.cls_fn = nn.Linear(2048, num_classes)
        self.domain_fn = AdversarialNetwork(in_feature=2048)

    def forward(self, data):
        data = self.sharedNet(data)
        clabel_pred = self.cls_fn(data)
        dlabel_pred = self.domain_fn(AdversarialLayer(high_value=1.0)(data))
        #print(dlabel_pred)
        return clabel_pred, dlabel_pred

class AdversarialNetwork(nn.Module):
    def __init__(self, in_feature):
        super(AdversarialNetwork, self).__init__()
        self.ad_layer1 = nn.Linear(in_feature,1024)
        self.ad_layer2 = nn.Linear(1024,1024)
        self.ad_layer3 = nn.Linear(1024, 1)
        self.ad_layer1.weight.data.normal_(0, 0.01)
        self.ad_layer2.weight.data.normal_(0, 0.01)
        self.ad_layer3.weight.data.normal_(0, 0.3)
        self.ad_layer1.bias.data.fill_(0.0)
        self.ad_layer2.bias.data.fill_(0.0)
        self.ad_layer3.bias.data.fill_(0.0)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.5)
        self.dropout2 = nn.Dropout(0.5)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.ad_layer1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        x = self.ad_layer2(x)
        x = self.relu2(x)
        x = self.dropout2(x)
        x = self.ad_layer3(x)
        x = self.sigmoid(x)
        return x

    def output_num(self):
        return 1
class AdversarialLayer(torch.autograd.Function):
  def __init__(self, high_value=1.0):
    self.iter_num = 0
    self.alpha = 10
    self.low = 0.0
    self.high = high_value
    self.max_iter = 2000.0

  def forward(self, input):
    self.iter_num += 1
    output = input * 1.0
    return output

  def backward(self, gradOutput):
    self.coeff = np.float(2.0 * (self.high - self.low) / (1.0 + np.exp(-self.alpha*self.iter_num / self.max_iter)) - (self.high - self.low) + self.low)
    return -self.coeff * gradOutput

My question is: which method is correct? If both are correct, could you please explain a bit? Thanks in advance.

easezyc commented 3 years ago

Actually, both are correct. The PyTorch 0.3 version is implemented following 'Simultaneous Deep Transfer Across Domains and Tasks' (ICCV 2015), while the PyTorch 1.0 version implements the original RevGrad (gradient reversal).
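
For context, the original RevGrad relies on a gradient reversal layer: the forward pass is the identity, and the backward pass flips (and scales) the gradient before it reaches the shared feature extractor. A minimal sketch using the modern static-method `torch.autograd.Function` API might look like the following; this is not the repository's code (the instance-method style in the snippet above is deprecated in recent PyTorch versions), and names such as `GradReverse` and `coeff` are illustrative:

    import torch
    from torch.autograd import Function

    class GradReverse(Function):
        """Identity in the forward pass; multiplies the gradient by -coeff in the backward pass."""

        @staticmethod
        def forward(ctx, x, coeff=1.0):
            ctx.coeff = coeff
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # The reversed (and scaled) gradient flows back into the shared feature extractor.
            return -ctx.coeff * grad_output, None

    def grad_reverse(x, coeff=1.0):
        return GradReverse.apply(x, coeff)

    # Usage inside a forward pass (sketch):
    # dlabel_pred = self.domain_fn(grad_reverse(features, coeff))

A fixed `coeff` replaces the iteration-based schedule shown in the snippet above; the scheduling can be reintroduced by passing a different `coeff` at each training step.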

deepai-lab commented 3 years ago

Hi @easezyc,

Thank you for your answer. If we have three domains, how can we use RevGrad? We would like to minimize the distance among the three source domains. Could you please help me? Thanks in advance.

class RevGrad(nn.Module):

    def __init__(self, num_classes=31):
        super(RevGrad, self).__init__()
        self.sharedNet = resnet50(False)
        self.cls_fc = nn.Linear(2048, num_classes)
        self.domain_fc = nn.Linear(2048, 3)

    def forward(self, data):
        data = self.sharedNet(data)
        clabel_pred = self.cls_fc(data)
        dlabel_pred = self.domain_fc(data)

        return clabel_pred, dlabel_pred
def train(model):
    src1_data_iter = iter(src1_loader)
    src2_data_iter = iter(src2_loader)
    src3_data_iter = iter(src3_loader)

    src1_dlabel = Variable(torch.ones(batch_size).long().cuda())
    src2_dlabel = 2 * Variable(torch.ones(batch_size).long().cuda())
    src3_dlabel = Variable(torch.zeros(batch_size).long().cuda())
......................................
......................................

I am confused about this part:

    new_label_pred = torch.cat((src_dlabel_pred, tgt_dlabel_pred), 0)
    confusion_loss = nn.BCELoss()

    confusion_loss_total = confusion_loss(new_label_pred, torch.cat((src_dlabel, tgt_dlabel), 0).float().reshape(2 * batch_size, 1))

How can we do that?

easezyc commented 3 years ago

I have not tried adversarial training with more than two domains. You could refer to other references, e.g., Task-Adversarial Co-Generative Nets.
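
One possible way to extend the idea to three source domains, as a sketch only (this is not code from this repository, and it assumes a gradient reversal function such as the `grad_reverse` above is available), is to keep a single 3-way domain classifier and replace the binary `nn.BCELoss` with `nn.CrossEntropyLoss` over integer domain labels 0, 1, 2:

    import torch
    import torch.nn as nn

    num_domains = 3
    domain_fc = nn.Linear(2048, num_domains)   # 3-way domain classifier head
    domain_criterion = nn.CrossEntropyLoss()   # replaces nn.BCELoss used in the 2-domain case

    def domain_adversarial_loss(feat1, feat2, feat3, coeff=1.0):
        # feat1/feat2/feat3: (batch_size, 2048) features from the three domains (hypothetical names).
        feats = torch.cat((feat1, feat2, feat3), 0)
        labels = torch.cat((
            torch.full((feat1.size(0),), 0, dtype=torch.long),
            torch.full((feat2.size(0),), 1, dtype=torch.long),
            torch.full((feat3.size(0),), 2, dtype=torch.long),
        ), 0).to(feats.device)
        # Gradient reversal before the domain head, CrossEntropyLoss on raw logits.
        logits = domain_fc(grad_reverse(feats, coeff))
        return domain_criterion(logits, labels)

Compared with the BCE version quoted above, the only changes are the number of output units and the loss function; the reversed gradient still pushes the shared network toward domain-invariant features.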