HAHA-DL / MLDG

The demo code for the MLDG paper "Learning to Generalize: Meta-Learning for Domain Generalization", https://arxiv.org/abs/1710.03463, https://arxiv.org/pdf/1710.03463.pdf
MIT License

About the first-order approximation #6

Open tht106 opened 3 years ago

tht106 commented 3 years ago

Hi, thank you for this fascinating work and for providing a demo of MLDG.

Two quick questions:

1) Did you use the first-order approximation in the MLP version of MLDG? The code in ops.py looks like a first-order approximation:

```python
# excerpt from linear() in ops.py; the quoted lines sit inside an outer branch
# that is taken when a meta_loss is supplied, hence the dedented final else
    if not stop_gradient:
        grad_weight = autograd.grad(meta_loss, weight, create_graph=True)[0]

        if bias is not None:
            grad_bias = autograd.grad(meta_loss, bias, create_graph=True)[0]
            bias_adapt = bias - grad_bias * meta_step_size
        else:
            bias_adapt = bias

    else:
        # the gradient is detached from the graph, so no second-order terms
        # can flow through the meta-update (first-order behaviour)
        grad_weight = Variable(autograd.grad(meta_loss, weight, create_graph=True)[0].data,
                               requires_grad=False)

        if bias is not None:
            grad_bias = Variable(autograd.grad(meta_loss, bias, create_graph=True)[0].data,
                                 requires_grad=False)
            bias_adapt = bias - grad_bias * meta_step_size
        else:
            bias_adapt = bias

    return F.linear(inputs,
                    weight - grad_weight * meta_step_size,
                    bias_adapt)
else:
    return F.linear(inputs, weight, bias)
```

2) I am also wondering about the meaning of the parameter "--stop_gradient". What would happen if we set it to true?

CinKKKyo commented 1 year ago

> 2) I am also wondering about the meaning of the parameter "--stop_gradient". What would happen if we set it to true?

The meaning of the parameter "--stop_gradient" also confuses me. Did you figure it out?

CharmsGraker commented 10 months ago

> The meaning of the parameter "--stop_gradient" also confuses me. Did you figure it out?

Setting stop_gradient=True avoids the large computational budget of running the meta-optimization with second-order gradients. In my opinion, with stop_gradient=True the whole algorithm can be regarded as training the objectives F(theta) and G(theta) alternately.
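
For later readers, here is a minimal sketch of what the flag changes in a one-step MLDG-style meta-update. This is not the repo's code: `mldg_meta_step`, `model`, `meta_train_loss`, `meta_val_fn`, and `inner_lr` are placeholder names used only for illustration.

```python
from torch import autograd

def mldg_meta_step(model, meta_train_loss, meta_val_fn, inner_lr, stop_gradient):
    """One virtual inner update followed by the meta-test loss.

    meta_train_loss: loss on the meta-train domains, computed with the
                     model's current parameters.
    meta_val_fn:     callable returning the meta-test loss for a list of
                     adapted parameters (stands in for the functional
                     forward pass in ops.py).
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the meta-train loss w.r.t. the current parameters.
    # create_graph=True keeps the graph so second-order terms can flow later.
    grads = autograd.grad(meta_train_loss, params, create_graph=not stop_gradient)

    if stop_gradient:
        # First-order approximation: treat the inner gradient as a constant.
        grads = [g.detach() for g in grads]

    # Virtual one-step update: theta' = theta - alpha * grad
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]

    # Meta-test loss at theta'. Backpropagating it either goes through the
    # inner update (second-order) or stops at theta' (first-order).
    return meta_val_fn(adapted)
```

With stop_gradient=False the meta-test loss is backpropagated through the virtual update, so second-order terms appear; with stop_gradient=True the inner gradient is treated as a constant, which is the first-order behaviour the snippet from ops.py quoted above implements in its detached branch.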