apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

gradients using loss=F.make_loss(myloss) are not the same as gradients without loss=F.make_loss(myloss) in Hybrid programming #19657

Closed Sundrops closed 3 years ago

Sundrops commented 3 years ago

This operator accepts a customized loss function symbol as a terminal loss and the symbol should be an operator with no backward dependency. The output of this function is the gradient of loss with respect to the input data.

The description of ndarray.make_loss is the same as the description of symbol.make_loss, and it only explains the symbol case, not ndarray. I want to know what `F.make_loss()` will do when I use `net.hybridize()` and `loss.backward()`.

https://mxnet.apache.org/versions/1.7.0/api/python/docs/api/ndarray/ndarray.html?highlight=make_loss#mxnet.ndarray.make_loss
https://mxnet.apache.org/versions/1.7.0/api/python/docs/api/symbol/symbol.html#mxnet.symbol.make_loss
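
For context, here is a minimal sketch of the symbol-API pattern that the linked symbol.make_loss documentation describes (the variable names are illustrative, not from the issue):

import mxnet as mx

# declare symbolic inputs
pred = mx.sym.Variable('pred')
gt = mx.sym.Variable('gt')
# per-sample L2 loss, the same expression used in the reproduction below
l2 = mx.sym.sum(mx.sym.square(pred - gt), axis=1) / 2
# mark the expression as a terminal loss node with no backward dependency
loss = mx.sym.make_loss(l2)

The Gluon reproduction from the issue: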

import mxnet as mx
from mxnet import autograd

class Myloss(mx.gluon.nn.HybridBlock):
    def __init__(self):
        super(Myloss, self).__init__()

    def hybrid_forward(self, F, pred, gt):
        # per-sample L2 loss
        loss_l2 = F.sum(F.square(pred - gt), axis=1) / 2
        return F.make_loss(loss_l2)

net = resnet()           # resnet(), input and y stand in for the user's model and data
net.hybridize()
myloss = Myloss()
with autograd.record():  # backward() requires the forward pass to be recorded
    x = net(input)
    loss = myloss(x, y)
loss.backward()

szha commented 3 years ago

@Sundrops there's no need to use make_loss in Gluon as all values can be used as head gradients now.
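
A minimal sketch of what that suggestion looks like in practice (the network, shapes, and random data below are assumptions for illustration, not from the thread): compute the loss as an ordinary NDArray expression and call backward() on it directly, with no make_loss.

import mxnet as mx
from mxnet import autograd, gluon

# placeholder network and data, assumed for illustration
net = gluon.model_zoo.vision.resnet18_v1()
net.initialize()
net.hybridize()
x = mx.nd.random.uniform(shape=(4, 3, 32, 32))
y = mx.nd.random.uniform(shape=(4, 1000))

with autograd.record():
    pred = net(x)
    # plain NDArray expression; no make_loss needed
    loss = mx.nd.sum(mx.nd.square(pred - y), axis=1) / 2
loss.backward()  # backward() on a non-scalar uses ones as the implicit head gradient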