apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Can I set instance weight when training? #7375

Open regzhuce opened 7 years ago

regzhuce commented 7 years ago

Is there any way to set a weight for every instance when training the model? I cannot find any documentation about this.

jeremiedb commented 7 years ago

The strategy I've used is to build a custom loss function using the MakeLoss operator and feed it the weights. For example: loss = MakeLoss(weight * mx.symbol.square(label - pred))
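
A minimal, untested sketch of that pattern in the Python symbol API (the names data, pred, weight and the single-output regression head are illustrative assumptions): the per-instance weight is fed as an extra input alongside data and label.

import mxnet as mx

data   = mx.symbol.Variable('data')
label  = mx.symbol.Variable('label')
weight = mx.symbol.Variable('weight')  # one weight per instance, fed like data/label

fc   = mx.symbol.FullyConnected(data=data, num_hidden=1, name='fc')
pred = mx.symbol.Flatten(fc, name='pred')

# MakeLoss marks this expression as the training objective; scaling the
# squared error by `weight` scales each instance's gradient accordingly.
loss = mx.symbol.MakeLoss(weight * mx.symbol.square(label - pred), name='weighted_mse')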

regzhuce commented 7 years ago

It's strange that when predicting, I have to feed a constant weight to the model, and I get a loss value instead of the prediction value.

jeremiedb commented 7 years ago

The default behavior when calling the predict function on a model built with MakeLoss is to return the output of the last layer, which is the one where the loss function is defined. You can either back out the actual predictions from the loss, knowing the labels and weights, or, more simply, get the output of the layer just before MakeLoss, where the predictions are defined.
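
A sketch of both options, continuing the illustrative symbols from the sketch above (the internal output name 'pred_output' follows from naming the layer 'pred'; verify it with loss.get_internals().list_outputs()):

# Option 1: pull the pre-loss layer out of the trained symbol and bind
# a separate module/executor to it for inference.
internals = loss.get_internals()
pred_sym  = internals['pred_output']  # output of the layer just before MakeLoss

# Option 2: group the loss with a gradient-blocked copy of the predictions,
# so both come back as outputs (BlockGrad keeps the extra output from
# contributing gradients during training).
both = mx.symbol.Group([loss, mx.symbol.BlockGrad(pred)])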

regzhuce commented 7 years ago

That feels like a workaround. Hopefully a more graceful approach can be encapsulated in the library. That would be very nice.

thirdwing commented 7 years ago

@regzhuce The output of MakeLoss is the gradient. See https://github.com/apache/incubator-mxnet/blob/master/src/operator/make_loss.cc#L35

regzhuce commented 7 years ago

@thirdwing Thanks. Any proposals for my problem?

thirdwing commented 7 years ago

Can you give more details on what you mean by "set instance weight"?

I am sorry that I don't understand your problem.


regzhuce commented 7 years ago

Say I have lots of samples, but not all samples are equally important. I want to give every sample an importance, i.e. an instance weight.

VGalata commented 6 years ago

I am also interested in how I can weight the samples. I have a binary classification problem and I want to give the samples or classes different weights. Is there no other way than using a custom loss function?

Unfortunately, the tutorial for the custom loss function is not sufficient to see how to use this function in a different setup. Is there a way to fully replace mx.symbol.<...>Output without the need for additional steps afterwards to get the predictions? I would like to track model performance on a validation data set during training. Thus, I need the predictions during training, and I do not know how to get them if I use MakeLoss.

Any help is highly appreciated!

VGalata commented 6 years ago

@regzhuce : Probably this could help you:

I finally figured out how to use class weights for a (binary) classification problem, though I still do not know how to reproduce the functionality of an mx.symbol.<...>Output layer, which returns both the loss gradient and the prediction. Here is my code for a weighted version of cross-entropy with two classes:

# ... other layers; the last layer's symbol is 'last_layer'
# Fully connected layer with 2 nodes (one per class)
fc_last <- mx.symbol.FullyConnected(data=last_layer, num_hidden=2, name='lastfullyconnected')
# Label variable
label   <- mx.symbol.Variable(name='label')
# Softmax over the class axis
softmax <- mx.symbol.softmax(data=fc_last, name='softmax', axis=1)
# Weighted cross-entropy: label_weight is a scalar in (0, 1) chosen beforehand;
# 1e-6 is added to avoid log(0)
nn_out  <- mx.symbol.MakeLoss(
    -1 * (1 - label_weight) * (1 - label) * mx.symbol.log(mx.symbol.Reshape(mx.symbol.slice_axis(softmax, axis=1, begin=0, end=1), shape=0) + 1e-6) -
              label_weight  *      label  * mx.symbol.log(mx.symbol.Reshape(mx.symbol.slice_axis(softmax, axis=1, begin=1, end=2), shape=0) + 1e-6),
    name='weightedcrossentropy'
)

After training, the same approach can be used to obtain predictions as described in this example for a regression task.

@thirdwing: It would be nice to get confirmation of whether this is a valid example of using class weights on the softmax output, as, unfortunately, there is no tutorial for this case.

piyushghai commented 6 years ago

@regzhuce Hope your question was answered by the above comment.

@sandeep-krishnamurthy Can you please close this issue ?

zeakey commented 4 years ago

I face the same problem. I think the situation @regzhuce mentioned can be abstracted as: manually assigning weights to the losses of different samples.

In the mxnet.sym API documentation for SoftmaxOutput (http://beta.mxnet.io/r/api/mx.symbol.SoftmaxOutput.html), I cannot find a proper solution.

I have to implement this idea in the symbolic API.
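
For reference, a minimal, untested sketch of such per-instance loss weighting in the Python symbolic API (the variable names and the pick-based cross-entropy are assumptions for illustration, not an official recipe):

import mxnet as mx

logits = mx.symbol.Variable('data')    # shape (batch, num_classes)
label  = mx.symbol.Variable('label')   # shape (batch,), integer class ids
weight = mx.symbol.Variable('weight')  # shape (batch,), one weight per instance

prob   = mx.symbol.softmax(logits, axis=1)
p_true = mx.symbol.pick(prob, label, axis=1)  # probability of the true class

# Per-instance negative log-likelihood, scaled by the instance weight;
# 1e-8 guards against log(0).
loss = mx.symbol.MakeLoss(-weight * mx.symbol.log(p_true + 1e-8), name='weighted_ce')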

bricksdont commented 4 years ago

Same problem here. I am looking for a drop-in replacement for mx.sym.SoftmaxOutput that allows weighting the examples in a batch individually. Something like:

mx.sym.WeightedSoftmaxOutput(data=logits,
                             label=labels,
                             weights=weights,
                             ignore_label=ignore_label,
                             use_ignore=True,
                             normalization=normalization,
                             smooth_alpha=smooth_alpha,
                             name=name)

@thirdwing Why did you tag this issue with R?

@piyushghai The example given by VGalata is not exactly what this issue is about: it covers class weights, whereas the issue asks for instance weights.

bricksdont commented 4 years ago

Here is a gist with an actual implementation of a batch-weighted cross-entropy loss that I believe can replace the default SoftmaxOutput, though it will be less efficient, for instance when label smoothing is used:

https://gist.github.com/bricksdont/812b4d6a21ab045da771560ec9af8c11
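
Worth noting for anyone landing here with the Gluon API (please verify against the docs of your MXNet version): Gluon losses accept a sample_weight argument that is broadcast-multiplied into the per-instance loss, which covers exactly this use case. A minimal sketch:

from mxnet import nd, gluon

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

logits  = nd.random.normal(shape=(4, 3))   # batch of 4, 3 classes
labels  = nd.array([0, 2, 1, 0])
# One weight per instance, shape (batch, 1) so it broadcasts over the
# per-instance loss; a weight of 0 drops an instance entirely.
weights = nd.array([1.0, 0.5, 2.0, 0.0]).reshape((4, 1))

loss = loss_fn(logits, labels, weights)    # shape (4,): one value per instance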