I believe the first sentence of this passage is not worded correctly:
> Note that because we have a one-hot-encoded dependent variable, we can't directly use nll_loss or softmax (and therefore we can't use cross_entropy):
>
> - softmax, as we saw, requires that all predictions sum to 1, and tends to push one activation to be much larger than the others (due to the use of exp); however, we may well have multiple objects that we're confident appear in an image, so restricting the maximum sum of activations to 1 is not a good idea. By the same reasoning, we may want the sum to be less than 1, if we don't think any of the categories appear in an image.
> - nll_loss, as we saw, returns the value of just one activation: the single activation corresponding with the single label for an item. This doesn't make sense when we have multiple labels.
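To see the behavior the quoted bullets describe, here is a minimal sketch in PyTorch (the activation values are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Hypothetical activations for one item across three categories
acts = torch.tensor([[0.5, 2.0, -1.0]])

# softmax forces the predictions to sum to 1
probs = torch.softmax(acts, dim=1)
print(probs, probs.sum(dim=1))  # sum is tensor([1.])

# nll_loss picks out just the one activation at the target index,
# so it assumes exactly one label per item
log_probs = torch.log_softmax(acts, dim=1)
target = torch.tensor([1])  # a single label
print(F.nll_loss(log_probs, target))  # equals -log_probs[0, 1]
```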
One-hot encoding by itself does not disallow the use of softmax: a one-hot target with a single label works perfectly well with softmax and cross-entropy. The real reason should be that the objective is multi-label, right? That's what the bullet points themselves explain.
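A sketch of the distinction I mean, again assuming PyTorch (note that F.cross_entropy has accepted probability targets such as a one-hot vector since v1.10):

```python
import torch
import torch.nn.functional as F

acts = torch.tensor([[0.5, 2.0, -1.0]])

# One-hot but single-label: softmax-based cross-entropy is fine.
# In PyTorch 1.10+ the one-hot vector can even be passed directly.
one_hot = torch.tensor([[0.0, 1.0, 0.0]])
print(F.cross_entropy(acts, one_hot))            # same loss as...
print(F.cross_entropy(acts, torch.tensor([1])))  # ...the index form

# Multi-label (several 1s, or none at all) is what actually rules
# out softmax: the targets no longer sum to 1, so we want an
# independent sigmoid + binary cross-entropy per category.
multi_hot = torch.tensor([[1.0, 1.0, 0.0]])
print(F.binary_cross_entropy_with_logits(acts, multi_hot))
```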