explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Bug in TextCategorizer.get_loss #3795

Closed: dxiao2003 closed this 5 years ago

dxiao2003 commented 5 years ago

The following line is incorrect: since `d_scores = (scores - truths) / scores.shape[0]`, squaring it introduces an extra factor of `scores.shape[0] ** 2` into the denominator.

https://github.com/explosion/spaCy/blob/89379a7fa45f94bce4945284a7781eaaa7bc06ff/spacy/pipeline/pipes.pyx#L962

Suggest changing back to the previous version:

```python
mean_square_error = ((scores - truths) ** 2).sum(axis=1).mean()
```
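
For illustration, a minimal NumPy sketch (toy arrays standing in for the real model scores; shapes and values are arbitrary, not taken from the pipeline) showing that the two versions differ by exactly a factor of `scores.shape[0] ** 2`:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((8, 3))   # toy (batch_size, n_labels) activations
truths = rng.random((8, 3))   # toy gold labels
n = scores.shape[0]

# 2.1 behaviour: the gradient is pre-divided by the batch size,
# so squaring it puts n ** 2 into the denominator of the loss.
d_scores = (scores - truths) / n
loss_new = (d_scores ** 2).sum(axis=1).mean()

# 2.0 behaviour (the suggested fix): mean squared error per example.
loss_old = ((scores - truths) ** 2).sum(axis=1).mean()

print(np.isclose(loss_new * n ** 2, loss_old))  # True
```
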
honnibal commented 5 years ago

Well... The loss is just a metric; it doesn't change the optimisation. The question is whether it's more useful to report the loss per batch, or the mean loss per example. I think the latter is more useful, especially if the batch size is varying during training.
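
To make that concrete, a small sketch (toy numbers only, not spaCy code) comparing the two reporting conventions: the summed per-batch loss scales with the batch size, while the per-example mean stays comparable across batches.

```python
import numpy as np

rng = np.random.default_rng(0)
for batch_size in (16, 64):
    # Toy errors drawn from the same distribution for both batches.
    err = rng.normal(size=(batch_size, 4))
    per_batch = (err ** 2).sum(axis=1).sum()     # scales with the batch
    per_example = (err ** 2).sum(axis=1).mean()  # stays comparable
    print(f"batch={batch_size}: total={per_batch:.1f}, "
          f"mean per example={per_example:.2f}")
```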

dxiao2003 commented 5 years ago

Hi, yes I agree with you on both points: it does not affect optimization in the SGD step, and the per-example loss is more useful.

However the bug still presents a problem because:

  1. Code from 2.0 that uses the loss to decide when to stop training early (because the loss is sufficiently small) will now misbehave in 2.1, because the loss calculation is different.

  2. The new code does not compute the loss per example; it computes (loss per example) / batch size², because there's a factor of `scores.shape[0] ** 2` in the denominator from the definition of `d_scores` (see the sketch below).
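
A quick numeric check of point 2 (illustrative values only): hold the per-example error fixed and vary the batch size. Under the 2.1 formula the reported loss drops with the square of the batch size, which is exactly what breaks a fixed early-stopping threshold.

```python
import numpy as np

for n in (10, 100):
    scores = np.full((n, 3), 0.9)   # every example has identical error
    truths = np.zeros((n, 3))
    d_scores = (scores - truths) / scores.shape[0]
    print(n, (d_scores ** 2).sum(axis=1).mean())
# 10  -> 0.0243
# 100 -> 0.000243  (100x smaller for a 10x larger batch; the true
#                   per-example MSE is 2.43 in both cases)
```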

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.