allanj / pytorch_neural_crf

Pytorch implementation of LSTM/BERT-CRF for named entity recognition

The output for CRF loss #4

Closed: mxc19912008 closed this issue 5 years ago

mxc19912008 commented 5 years ago

Hi Allan,

Can I ask what is returned (as the loss) in the CRF forward function here:

https://github.com/allanj/pytorch_lstmcrf/blob/f7ef24ae16a1a015df26ca1b8bdecada087afa43/model/neuralcrf.py#L46-L57

Is it the NLL loss? It seems quite large, not like a typical NLL loss.

Thanks again!

allanj commented 5 years ago

Yes. For a CRF, it is the negative log-likelihood. Mathematically, the probability p(y|x) is:

$$p(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{\sum_{y'} \exp(\mathrm{score}(x, y'))}$$

If we take the negative log-likelihood, it becomes:

$$-\log p(y \mid x) = \log \sum_{y'} \exp(\mathrm{score}(x, y')) - \mathrm{score}(x, y)$$

So the left-hand term is the unlabeled_score (the log-partition over all possible label sequences), the right-hand term is the labeled_score (the score of the gold sequence), and we return their difference.

You can also take the mean over the batch to make the loss smaller, but I guess it doesn't change the performance much.
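As a rough illustration (not the repo's actual code, which computes the partition with the forward algorithm over emission and transition scores rather than by enumerating sequences), the loss boils down to:

```python
import torch

# Hypothetical tensors for one sentence: the score of every possible
# label sequence, and the index of the gold sequence.
all_scores = torch.tensor([4.2, -1.3, 0.7, 2.9])
gold_idx = 0

unlabeled_score = torch.logsumexp(all_scores, dim=0)  # log Σ_{y'} exp(score(x, y'))
labeled_score = all_scores[gold_idx]                  # score(x, y)

loss = unlabeled_score - labeled_score                # negative log-likelihood
```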

mxc19912008 commented 5 years ago

I see, thanks again! :)

mxc19912008 commented 5 years ago

Hi Allan,

I have one more question if you don't mind. If I change the loss like this:

```python
p = torch.exp(labeled_score - unlabed_score)
return -torch.log(p)
```

It should work, because unlabed_score - labeled_score is the NLL loss, so the probability can be written as above, and applying -torch.log should give back the same NLL loss. But instead it generates "nan" at each epoch.

Thanks, Allan!

allanj commented 5 years ago

Because initially, the labeled score (in log space) could be something like -180, while the unlabeled score could be something like 557.

(I debugged inside and checked the values.)

In that case, when you take the exp of that difference (around -737), the result is extremely close to zero and underflows the floating-point range in Python/PyTorch. It's like torch.exp(-1000) = 0, and torch.exp(-500) is still 0.

Thus, log(0) blows up (you get -inf, and nan once it propagates through training). That's why we prefer to work in log space directly.
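A quick standalone sketch of the underflow (not code from the repo), using the numbers above:

```python
import torch

labeled_score = torch.tensor(-180.0)
unlabeled_score = torch.tensor(557.0)

# Direct route: the probability underflows to 0, so the log blows up.
p = torch.exp(labeled_score - unlabeled_score)    # exp(-737) underflows to 0.0
loss_direct = -torch.log(p)                       # -log(0) = inf (nan once gradients flow)

# Log-space route: just subtract the scores, no exp/log round trip.
loss_log_space = unlabeled_score - labeled_score  # 737.0, the same NLL, computed stably

print(p.item(), loss_direct.item(), loss_log_space.item())
```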

mxc19912008 commented 5 years ago

Thanks Allan! I wanted to add something to the softmax output before taking the log. I'll do more experiments :)
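Not from the thread, but if the goal is to add a term to the probability before the log, the same log-space trick applies: the addition can be done with torch.logaddexp so the tiny probability never has to be materialized. A hypothetical sketch (the eps term here is just a stand-in for whatever gets added):

```python
import torch

labeled_score = torch.tensor(-180.0)
unlabeled_score = torch.tensor(557.0)
eps = torch.tensor(1e-6)  # hypothetical term added to the probability

# log p(y|x) without ever computing p(y|x) itself
log_p = labeled_score - unlabeled_score

# -log(p + eps), computed entirely in log space:
# log(p + eps) = logaddexp(log p, log eps)
loss = -torch.logaddexp(log_p, torch.log(eps))
```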