Closed BigBorg closed 4 years ago
Hello @BigBorg!
Can you provide a reproducible example using public data? Ideally, a config and a small dataset that encountered this error would be amazing.
I would love to help, but it's hard to diagnose if I can't reproduce the issue :)
Sorry, the dataset is private. I might try a public dataset to see if this happens again.
If you share the full stack trace and the config file you used, we might also be able to help.
Sending code out from the company I work for is restricted. Turning off sentence-ll solves the problem. Is it possible that the error is 0 and then becomes inf after taking the log? Also, the sentence scores produced by the model can be larger than 1; how should I interpret the score?
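For what it's worth, that hypothesis is easy to check in isolation: the log of a zero error is `-inf`, and NumPy/SciPy routines that validate their inputs reject such arrays with exactly this message. A generic sketch, not OpenKiwi's actual loss code:

```python
import numpy as np

errors = np.array([0.12, 0.0, 0.35])
logs = np.log(errors)  # log(0) -> -inf (NumPy emits a divide-by-zero warning)
print(logs)

# Routines that validate their inputs reject such arrays with the
# exact error seen during training:
try:
    np.asarray_chkfinite(logs)
except ValueError as e:
    print(e)  # "array must not contain infs or NaNs"
```

So a single zero-valued error passed through a log would be enough to poison a later finiteness check.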
Understandable. It could indeed be the case, but it seems weird that we never encountered this error while training with our own data or with publicly available datasets... If the reason becomes clearer, please let us know.
On the second question: sentence scores are an attempt to predict TER (Translation Error Rate), i.e. the distance that separates the current translation from a "perfect" translation, with 1 meaning the whole sentence needs to be changed and 0 meaning the sentence is already correct.
The model shouldn't produce scores above 1. What kind of scores are you seeing? Are you sure your training data contains only TER values in the range [0, 1]?
Thanks for reminding me to inspect the training data. It does contain HTER values larger than 1. I don't know why tercom produces such results. I might try the Python package pyter to generate HTER instead.
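A quick sanity check along those lines, assuming the QE-style format of one HTER score per line (`train.hter` is a hypothetical filename):

```python
import math

def check_scores(path):
    """Report scores that are non-finite or outside the expected [0, 1] range."""
    bad = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            score = float(line.strip())
            if not math.isfinite(score) or not 0.0 <= score <= 1.0:
                bad.append((i, score))
    return bad
```

For example, `check_scores("train.hter")` returns a list of `(line_number, score)` pairs for the offending lines.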
Tercom computes HTER as

    HTER = EditDistance(MT, PE) / len(PE)

Thus, if the MT is longer than the post-edition, you can get an HTER larger than 1 (typically a case of MT repetitions/hallucinations). In the QE shared task, the scores are truncated to be at most 1.
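The behaviour can be sketched with a plain word-level Levenshtein distance standing in for tercom's shift-aware alignment, so the numbers won't match tercom exactly, but the above-1 effect is the same:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution
            prev = cur
    return dp[-1]

def hter(mt, pe, clip=False):
    """HTER = edit distance(MT, PE) / len(PE); optionally truncated at 1."""
    score = edit_distance(mt.split(), pe.split()) / len(pe.split())
    return min(score, 1.0) if clip else score

# A hallucinating MT that repeats itself can exceed 1:
print(hter("the cat the cat the cat sat", "the cat sat"))  # 4/3 ≈ 1.33
```

With `clip=True` the score is truncated at 1, mirroring what the QE shared task does.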
The sentence-scores output by the model can be greater than 1 if you turn off sentence-ll.
As you can see in the code, the sentence score prediction module does not have a squashing function in the last layer.
If you enable sentence-ll, the model outputs a Gaussian distribution truncated to the interval [0, 1]. In that case, model scores are the mean of that distribution, which will always lie within the interval itself.
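The effect is easy to see with a truncated normal; this is a generic illustration with SciPy, not the model's actual code. Even when the raw (untruncated) mean is above 1, the mean of the distribution truncated to [0, 1] stays inside the interval:

```python
from scipy.stats import truncnorm

def truncated_mean(mu, sigma, low=0.0, high=1.0):
    """Mean of a Normal(mu, sigma) truncated to [low, high]."""
    # truncnorm takes the bounds in standard-deviation units around mu
    a, b = (low - mu) / sigma, (high - mu) / sigma
    return truncnorm.mean(a, b, loc=mu, scale=sigma)

print(truncated_mean(1.4, 0.5))  # below 1, even though the raw mean is 1.4
print(truncated_mean(0.3, 0.2))  # close to 0.3
```

This is why scores above 1 can only appear when sentence-ll is off and the raw regression output is used directly.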
I am training an estimator on an en-zh dataset. At first everything runs well, but after epoch 8 it says "array must not contain infs or NaNs" and exits. I don't know why this happens.