Closed edufierro closed 5 years ago
Hello @edufierro,
I have been trying to reproduce your error, to no avail. Can you try to reproduce this on your system and let me know if it happens again?
This error is happening in the RMSE metric code:
class RMSEMetric(Metric):
    def __init__(self, **kwargs):
        super().__init__(metric_name='RMSE', **kwargs)

    def update(self, batch, model_out, **kwargs):
        predictions = self.get_predictions_flat(model_out, batch)
        target = self.get_target_flat(batch)
        self.squared_error += ((predictions - target) ** 2).sum().item()
        self.tokens += self.get_tokens(batch)

    def summarize(self):
        rmse = math.sqrt(self.squared_error / self.tokens)
        summary = {self.metric_name: rmse}
        return self._prefix_keys(summary)
For the RMSE to be infinite, the division self.squared_error / self.tokens needs to be infinite. Since self.tokens comes from get_tokens and is a whole-number count, never a decimal number, the denominator cannot be what blows up the result. This must be caused by some kind of numerical error when calculating the squared error.
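To illustrate the reasoning, here is a minimal sketch in plain Python. It is not the metric's actual code: the helper names `accumulate_squared_error` and `rmse` are hypothetical, and plain floats stand in for tensors. It only shows that a single overflowing or non-finite prediction is enough to make the accumulated squared error, and therefore the RMSE, infinite while the token count stays a normal integer.

```python
import math


def accumulate_squared_error(predictions, targets):
    """Sum of squared differences, mirroring the accumulation in update().

    Hypothetical helper: plain floats are used instead of tensors.
    """
    total = 0.0
    for p, t in zip(predictions, targets):
        diff = p - t
        # Multiplication overflows to inf instead of raising an exception.
        total += diff * diff
    return total


def rmse(squared_error, tokens):
    """Same formula as summarize(): sqrt(squared_error / tokens)."""
    return math.sqrt(squared_error / tokens)


# Well-behaved batch: finite squared error, finite RMSE.
ok_error = accumulate_squared_error([1.0, 2.0], [1.0, 4.0])
ok_rmse = rmse(ok_error, tokens=2)

# One exploding prediction: its square overflows, so the sum becomes inf.
bad_error = accumulate_squared_error([1e200, 0.0], [0.0, 0.0])
bad_rmse = rmse(bad_error, tokens=2)

print(math.isfinite(ok_rmse), math.isinf(bad_rmse))
```

If this is what happens during training, checking `math.isfinite` on the per-batch squared error (or on the model outputs) would be one way to find the batch where the value first blows up.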
Let me know if you can reproduce this error so we can get to the bottom of it!
Will do. Thanks @captainvera & team!
Hey @edufierro, since we're not able to reproduce this issue I'll be closing it for now. If you get this error again or find any way to reproduce it, please feel free to re-open this!
Describe the bug
Hi!
I'm training an estimator and I'm getting the following bug:
This comes after a warning:
It happened during training, after 16 epochs and after 500 batches of epoch 16.
To Reproduce
Both yaml files and data are not public. Let me know and I can share them with you.
Expected behavior
Finishes training for the 20 epochs I specified.
Environment (please complete the following information):