(This is a Gym problem, but the fix is in this repo, so I am reporting it here.)
The KLD and categorical cross-entropy losses can both return NaN values.
I have confirmed that this occurs for the 4-class NOAA dataset.
I have confirmed that it appears with and without the new custom metrics (dice and IoU).
The error disappears when I go back to full precision (i.e., comment out the mixed precision policy).
I believe this is a numerical stability issue with mixed precision, originating in the softmax activation on the last layer.
https://wandb.ai/site/articles/mixed-precision-training-with-tf-keras
The fix is to set the final softmax layer to float32, which overrides the global fp16 policy.
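A minimal sketch of what that fix looks like in Keras; the input shape, layer sizes, and layer names below are placeholders, not the actual Gym model:

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Global fp16 policy -- this is what triggers the NaN losses.
mixed_precision.set_global_policy("mixed_float16")

num_classes = 4  # e.g., the 4-class NOAA dataset

# Placeholder network body; the real model architecture differs.
inputs = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(num_classes, 1, padding="same")(x)

# The fix: run the final softmax in float32 so the probabilities
# fed to the KLD / categorical cross-entropy losses stay stable,
# even though the rest of the model computes in float16.
outputs = layers.Activation("softmax", dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```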