Doodleverse / doodleverse_utils

A set of common Doodleverse tools and utilities
MIT License

kld and cat loss can sometimes return NaN #11

Closed ebgoldstein closed 2 years ago

ebgoldstein commented 2 years ago

(This is a Gym problem, but the fix is in this repo, so I am reporting it here.)

kld and cat loss can both return NaN values. I have confirmed that this occurs for the 4-class NOAA dataset, and that it appears with and without the new custom metrics (dice and iou).

The error disappears when I go back to full precision (i.e., comment out the mixed precision policy).

I believe this is a numerical stability issue with mixed precision, and that it stems from the softmax activation on the last layer.

https://wandb.ai/site/articles/mixed-precision-training-with-tf-keras
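
To illustrate why fp16 can produce NaN in a log-based loss, here is a minimal NumPy sketch. It is an assumption-laden toy and not a confirmed trace of what happens inside the Gym losses: it only shows that a probability small enough to be representable in float32 underflows to zero in float16, after which `log` and a multiplication by zero give NaN.

```python
import numpy as np

# A very small predicted probability. float32 can represent it;
# float16 cannot, so it underflows to exactly 0.
p_half = np.float16(1e-8)
p_full = np.float32(1e-8)

with np.errstate(divide="ignore", invalid="ignore"):
    # Cross-entropy style term y * log(p) with y = 0.
    # In float16: log(0) = -inf and 0 * -inf = NaN.
    term_half = np.float16(0.0) * np.log(p_half)
    # In float32: log(p) is finite, so the term stays finite.
    term_full = np.float32(0.0) * np.log(p_full)

print("float16:", p_half, term_half)
print("float32:", p_full, term_full)
```

A single NaN term like this is enough to make a summed or averaged loss NaN.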

The fix is to set the dtype of the final softmax layer to float32, which overrides the global fp16 policy for that layer.
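
A minimal sketch of that fix in Keras, assuming TensorFlow 2.x. The tiny model and the name `build_toy_segmenter` are hypothetical and only stand in for the real Gym models. The relevant part is the `dtype="float32"` argument on the final `Activation` layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Global policy: layers compute in float16 and keep variables in float32.
mixed_precision.set_global_policy("mixed_float16")


def build_toy_segmenter(input_shape=(64, 64, 3), n_classes=4):
    """Hypothetical stand-in model; not the architecture used in Gym."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, padding="same", activation="relu")(inputs)
    # Logits are computed in float16 under the global policy.
    logits = layers.Conv2D(n_classes, 1, padding="same")(x)
    # Passing dtype="float32" overrides the global policy for this layer,
    # so the softmax (and the loss that consumes it) runs in full precision.
    outputs = layers.Activation("softmax", dtype="float32")(logits)
    return tf.keras.Model(inputs, outputs)


model = build_toy_segmenter()
print(model.layers[-2].compute_dtype)  # dtype used by the logits layer
print(model.layers[-1].compute_dtype)  # dtype used by the softmax layer
```

The rest of the network still gets the speed and memory benefits of fp16, since only the last activation is cast up.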