Closed: jezza770 closed this issue 5 years ago
Hello @jezza770. I reproduced the error and traced the problem to the ONNX export step. I have opened a related PyTorch issue. For now, the only workaround is to set the correct momentum manually, as you do. I will wait for the PyTorch ONNX issue to be resolved.
Hello @jezza770, I'm back with an update. It was an ONNX export issue, and it is now fixed in the master branch of PyTorch. We will have to wait for a new PyTorch release for this to work out of the box.
The parameters of PyTorch's nn.BatchNorm2d are not copied correctly. PyTorch's default momentum is 0.1, but after conversion the Keras model has a momentum of 1. This causes NaNs in the network after training. See the example below. It's not exactly minimal (copied in part from a personal project I'm working on), but it demonstrates the issue. Manually changing the momentum before compiling fixes the issue. Other parameters, such as epsilon, seem to be copied correctly.
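For anyone hitting the same problem, the workaround of changing the momentum before compiling could look roughly like the sketch below. This is not the original example from this report. `pytorch_to_keras_momentum` and `fix_batchnorm_momentum` are hypothetical helper names, and the sketch assumes the converted model exposes a `layers` list whose batch-norm layers have a writable `momentum` attribute. It also relies on the two frameworks using opposite conventions: PyTorch's momentum weights the new batch statistic, Keras's weights the old running value, so PyTorch's 0.1 corresponds to 0.9 in Keras.

```python
def pytorch_to_keras_momentum(pytorch_momentum):
    """Convert a PyTorch BatchNorm momentum to the Keras convention.

    PyTorch: running = (1 - m) * running + m * batch_stat
    Keras:   moving  = m * moving + (1 - m) * batch_stat
    """
    return 1.0 - pytorch_momentum


def fix_batchnorm_momentum(model, pytorch_momentum=0.1):
    """Set the momentum on every BatchNormalization layer of `model`.

    Hypothetical helper: the layer type is matched by class name so that
    this sketch does not need to import Keras itself.
    Returns the names of the layers that were patched.
    """
    keras_momentum = pytorch_to_keras_momentum(pytorch_momentum)
    patched = []
    for layer in model.layers:
        if type(layer).__name__ == "BatchNormalization":
            layer.momentum = keras_momentum
            patched.append(layer.name)
    return patched
```

Call `fix_batchnorm_momentum(k_model)` on the converted model (here `k_model` is just a placeholder name) before `k_model.compile(...)`. Depending on the Keras version, changing the attribute on an already built model may not be enough, and the model may need to be rebuilt from its config for the new momentum to take effect.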