Infinite loss value when training under amp

facebookresearch / ConvNeXt

Code release for ConvNeXt model

MIT License


Infinite loss value when training under amp #131

Open jameslahm opened 1 year ago

jameslahm commented 1 year ago

Hi, I encountered an infinite loss value assertion failure when training with mixed precision (AMP). The traceback looks like this:

Traceback (most recent call last):
  File "main.py", line 498, in <module>
    main(args)
  File "main.py", line 409, in main
    train_stats = train_one_epoch(
  File "ConvNeXt/engine.py", line 63, in train_one_epoch
    assert math.isfinite(loss_value)
AssertionError

I wonder how I could fix this problem. Thanks very much!

uristern123 commented 1 year ago

Hi, this happened to me as well. Did you find a solution to this problem?
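
For anyone hitting the same assertion, one possible way to narrow it down is to tolerate a few non-finite losses instead of asserting on the first one. This is only a sketch, not the repository's code and not a confirmed fix. It assumes the non-finite values are occasional float16 overflows (float16 cannot represent values above 65504) rather than a genuinely diverging run. The class name `NonFiniteLossGuard` and the parameter `max_consecutive` are made up for illustration.

```python
import math


class NonFiniteLossGuard:
    """Hypothetical helper: skip a few non-finite losses before giving up."""

    def __init__(self, max_consecutive=5):
        # Number of consecutive inf/NaN losses tolerated before aborting.
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def should_skip(self, loss_value):
        """Return True if the current step should be skipped.

        Returns False for a finite loss and resets the counter.
        Raises RuntimeError once more than `max_consecutive` losses
        in a row are non-finite, since that suggests real divergence.
        """
        if math.isfinite(loss_value):
            self.consecutive = 0
            return False
        self.consecutive += 1
        if self.consecutive > self.max_consecutive:
            raise RuntimeError(
                f"{self.consecutive} consecutive non-finite losses "
                f"(last value: {loss_value})"
            )
        return True


# Standalone demonstration with plain floats standing in for loss.item().
guard = NonFiniteLossGuard(max_consecutive=2)
for step, loss_value in enumerate([0.9, float("inf"), 0.8, float("nan")]):
    if guard.should_skip(loss_value):
        print(f"step {step}: loss is {loss_value}, skipping")
        continue
    print(f"step {step}: loss is {loss_value}, proceeding")
```

In a training loop the check would sit where the assertion is now, skipping the backward pass and optimizer step for that batch. In distributed training every rank would have to make the same skip decision, otherwise the ranks that do run the backward pass will wait forever on gradient synchronization. If the guard raises quickly, the cause is more likely the learning rate or the data than float16 overflow, and running without AMP is a simple way to check.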