Open priceee opened 3 years ago
Running into a similar issue with Python 3.6 and torch 1.4. Have you found a fix? I'm guessing it might be by design if your val_acc stops increasing (#135).
This code uses early stopping in 'train.py', line 31. To keep training longer, you have to raise the patience value above 5.
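To illustrate why a run can end quietly with exit code 0, here is a minimal sketch of patience-based early stopping in plain Python. It is not the code from 'train.py'; the function name `epochs_until_early_stop`, the `min_delta` parameter, and the accuracy values in `history` are all hypothetical and only show the general mechanism: training stops once the monitored metric has failed to improve for `patience` consecutive epochs.

```python
def epochs_until_early_stop(val_accs, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop early,
    or None if the run would reach the end of val_accs.

    Hypothetical helper for illustration only, not the repository's code.
    """
    best = float("-inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, acc in enumerate(val_accs):
        if acc > best + min_delta:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation accuracies that plateau after the fifth epoch.
history = [0.80, 0.85, 0.88, 0.90, 0.905, 0.903, 0.903,
           0.904, 0.902, 0.901, 0.903, 0.904]

print(epochs_until_early_stop(history, patience=5))   # stops at epoch index 9
print(epochs_until_early_stop(history, patience=10))  # None: runs to the end
```

With the smaller patience the loop ends as soon as the plateau has lasted five epochs, which is a normal exit rather than a crash. A larger patience lets the same run continue through the plateau.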
I'm running Python 3.7 and torch 1.6. The training stops with no warning or error, only "Process finished with exit code 0". I've tried running it with "nohup", but it still stops at 20~30 epochs.
Part of the console output:
Epoch 00028: val_acc reached 0.90505 (best 0.90505), saving model to cls-ssg/epoch=28-val_loss=0.30-val_acc=0.905.ckpt as top 2
Epoch 30: 91%|▉| 350/385 [01:25<00:09, 3.68it/s, loss=0.198, train_acc=0.906,
Validating: 0%| | 0/78 [00:00<?, ?it/s]
Epoch 30: : 400it [01:27, 5.05it/s, loss=0.198, train_acc=0.906, v_num=42, val_acc=0.905, val_loss=0.302]
Epoch 30: : 450it [01:35, 5.95it/s, loss=0.193, train_acc=0.938, v_num=42, val_acc=0.903, val_loss=0.304]
[2021-04-23 14:40:17,513][root][INFO] - Epoch 00029: val_acc was not in top 2
Epoch 31: 91%|▉| 350/385 [01:25<00:09, 3.68it/s, loss=0.209, train_acc=0.875,
Validating: 0%| | 0/78 [00:00<?, ?it/s]
Epoch 31: : 400it [01:27, 5.04it/s, loss=0.209, train_acc=0.875, v_num=42, val_acc=0.903, val_loss=0.304]
Epoch 31: : 450it [01:35, 5.94it/s, loss=0.210, train_acc=0.969, v_num=42, val_acc=0.903, val_loss=0.303]
[2021-04-23 14:41:52,866][root][INFO] - Epoch 00030: val_acc was not in top 2
Epoch 31: : 450it [01:35, 4.72it/s, loss=0.210, train_acc=0.969, v_num=42, val_acc=0.903, val_loss=0.303]
Process finished with exit code 0