666DZY666 / micronet

micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group-convolution channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape.

nin_gc: loss becomes extremely large after training for a number of steps #97

Open EdwardVincentMa opened 2 years ago

EdwardVincentMa commented 2 years ago

Training was normal at first, with accuracy reaching over 80%. After training overnight, by epoch 195 the loss had become extremely large and the model no longer converged at all. Hard to believe.

EdwardVincentMa commented 2 years ago

Test set: Average loss: 18154214.1728, Accuracy: 1000/10000 (10.00%) Best Accuracy: 77.37%

Train Epoch: 175 [0/50000 (0%)] Loss: 15677834.000000 LR: 0.0001
Train Epoch: 175 [3200/50000 (6%)] Loss: 4205774.000000 LR: 0.0001
Train Epoch: 175 [6400/50000 (13%)] Loss: 1340564.750000 LR: 0.0001
Train Epoch: 175 [9600/50000 (19%)] Loss: 573608.937500 LR: 0.0001
Train Epoch: 175 [12800/50000 (26%)] Loss: 13077519.000000 LR: 0.0001
Train Epoch: 175 [16000/50000 (32%)] Loss: 1872735.250000 LR: 0.0001
Train Epoch: 175 [19200/50000 (38%)] Loss: 845358.062500 LR: 0.0001
Train Epoch: 175 [22400/50000 (45%)] Loss: 20978710.000000 LR: 0.0001
Train Epoch: 175 [25600/50000 (51%)] Loss: 635413.625000 LR: 0.0001
Train Epoch: 175 [28800/50000 (58%)] Loss: 26684102.000000 LR: 0.0001
Train Epoch: 175 [32000/50000 (64%)] Loss: 18137484.000000 LR: 0.0001
Train Epoch: 175 [35200/50000 (70%)] Loss: 645895.500000 LR: 0.0001
Train Epoch: 175 [38400/50000 (77%)] Loss: 27134622.000000 LR: 0.0001
Train Epoch: 175 [41600/50000 (83%)] Loss: 3623150.500000 LR: 0.0001
Train Epoch: 175 [44800/50000 (90%)] Loss: 9524407.000000 LR: 0.0001
Train Epoch: 175 [48000/50000 (96%)] Loss: 785436.125000 LR: 0.0001

666DZY666 commented 2 years ago

Did you fuse the BN layers? If BN is fused, training will be fairly unstable (the loss jumps around). Use a smaller learning rate. Better yet, first train a floating-point model, load it, and then do QAT.
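A minimal sketch of the workflow suggested above, assuming a plain PyTorch training loop rather than micronet's actual scripts: obtain an FP32 checkpoint first, load it into the quantization-aware model, and fine-tune with a reduced learning rate. The constructor `build_qat_model` and the checkpoint path `nin_gc_fp32.pth` are hypothetical placeholders, not names from this repository.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def finetune_qat(build_qat_model, train_loader,
                 fp32_ckpt="nin_gc_fp32.pth", epochs=30, lr=1e-4):
    """Fine-tune a QAT model starting from pretrained FP32 weights."""
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Build the quantization-aware model and initialize it with the
    # weights of the previously trained floating-point model.
    qat_model = build_qat_model().to(device)
    fp32_state = torch.load(fp32_ckpt, map_location=device)
    qat_model.load_state_dict(fp32_state, strict=False)

    # A small learning rate helps keep the (possibly BN-fused) quantized
    # model from diverging the way training from scratch can.
    optimizer = optim.SGD(qat_model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        qat_model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(qat_model(images), targets)
            loss.backward()
            optimizer.step()
    return qat_model
```

The key points are the ones in the reply: start QAT from a converged floating-point checkpoint instead of random initialization, and drop the learning rate (e.g. an order of magnitude below the FP32 schedule) so the fused-BN model stays stable.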