CoinCheung / BiSeNet

Add bisenetv2. My implementation of BiSeNet

How to reproduce the author's accuracy #40

Open huang229 opened 4 years ago

huang229 commented 4 years ago

Although this is not the first time I have interacted with the author, I would like to thank him again for the code. I have almost reproduced the accuracy of the open-source code here: 76.17% on the single-scale test and 77.90% on the multi-scale test. Getting this precision was very easy: I downloaded the code here directly, and my training environment is similar to the one described in the author's code, so I could run it directly without any modification. Be sure to train directly, without any modification, first. One more note: my GPUs have fairly little memory, so batch_size per GPU is 6, for a total of 12 across the two cards. So it is normal that my accuracy is slightly lower.

huang229 commented 4 years ago

In addition, before getting this precision I used my own environment, with the code modified as follows:

- pytorch 1.2.0
- SyncBatchNorm -----> nn.BatchNorm (see the sketch below)
- batch_size = 8
- gpu_nums = 1 (only one GPU)
- train crop size = 640*480

I worked on this for half a month, and no matter how I trained, the highest accuracy was only 70.0%. So I would remind everyone: deploy an environment as close to the author's as possible, so that the downloaded code can be trained directly without any modification. Once you have reproduced the author's training accuracy, change one influencing factor at a time toward your own environment. Only in this way can you tell which factor keeps the accuracy from being reproduced.
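For reference, a minimal sketch of the batch-norm swap I mean, using torch's built-in layers (the helper name is illustrative, and nn.SyncBatchNorm stands in for whatever sync-BN implementation the repo actually uses):

```python
import torch.nn as nn

def build_norm_layer(num_channels, distributed):
    # With multiple GPUs, synchronized BN aggregates batch statistics
    # across processes; on a single GPU, a plain nn.BatchNorm2d is the
    # natural replacement (the change I made above).
    if distributed:
        return nn.SyncBatchNorm(num_channels)
    return nn.BatchNorm2d(num_channels)
```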

CoinCheung commented 4 years ago

Thanks for verifying!! I am happy that you can train your model well.

huang229 commented 4 years ago

My settings:

- pytorch 1.2.0
- ubuntu 18.04
- cuda 9.0
- batch_size = 6
- gpu_nums = 1 (only one GPU)
- train crop size = 1024*1024
- max_iter = 160000

I feel the author's code is very portable: I have used both pytorch 1.0 and pytorch 1.2, and I think there should be no problem with later versions of pytorch. I am currently testing the impact of different factors on the model and will post the results here to share them.

```
it: 159500/160000, lr: 0.000056, loss: 2.5404, eta: 0:04:28, time: 26.4423
it: 159550/160000, lr: 0.000051, loss: 2.5879, eta: 0:04:01, time: 26.5868
it: 159600/160000, lr: 0.000046, loss: 2.5492, eta: 0:03:34, time: 26.6619
it: 159650/160000, lr: 0.000041, loss: 2.5945, eta: 0:03:07, time: 26.8894
it: 159700/160000, lr: 0.000035, loss: 2.5278, eta: 0:02:41, time: 26.4492
it: 159750/160000, lr: 0.000030, loss: 2.5143, eta: 0:02:14, time: 26.6291
it: 159800/160000, lr: 0.000025, loss: 2.5752, eta: 0:01:47, time: 26.5566
it: 159850/160000, lr: 0.000019, loss: 2.5299, eta: 0:01:20, time: 26.5901
it: 159900/160000, lr: 0.000013, loss: 2.5764, eta: 0:00:54, time: 28.0687
it: 159950/160000, lr: 0.000007, loss: 2.5445, eta: 0:00:27, time: 26.5950
it: 160000/160000, lr: 0.000000, loss: 2.5528, eta: 0:00:00, time: 26.7321
training done, model saved to: ./res/model_final.pth
```
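As an aside, the lr values in this log are consistent with a polynomial ("poly") decay schedule. A minimal sketch that reproduces them, assuming lr_start = 1e-2 and power = 0.9 (both inferred from the printed values, so check the repo's config for the actual settings):

```python
def poly_lr(it, max_iter=160000, lr_start=1e-2, power=0.9):
    # Polynomial decay: e.g. poly_lr(159500) ~= 0.000056 and
    # poly_lr(160000) == 0.0, matching the log above.
    return lr_start * (1.0 - it / max_iter) ** power
```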

```
evaluating the model ...
setup and restore model
compute the mIOU
100%|█████████████████████████████████████████| 250/250 [17:49<00:00, 4.28s/it]
mIOU is: 0.780227
mIOU = 0.7802269045024711

(pytorch)BiSeNet_syncbn$ CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 train.py
100%|█████████████████████████████████████████| 250/250 [03:28<00:00, 1.20it/s]
mIOU = 0.7599654703378823
```
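For anyone wondering what these mIOU numbers measure: it is the mean over classes of per-class intersection-over-union, accumulated over the validation set. A generic sketch of that aggregation (not the repo's exact evaluation code):

```python
import numpy as np

def mean_iou(conf_mat):
    # conf_mat[i, j] counts pixels with ground-truth class i that were
    # predicted as class j, accumulated over the whole validation set.
    inter = np.diag(conf_mat).astype(np.float64)
    union = conf_mat.sum(axis=0) + conf_mat.sum(axis=1) - inter
    # Classes absent from both prediction and ground truth give nan
    # and are excluded from the mean.
    return np.nanmean(inter / union)
```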

CuttlefishXuan commented 3 years ago

Hi, how long did your training process take with batch_size=6? It seems to take more than 2 days with batch_size=16 and gpu_nums=4 (2080Ti). Is that normal?