CommissarMa / Context-Aware_Crowd_Counting-pytorch

The implementation of Context-Aware Crowd Counting (CVPR 2019)
MIT License
68 stars · 23 forks

Pretrained model and training on Part B #6

Closed · AlexBar93 closed this issue 5 years ago

AlexBar93 commented 5 years ago

Hello! Thanks for your implementation of the network. Unfortunately, it's hard for non-Chinese users to download from pan.baidu, so could you please upload your pretrained model to another cloud service like Dropbox or Google Drive?

I ask because I've been training on ShanghaiTech Part A but can't get below 66 MAE even after 600 epochs. I've also been training on ShanghaiTech Part B, where convergence seems much slower: at 600 epochs the MAE is around 28, far worse than the 7.8 reported in the paper. Even using the model trained on Part A to test on Part B gives better results (around 23 MAE).

For training on Part B, all I changed was switching the optimizer from SGD to Adam and setting the batch size to 4 (anything larger triggers a CUDA out-of-memory error, even though I'm training on an RTX 2080 Ti with 11 GB of memory). Should I stick with this setup and just train for more epochs, or change some other optimizer hyperparameters to train faster?
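For reference, the optimizer swap I mean is just something like this (a minimal sketch; the model stand-in, learning rate, and momentum are illustrative assumptions, not this repo's exact training code):

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for the CAN model; in the repo this would be the real network.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

# SGD in the style used for Part A (lr and momentum are assumptions):
optimizer = optim.SGD(model.parameters(), lr=1e-7, momentum=0.95)

# The Adam variant I tried on Part B (lr again only illustrative):
# optimizer = optim.Adam(model.parameters(), lr=1e-5)
```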

CommissarMa commented 5 years ago

@AlexBar93 I have updated the readme and added a Dropbox link for the model trained on ShanghaiTech Part A. If you get MAE = 28 on ShanghaiTech Part B, something must be wrong; you should check whether the GT density maps are generated correctly.
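One quick sanity check (a minimal sketch, assuming ShanghaiTech's .mat annotation layout and density maps stored as .h5; the file names are placeholders): the integral of each density map should be close to the annotated head count.

```python
import numpy as np
import h5py
import scipy.io as sio

# Placeholder paths; point these at one image's annotation and density map.
mat = sio.loadmat('GT_IMG_1.mat')
points = mat['image_info'][0, 0][0, 0][0]   # (N, 2) head coordinates
with h5py.File('IMG_1.h5', 'r') as f:
    density = np.asarray(f['density'])

# A correctly generated map integrates to roughly the head count;
# a large mismatch means the GT generation step went wrong.
print('annotated heads:', len(points))
print('density map sum:', float(density.sum()))
```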

chw9413 commented 5 years ago

My result on Part A is similar to AlexBar93's: the minimum MAE is about 67, reached at epoch 1251. I got that number after updating the code, but the changes don't seem to lower my MAE; before the update I had 66.1 at epoch 924.

AlexBar93 commented 5 years ago

Thanks for uploading your model! It gives me 60.8 MAE on ShanghaiTech Part A, which is even lower than the paper's result. I trained with SGD instead of Adam on Part B and got around 9 MAE, so there are probably some hyperparameters that need changing to make Adam work.

xinke-wang commented 5 years ago

> Thanks for uploading your model! It gives me 60.8 MAE on ShanghaiTech Part A, which is even lower than the paper's result. I trained with SGD instead of Adam on Part B and got around 9 MAE, so there are probably some hyperparameters that need changing to make Adam work.

Hi @AlexBar93, did you use the same hyperparameters to train Part B as you used for Part A? And have you tried training Part A again? I can only achieve about 66 MAE on Part A and more than 30 on Part B. Many thanks.

AlexBar93 commented 5 years ago

Yes, I used the same parameters for Part B as for Part A. I didn't train Part A again, but the first few times I did, I couldn't get below 66 MAE either. If you get 30 MAE on Part B using SGD, you may have a problem with the ground-truth density maps: as reported in the paper, they should be generated with fixed Gaussian kernels instead of the adaptive ones this repo uses for Part A.
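For illustration, a minimal fixed-kernel generator could look like this (the sigma value is a common choice from the crowd-counting literature, not one confirmed in this thread):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixed_kernel_density(points, shape, sigma=15.0):
    """Ground-truth density map with a fixed Gaussian kernel (Part B style).

    points: (N, 2) array of (x, y) head coordinates.
    shape:  (height, width) of the image.
    sigma:  fixed kernel width; 15 is an assumed, commonly used value.
    """
    density = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < shape[0] and 0 <= col < shape[1]:
            density[row, col] = 1.0
    # One constant blur replaces the per-head adaptive (k-NN) sigma used
    # for Part A; the map still sums to the number of in-bounds heads.
    return gaussian_filter(density, sigma)
```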

xinke-wang commented 5 years ago

> Yes, I used the same parameters for Part B as for Part A. I didn't train Part A again, but the first few times I did, I couldn't get below 66 MAE either. If you get 30 MAE on Part B using SGD, you may have a problem with the ground-truth density maps: as reported in the paper, they should be generated with fixed Gaussian kernels instead of the adaptive ones this repo uses for Part A.

Thank you very much for your kind response! I did use the adaptive kernels to generate the ground-truth density maps for Part B; that was the problem.