MassimoClementi opened this issue 4 years ago
Same here... This seems to be a sign that the model is not converging. When I changed the learning rate to 1e-6, et_cnt started to show some values instead of 0. I guess the current settings are not optimal for datasets other than ShanghaiTech A.
By the way, I'm also trying to reproduce the MAE and MSE for ShanghaiA. Could I ask about the results you got? The best I can get is MAE=66, MSE=112... I'm curious how close yours are.
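For reference, lowering the learning rate in a typical PyTorch setup is a one-line change in the optimizer construction; the sketch below uses a placeholder module, not the repo's actual CRFVGG model.

```python
import torch

# Placeholder module standing in for the actual network.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Lowered from the default 1e-5 to 1e-6, per the observation above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)
```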
Hi, thanks for the suggestion. I agree with you that it is probably a model convergence issue; it would be interesting to know from the authors of the paper the configuration they chose for the ShanghaiB training, which is unfortunately not specified in the publication.
In my case with ShanghaiA I ran the training for 900 epochs with batch size 12 and the results are MAE=62.79 and MSE=104.57. I think that with a longer training it would have converged to the values declared in the paper. Have a good day!
The study of the pruned VGG version and other lightweight models is part of our recent work; the experiments in the paper were conducted with the original VGG as the frontend.
Thanks for your interest, and sorry for the lack of description about the difference. You may get some insight from our upcoming paper if you are interested in the efficiency of crowd counting networks.
Thank you for the kind reply, and I'm glad you will publish an additional paper specifically on pruned models. I switched to the non-pruned VGG as you suggested and loaded the pretrained vgg16.h5 weights, but unfortunately I have the same issue.
I had to lower the batch_size to 5 to fit in GPU memory (the full VGG model is much larger):
07-18:34 ----------------- Options ---------------
batch_size: 5 [default: 1]
crop_scale: 4
crop_size: [224, 224] [default: 224x224]
crop_type: Fixed
dataset: shanghaiB [default: shanghaiA]
disp_interval: 50
epochs: 900
expr_dir: ./saved_models/v7-shanghaiB-CRFVGG-exp-12-07_18-34/ [default: None]
gpus: [0] [default: None]
is_preload: False [default: True]
logger: <logging.RootLogger object at 0x7fa242b8de10> [default: None]
loss: NORMMSSSIM [default: MSE]
loss_scale: 1.0
lr: 1e-05
model_name: CRFVGG [default: None]
patches_per_sample: 5
pretrain: None
save_interval: 1000 [default: 500]
save_model_para: False [default: True]
use_tensorboard: False [default: True]
----------------- End -------------------
07-18:34 epoch: 0000, step 0000, Time: 302.49s, gt_cnt: ['33.7', '10.8', '0.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.561915e-01
07-18:34 epoch: 0000, step 0050, Time: 12.24s, gt_cnt: ['12.1', '2.5', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.250003e-02
07-18:35 epoch: 0000, step 0100, Time: 12.21s, gt_cnt: ['62.2', '15.1', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.598498e-01
07-18:35 epoch: 0000, step 0150, Time: 12.19s, gt_cnt: ['33.4', '10.4', '0.6'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.925477e-01
07-18:35 epoch: 0000, step 0200, Time: 12.16s, gt_cnt: ['42.0', '12.6', '0.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.059601e-01
07-18:36 epoch: 0000, step 0250, Time: 12.10s, gt_cnt: ['4.6', '2.0', '0.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 5.397027e-01
07-18:36 epoch: 0000, step 0300, Time: 12.09s, gt_cnt: ['1.6', '0.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.780879e-01
07-18:36 epoch: 0000, step 0350, Time: 12.09s, gt_cnt: ['17.8', '4.1', '0.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.675186e-01
07-18:37 epoch: 0000, step 0400, Time: 12.07s, gt_cnt: ['11.2', '3.3', '0.4'], et_cnt: ['0.0', '0.0', '0.0'], loss: 8.789396e-02
07-18:37 epoch: 0000, step 0450, Time: 12.09s, gt_cnt: ['26.5', '13.3', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.228484e-01
07-18:37 epoch: 0000, step 0500, Time: 12.08s, gt_cnt: ['31.0', '13.4', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.732405e-01
07-18:38 epoch: 0000, step 0550, Time: 12.08s, gt_cnt: ['46.2', '10.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.009654e-01
07-18:38 epoch: 0000, step 0600, Time: 12.08s, gt_cnt: ['17.8', '9.3', '1.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 4.852930e-01
07-18:38 epoch: 0000, step 0650, Time: 12.08s, gt_cnt: ['23.4', '10.0', '1.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 2.102067e-01
07-18:39 epoch: 0000, step 0700, Time: 12.08s, gt_cnt: ['29.4', '7.9', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.005957e-01
07-18:39 epoch 0: 297.588757038 seconds; Path: ./saved_models/v7-shanghaiB-CRFVGG-exp-12-07_18-34/
07-18:39 Train loss: 0.0557746604582
07-18:39 epoch: 0001, step 0000, Time: 28.60s, gt_cnt: ['49.9', '15.8', '0.6'], et_cnt: ['0.0', '0.0', '0.0'], loss: 9.402770e-02
07-18:39 epoch: 0001, step 0050, Time: 12.06s, gt_cnt: ['105.2', '21.7', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.162181e-02
07-18:40 epoch: 0001, step 0100, Time: 12.07s, gt_cnt: ['2.0', '0.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 2.557841e-01
07-18:40 epoch: 0001, step 0150, Time: 12.08s, gt_cnt: ['76.0', '15.8', '0.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 7.393473e-02
07-18:40 epoch: 0001, step 0200, Time: 12.07s, gt_cnt: ['1.6', '0.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 5.209067e-01
07-18:41 epoch: 0001, step 0250, Time: 12.07s, gt_cnt: ['11.6', '5.4', '1.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 4.533325e-01
07-18:41 epoch: 0001, step 0300, Time: 11.93s, gt_cnt: ['57.4', '16.6', '1.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.119947e-01
07-18:41 epoch: 0001, step 0350, Time: 12.02s, gt_cnt: ['11.0', '4.2', '0.4'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.066779e-01
07-18:42 epoch: 0001, step 0400, Time: 12.02s, gt_cnt: ['28.6', '8.7', '0.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 5.075014e-02
07-18:42 epoch: 0001, step 0450, Time: 12.02s, gt_cnt: ['8.5', '3.1', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 4.467603e-01
and so on!
Could you provide the precise configuration that worked in your case? I'm still struggling to make it work...
Haven't tried it yet, but ShanghaiB is usually not trained with an adaptive kernel for density map generation. The code does so, however. Changing that to a fixed kernel with sigma=15 might do the trick.
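For reference, a minimal sketch of what the suggested change could look like (assuming head annotations come as (x, y) points; the function name and signature are illustrative, not the repo's actual preprocessing code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixed_kernel_density(points, shape, sigma=15):
    """Place a unit impulse at each head annotation, then smooth with a
    fixed Gaussian (sigma=15) instead of an adaptive per-point kernel."""
    density = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < shape[0] and 0 <= col < shape[1]:
            density[row, col] += 1.0
    return gaussian_filter(density, sigma)
```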
Thank you for the answer. I am not able to try the solution, but if you have the chance to test it and report back here, I will close the issue for other people looking for this solution! Thanks again.
Hi, I met the same problem. When I use the CRFVGG model to train on Shanghai Part A, at first et_cnt shows some values, but as the epochs grow, et_cnt becomes always zero.
I want to know whether it is a configuration problem or something else.
Is the MAE=62.79 from the pruned VGG trained for 900 epochs? I used the pruned VGG model to train for 300 epochs and found that the loss does not converge and the MAE is very high. Thanks, hoping for your reply.
I used the following configuration to train on ShanghaiA as close as possible to the results in the paper; you can also find out more in my repository:
CUDA_VISIBLE_DEVICES=0 python -u nowtrain.py \
--model CRFVGG_prune \
--dataset shanghaiA \
--no-save \
--no-visual \
--save_interval 1000 \
--no-preload \
--batch_size 12 \
--patches_per_sample 5 \
--loss NORMMSSSIM \
--lr 0.00001 \
--gpus 0 \
--epochs 900
Hi, thanks for replying. I tried several times to train Shanghai Part A for 900 epochs. The only difference from your settings is that I did not set patches_per_sample; does that matter much? Across several runs, the test MAE at epoch 900 is about 241, which is far from 62.79. Thanks.
No, it should not affect the results, because 5 is already the default value (see train_options.py in the src folder); a sketch of how such a default is typically declared follows.
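For illustration, this is how an argparse-based train_options.py would declare that default (a sketch, not the repo's exact file):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--patches_per_sample', type=int, default=5,
                    help='number of crops taken from each training image')

# Omitting the flag on the command line leaves the default in place.
opts = parser.parse_args([])
print(opts.patches_per_sample)  # -> 5
```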
Have you tried re-downloading both ShanghaiA and the pretrained pruned VGG model from the official websites? Are you sure that they are in the proper folders and loaded correctly by the scripts?
I looked through the pruned VGG model code, and from it the default is VGG16. But the vgg16 function in vgg.py just creates the VGG16 architecture; I did not see any code that downloads a pretrained model. It is true that I did not download the pretrained model, and after checking the code, maybe that is the cause of my problem; the author's guide does not mention that step, which caused my confusion. Of course, the folders are correct and the dataset loads correctly. So once I download the pretrained pruned VGG model from the official website, where should I put it? I could not find the relevant code. Thanks for replying.
Sorry, I have been busy with a competition for a month, so only today did I see that you have stated the procedure clearly. But when I use the pretrained model you provided, I get an error and cannot load the pretrained model correctly. Thanks for replying.
I think the problem is that the pretrained model cannot be found. Check again that the .h5 file is consistent with what is stated in crowd_counting.py around line 40. In my case the line is as follows:
network.load_net('pretrained_models/pruned_VGG.h5', self.model.front_end, skip=True)
Consequently, my model has to be placed in a folder called pretrained_models with the name pruned_VGG.h5.
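A quick sanity check before launching training (a minimal sketch, assuming the folder layout described above):

```python
import os

weights_path = 'pretrained_models/pruned_VGG.h5'
if not os.path.isfile(weights_path):
    raise FileNotFoundError(
        f'{weights_path} not found: place pruned_VGG.h5 inside pretrained_models/')
```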
I don’t know why GitHub didn’t notify me of your reply in time; I only saw it today. I successfully loaded the pretrained model pruned_VGG.h5, but when I look at the experiment again, the predicted values are still generally 0.
I also tried changing the learning rate to 1e-06, which left me confused; I was still on the ShanghaiA dataset. What I want to ask is whether you have solved the training problem of this model on other datasets. I hope to hear from you, thanks.
Hi, the problem of the training returning blank maps in certain conditions is indeed the point of this issue. @MarkusPfundstein reported a possible solution, but I am not able to verify it. I copy-paste the possible solution for clarity:
Haven't tried it yet, but ShanghaiB is usually not trained with an adaptive kernel for density map generation. The code does so, however. Changing that to a fixed kernel with sigma=15 might do the trick.
Hi, I managed to get et_cnt displaying normal values for ShanghaiTech B; the trick is using the default loss function MSE instead of the original NORMMSSSIM, though I am not sure whether the model is converging or not. We can discuss here to find better parameters for ShanghaiTech B or custom datasets. loss: MSE
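For clarity, a minimal sketch of the plain pixel-wise MSE objective on density maps (the tensor shapes are hypothetical; NORMMSSSIM additionally involves a multi-scale SSIM term):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
pred = torch.rand(1, 1, 56, 56)  # hypothetical predicted density map
gt = torch.rand(1, 1, 56, 56)    # hypothetical ground-truth density map
loss = mse(pred, gt)             # mean squared error over all pixels
print(loss.item())
```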
Interesting find, thanks for the contribution!
Hi, I'm having trouble training your network on ShanghaiB. My train.sh is as follows:
The dataset.py file contains the correct SHANG_PATH variable and loads the dataset successfully, as it indeed does with ShanghaiA (training on ShanghaiA works as intended, with MAE and MSE values close to the ones declared in the paper). Note moreover that I'm successfully loading the weights of the pretrained pruned_VGG model.
This is what is inside log.log after the training on ShanghaiB: ...and so on
Analysing the saved support images:
I'm trying to train on a custom dataset too and it gives me the same strange behaviour.
Is this a known issue, or am I somehow configuring the net the wrong way?