MassimoClementi opened this issue 4 years ago
Same here... This seems to be a sign that the model is not converging. When I changed the learning rate to 1e-6, et_cnt started to show some values instead of 0. I guess the current settings are not optimal for datasets other than ShanghaiTech A.
By the way, I'm also trying to reproduce the MAE and MSE for ShanghaiA. Could I ask about the results you got? The best I can get is MAE=66, MSE=112... I'm curious how close yours are.
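For reference, lowering the learning rate in a typical PyTorch setup is a one-line change in the optimizer construction; the sketch below uses a placeholder module, not the repo's actual CRFVGG model.

```python
import torch

# Placeholder module standing in for the actual network.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Lowered from the default 1e-5 to 1e-6, per the observation above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)
```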
Hi, thanks for the suggestion. I agree with you that it is probably a model convergence issue; it would be interesting to know from the authors of the paper the configuration they chose for the ShanghaiB training, which is unfortunately not specified in the publication.
In my case with ShanghaiA I ran the training for 900 epochs with batch size 12 and the results are MAE=62.79 and MSE=104.57. I think that with a longer training it would have converged to the values declared in the paper. Have a good day!
The study of the pruned VGG version and other lightweight models is part of our recent work; the experiments in the paper were conducted with the original VGG as the frontend.
Thanks for your interest, and sorry for the lack of description about the difference. You may get some insight from our upcoming paper if you are interested in the efficiency of crowd counting networks.
Thank you for the kind reply, and I'm glad you will publish an additional paper specifically on pruned models. I switched to the non-pruned VGG as you suggested and loaded the pretrained vgg16.h5 weights, but unfortunately I have the same issue.
I had to lower the batch_size to 5 to fit in GPU memory (the full VGG model is much larger):
07-18:34 ----------------- Options ---------------
batch_size: 5 [default: 1]
crop_scale: 4
crop_size: [224, 224] [default: 224x224]
crop_type: Fixed
dataset: shanghaiB [default: shanghaiA]
disp_interval: 50
epochs: 900
expr_dir: ./saved_models/v7-shanghaiB-CRFVGG-exp-12-07_18-34/ [default: None]
gpus: [0] [default: None]
is_preload: False [default: True]
logger: <logging.RootLogger object at 0x7fa242b8de10> [default: None]
loss: NORMMSSSIM [default: MSE]
loss_scale: 1.0
lr: 1e-05
model_name: CRFVGG [default: None]
patches_per_sample: 5
pretrain: None
save_interval: 1000 [default: 500]
save_model_para: False [default: True]
use_tensorboard: False [default: True]
----------------- End -------------------
07-18:34 epoch: 0000, step 0000, Time: 302.49s, gt_cnt: ['33.7', '10.8', '0.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.561915e-01
07-18:34 epoch: 0000, step 0050, Time: 12.24s, gt_cnt: ['12.1', '2.5', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.250003e-02
07-18:35 epoch: 0000, step 0100, Time: 12.21s, gt_cnt: ['62.2', '15.1', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.598498e-01
07-18:35 epoch: 0000, step 0150, Time: 12.19s, gt_cnt: ['33.4', '10.4', '0.6'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.925477e-01
07-18:35 epoch: 0000, step 0200, Time: 12.16s, gt_cnt: ['42.0', '12.6', '0.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.059601e-01
07-18:36 epoch: 0000, step 0250, Time: 12.10s, gt_cnt: ['4.6', '2.0', '0.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 5.397027e-01
07-18:36 epoch: 0000, step 0300, Time: 12.09s, gt_cnt: ['1.6', '0.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.780879e-01
07-18:36 epoch: 0000, step 0350, Time: 12.09s, gt_cnt: ['17.8', '4.1', '0.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.675186e-01
07-18:37 epoch: 0000, step 0400, Time: 12.07s, gt_cnt: ['11.2', '3.3', '0.4'], et_cnt: ['0.0', '0.0', '0.0'], loss: 8.789396e-02
07-18:37 epoch: 0000, step 0450, Time: 12.09s, gt_cnt: ['26.5', '13.3', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.228484e-01
07-18:37 epoch: 0000, step 0500, Time: 12.08s, gt_cnt: ['31.0', '13.4', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.732405e-01
07-18:38 epoch: 0000, step 0550, Time: 12.08s, gt_cnt: ['46.2', '10.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.009654e-01
07-18:38 epoch: 0000, step 0600, Time: 12.08s, gt_cnt: ['17.8', '9.3', '1.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 4.852930e-01
07-18:38 epoch: 0000, step 0650, Time: 12.08s, gt_cnt: ['23.4', '10.0', '1.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 2.102067e-01
07-18:39 epoch: 0000, step 0700, Time: 12.08s, gt_cnt: ['29.4', '7.9', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.005957e-01
07-18:39 epoch 0: 297.588757038 seconds; Path: ./saved_models/v7-shanghaiB-CRFVGG-exp-12-07_18-34/
07-18:39 Train loss: 0.0557746604582
07-18:39 epoch: 0001, step 0000, Time: 28.60s, gt_cnt: ['49.9', '15.8', '0.6'], et_cnt: ['0.0', '0.0', '0.0'], loss: 9.402770e-02
07-18:39 epoch: 0001, step 0050, Time: 12.06s, gt_cnt: ['105.2', '21.7', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.162181e-02
07-18:40 epoch: 0001, step 0100, Time: 12.07s, gt_cnt: ['2.0', '0.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 2.557841e-01
07-18:40 epoch: 0001, step 0150, Time: 12.08s, gt_cnt: ['76.0', '15.8', '0.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 7.393473e-02
07-18:40 epoch: 0001, step 0200, Time: 12.07s, gt_cnt: ['1.6', '0.6', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 5.209067e-01
07-18:41 epoch: 0001, step 0250, Time: 12.07s, gt_cnt: ['11.6', '5.4', '1.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 4.533325e-01
07-18:41 epoch: 0001, step 0300, Time: 11.93s, gt_cnt: ['57.4', '16.6', '1.2'], et_cnt: ['0.0', '0.0', '0.0'], loss: 1.119947e-01
07-18:41 epoch: 0001, step 0350, Time: 12.02s, gt_cnt: ['11.0', '4.2', '0.4'], et_cnt: ['0.0', '0.0', '0.0'], loss: 3.066779e-01
07-18:42 epoch: 0001, step 0400, Time: 12.02s, gt_cnt: ['28.6', '8.7', '0.1'], et_cnt: ['0.0', '0.0', '0.0'], loss: 5.075014e-02
07-18:42 epoch: 0001, step 0450, Time: 12.02s, gt_cnt: ['8.5', '3.1', '0.0'], et_cnt: ['0.0', '0.0', '0.0'], loss: 4.467603e-01
and so on!
Could you provide the precise configuration that worked in your case? I'm still struggling to make it work...
Haven't tried it yet, but ShanghaiB is usually not trained with an adaptive kernel for density map generation. The code does so, however. Changing that to a fixed kernel with sigma=15 might do the trick.
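For reference, a minimal sketch of what the suggested change could look like (assuming head annotations come as (x, y) points; the function name and signature are illustrative, not the repo's actual preprocessing code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixed_kernel_density(points, shape, sigma=15):
    """Place a unit impulse at each head annotation, then smooth with a
    fixed Gaussian (sigma=15) instead of an adaptive per-point kernel."""
    density = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < shape[0] and 0 <= col < shape[1]:
            density[row, col] += 1.0
    return gaussian_filter(density, sigma)
```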
Thank you for the answer. I am not able to try the solution, but if you have the chance to test it and report back here, I will close the issue for other people looking for this solution! Thanks again.
Hi, I met the same problem. When I use the CRFVGG model to train on Shanghai Part A, at first et_cnt shows some values, but as the epochs grow, et_cnt becomes always zero.
I want to know whether it is a configuration problem or something else.
Is the MAE=62.79 from the pruned VGG trained for 900 epochs? I used the pruned VGG model to train for 300 epochs and found that the loss does not converge and the MAE is very high. Thanks, hoping for your reply.
I used the following configuration to train on ShanghaiA as close as possible to the results in the paper; you can also find out more in my repository:
CUDA_VISIBLE_DEVICES=0 python -u nowtrain.py \
--model CRFVGG_prune \
--dataset shanghaiA \
--no-save \
--no-visual \
--save_interval 1000 \
--no-preload \
--batch_size 12 \
--patches_per_sample 5 \
--loss NORMMSSSIM \
--lr 0.00001 \
--gpus 0 \
--epochs 900
Hi, thanks for replying. I tried several times to train Shanghai Part A for 900 epochs. The only difference from your settings is that I did not set patches_per_sample; does that matter much? Across several runs, the test MAE at epoch 900 is about 241, which is far from 62.79. Thanks.
No, it should not affect the results, because 5 is already the default value (see train_options.py in the src folder); a sketch of how such a default is typically declared follows.
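For illustration, this is how an argparse-based train_options.py would declare that default (a sketch, not the repo's exact file):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--patches_per_sample', type=int, default=5,
                    help='number of crops taken from each training image')

# Omitting the flag on the command line leaves the default in place.
opts = parser.parse_args([])
print(opts.patches_per_sample)  # -> 5
```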
Have you tried re-downloading both ShanghaiA and the pretrained pruned VGG model from the official websites? Are you sure that they are in the proper folders and loaded correctly by the scripts?
I looked through the pruned VGG model code, and from it the default is VGG16. But the vgg16 function in vgg.py just creates the VGG16 architecture; I did not see any code that downloads a pretrained model. It is true that I did not download the pretrained model, and after checking the code, maybe that is the cause of my problem; the author's guide does not mention that step, which caused my confusion. Of course, the folders are correct and the dataset loads correctly. So once I download the pretrained pruned VGG model from the official website, where should I put it? I could not find the relevant code. Thanks for replying.
Sorry, I have been busy with a competition for a month, so only today did I see that you have stated the procedure clearly. But when I use the pretrained model you provided, I get an error and cannot load the pretrained model correctly. Thanks for replying.
I think the problem is that the pretrained model cannot be found. Check again that the .h5 file is consistent with what is stated in crowd_counting.py around line 40. In my case the line is as follows:
network.load_net('pretrained_models/pruned_VGG.h5', self.model.front_end, skip=True)
Consequently, my model has to be placed in a folder called pretrained_models with the name pruned_VGG.h5.
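A quick sanity check before launching training (a minimal sketch, assuming the folder layout described above):

```python
import os

weights_path = 'pretrained_models/pruned_VGG.h5'
if not os.path.isfile(weights_path):
    raise FileNotFoundError(
        f'{weights_path} not found: place pruned_VGG.h5 inside pretrained_models/')
```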
I don’t know why GitHub didn’t notify me of your reply in time; I only saw it today. I successfully loaded the pretrained model pruned_VGG.h5, but when I look at the experiment again, the predicted values are still generally 0.
I also tried changing the learning rate to 1e-06, which left me confused; I was still on the ShanghaiA dataset. What I want to ask is whether you have solved the training problem of this model on other datasets. I hope to hear from you, thanks.
Hi, the problem of the training returning blank maps in certain conditions is indeed the point of this issue. @MarkusPfundstein reported a possible solution, but I am not able to verify it. I copy-paste the possible solution for clarity:
Haven't tried it yet, but ShanghaiB is usually not trained with an adaptive kernel for density map generation. The code does so, however. Changing that to a fixed kernel with sigma=15 might do the trick.
Hi, I managed to get et_cnt displaying normal values for ShanghaiTech B; the trick is using the default loss function MSE instead of the original NORMMSSSIM, though I am not sure whether the model is converging or not. We can discuss here to find better parameters for ShanghaiTech B or custom datasets. loss: MSE
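For clarity, a minimal sketch of the plain pixel-wise MSE objective on density maps (the tensor shapes are hypothetical; NORMMSSSIM additionally involves a multi-scale SSIM term):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
pred = torch.rand(1, 1, 56, 56)  # hypothetical predicted density map
gt = torch.rand(1, 1, 56, 56)    # hypothetical ground-truth density map
loss = mse(pred, gt)             # mean squared error over all pixels
print(loss.item())
```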
Interesting find, thanks for the contribution!
Hi, I'm having trouble training your network on ShanghaiB. My train.sh is as follows:
The dataset.py file contains the correct SHANG_PATH variable and loads the dataset successfully, as it indeed does with ShanghaiA (training on ShanghaiA works as intended, with MAE and MSE values close to the ones declared in the paper). Note moreover that I'm successfully loading the weights of the pretrained pruned_VGG model.
This is what is inside log.log after the training on ShanghaiB: ...and so on
Analysing the saved support images:
I'm trying to train on a custom dataset too and it gives me the same strange behaviour.
Is this a known issue, or am I somehow configuring the net the wrong way?