cxliu0 / PET

[ICCV 2023] Point-Query Quadtree for Crowd Counting, Localization, and More
MIT License

Regarding Experimental Results #8

Open yangtle opened 11 months ago

yangtle commented 11 months ago

Dear author,

Hello. I conducted multiple experiments on the SHA dataset following the provided code and parameter settings. Each experiment ran for 1500 epochs with validation every 5 epochs (val=5). However, I observed fluctuations in the results, with MAE around 52 and MSE around 85. I would like to ask whether setting val to 1 is necessary to achieve the results reported in the paper, or whether additional settings are required.

Best regards.

cxliu0 commented 11 months ago

Hi, we notice that the performance of the trained model may fluctuate when torch version > 1.7.

  1. To improve reproducibility, you may add torch.backends.cudnn.deterministic = True in main.py #L98 (see the sketch after this list).
  2. For the SHA dataset, we can reproduce the results based on this repository, and we did not change any parameters.
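
For reference, a minimal sketch of this kind of reproducibility setup (the seed_everything helper name and the seed value are illustrative, not part of PET; only the torch.backends.cudnn.deterministic = True line is the change suggested in item 1):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Fix random seeds and make cuDNN deterministic."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # The flag suggested above: trades some cuDNN speed
    # for run-to-run reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```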

Could you provide your environment? We could check the results and get back to you later.

yangtle commented 11 months ago

Hello, thank you for your reply! My environment is python=3.8.13, pytorch=1.12.1, CUDA=11.7, and the GPU is an NVIDIA A40. I have already set torch.backends.cudnn.deterministic = True and am currently conducting experiments.

In addition, I would like to ask about the fluctuation in the results, which shows up in two ways: ① the MAE fluctuates by approximately 2-3, which may be the most critical issue; ② the epoch at which the optimal result is reached is not consistent: sometimes it is around epoch 200, with generally lower validation MAE, and other times around epoch 1000.

To be honest, I observe a similar situation in my own model. What is your perspective on this? Is this normal for the crowd counting domain, where we focus solely on the MAE evaluation metric? Thank you.

cxliu0 commented 11 months ago

Many factors could lead to performance fluctuation, e.g., torch version and image processing. Even the same model may produce different outputs under different torch versions.

Actually, performance fluctuation also occurs in previous works. The optimal epoch may not necessarily be the same under different settings. That is why I suggest adding torch.backends.cudnn.deterministic = True before training. This ensures that the optimal epoch is the same every time you train the model.

Regarding the evaluation metric, MAE is more sensitive than the Average Precision (AP) used in object detection. Judging from previous works, such fluctuation is probably common in crowd counting.
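
For clarity, here is a hypothetical sketch of how the MAE/MSE figures quoted in this thread are conventionally computed in crowd counting; it is illustrative, not the repo's actual evaluation code. Note that "MSE" in the crowd counting literature conventionally denotes the root mean squared error of the per-image counts, which is why it is larger than MAE in the logs below:

```python
import numpy as np


def counting_metrics(pred_counts, gt_counts):
    """Return (MAE, MSE) over per-image crowd counts.

    "MSE" follows the crowd counting convention: the root mean
    squared error of the per-image count differences.
    """
    errors = np.asarray(pred_counts, dtype=float) - np.asarray(gt_counts, dtype=float)
    mae = np.abs(errors).mean()
    mse = np.sqrt((errors ** 2).mean())
    return mae, mse
```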

cxliu0 commented 11 months ago

[Update] We have set up the environment (python=3.8, pytorch=1.12) and trained the model. The performance seems fine and the optimal epoch is 765. By the way, we have added torch.backends.cudnn.deterministic = True before training, and the results are reproducible.

Here is a snippet of the training log:

[ep 761][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.2394831186613521, "train_loss_ce_sp": 0.14828089060815605, "train_loss_points_sp": 0.001931241263534773, "train_loss_ce_ds": 0.08730715946168513, "train_loss_points_ds": 0.0015310150873218034, "train_loss_split": 0.000432813169485033, "train_loss_ce_sp_unscaled": 0.14828089060815605, "train_loss_points_sp_unscaled": 0.0003862482530215906, "train_loss_ce_ds_unscaled": 0.08730715946168513, "train_loss_points_ds_unscaled": 0.00030620301510459066, "train_loss_split_unscaled": 0.004328131675720215, "epoch": 761, "n_parameters": 20909385}

[ep 762][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.254863301040353, "train_loss_ce_sp": 0.15025504396573916, "train_loss_points_sp": 0.001883206473713791, "train_loss_ce_ds": 0.1012990406236133, "train_loss_points_ds": 0.0014172781107796204, "train_loss_split": 8.732402739311154e-06, "train_loss_ce_sp_unscaled": 0.15025504396573916, "train_loss_points_sp_unscaled": 0.0003766412941134862, "train_loss_ce_ds_unscaled": 0.1012990406236133, "train_loss_points_ds_unscaled": 0.0002834556232571501, "train_loss_split_unscaled": 8.732402646863782e-05, "epoch": 762, "n_parameters": 20909385}

[ep 763][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.24386153108364828, "train_loss_ce_sp": 0.14579840346768097, "train_loss_points_sp": 0.0017963021874075403, "train_loss_ce_ds": 0.09482591055534981, "train_loss_points_ds": 0.0014338326621548953, "train_loss_split": 7.0847370220060325e-06, "train_loss_ce_sp_unscaled": 0.14579840346768097, "train_loss_points_sp_unscaled": 0.00035926043700955406, "train_loss_ce_ds_unscaled": 0.09482591055534981, "train_loss_points_ds_unscaled": 0.000286766530464504, "train_loss_split_unscaled": 7.084736952910552e-05, "epoch": 763, "n_parameters": 20909385}

[ep 764][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.24069207403305415, "train_loss_ce_sp": 0.1397619226292984, "train_loss_points_sp": 0.0018185121667999272, "train_loss_ce_ds": 0.09767922215365074, "train_loss_points_ds": 0.0014291233162873902, "train_loss_split": 3.2890487788692454e-06, "train_loss_ce_sp_unscaled": 0.1397619226292984, "train_loss_points_sp_unscaled": 0.0003637024319441234, "train_loss_ce_ds_unscaled": 0.09767922215365074, "train_loss_points_ds_unscaled": 0.00028582466404406806, "train_loss_split_unscaled": 3.289048736159866e-05, "epoch": 764, "n_parameters": 20909385}

[ep 765][lr 0.0001000]{"train_lr": 9.999999999999994e-05, "train_loss": 0.25245368943826574, "train_loss_ce_sp": 0.15302851373279416, "train_loss_points_sp": 0.0018379672814311612, "train_loss_ce_ds": 0.09610540419816971, "train_loss_points_ds": 0.0014743787264828948, "train_loss_split": 7.426094503818751e-06, "train_loss_ce_sp_unscaled": 0.15302851373279416, "train_loss_points_sp_unscaled": 0.00036759345833136623, "train_loss_ce_ds_unscaled": 0.09610540419816971, "train_loss_points_ds_unscaled": 0.00029487574686975896, "train_loss_split_unscaled": 7.42609436447556e-05, "epoch": 765, "n_parameters": 20909385}

epoch:765, mae:49.07692307692308, mse:76.87309365098122, time:12.311256408691406, 

best mae:49.07692307692308, best epoch: 765

SherlockHolmes221 commented 11 months ago

Hi, we notice that the performance of the trained model may fluctuate when torch version > 1.7.

  1. To improve reproducibility, you may add torch.backends.cudnn.deterministic = True in main.py #L98.
  2. For the SHA dataset, we can reproduce the results based on this repository, and we did not change any parameters.

Could you provide your environment? We could check the results and get back to you later.

So which pytorch version is recommended?

cxliu0 commented 11 months ago

We have tested pytorch 1.7 and pytorch 1.12. Both versions achieve the reported results (or slightly better). You may also try other pytorch versions if you like.

little-seasalt commented 11 months ago

We have tested pytorch 1.7 and pytorch 1.12. Both versions achieve the reported results (or slightly better). You may also try other pytorch versions if you like.

I retrained on SHA in a pytorch=1.12.1 environment, but the final MAE was 53.36 and the MSE was 88.26, which are quite different from the results in the paper. What factors do you think may cause this? I have already added torch.backends.cudnn.deterministic = True in main.py.

yangtle commented 11 months ago

There does indeed seem to be quite significant fluctuation in crowd counting models. Personally, I believe the results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs where it reached 52 or 53.

In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the provided training parameter settings used directly, or have any parameters been changed?

little-seasalt commented 11 months ago

There does indeed seem to be quite significant fluctuation in crowd counting models. Personally, I believe the results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs where it reached 52 or 53.

In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the provided training parameter settings used directly, or have any parameters been changed?

Hello, I would like to know: at which epoch does the MAE reach around 50? Also, have you modified any relevant parameters?

yangtle commented 11 months ago

There does indeed seem to be quite significant fluctuation in crowd counting models. Personally, I believe the results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs where it reached 52 or 53. In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the provided training parameter settings used directly, or have any parameters been changed?

Hello, I would like to know: at which epoch does the MAE reach around 50? Also, have you modified any relevant parameters?

[ep 1151][lr 0.0001000][34.14s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.222655377476602, "train_loss_ce_sp": 0.1416090032136118, "train_loss_points_sp": 0.0019127310322899673, "train_loss_ce_ds": 0.07753163435169168, "train_loss_points_ds": 0.001594329293741769, "train_loss_split": 7.679978100103088e-06, "train_loss_ce_sp_unscaled": 0.1416090032136118, "train_loss_points_sp_unscaled": 0.00038254620567140346, "train_loss_ce_ds_unscaled": 0.07753163435169168, "train_loss_points_ds_unscaled": 0.0003188658581190818, "train_loss_split_unscaled": 7.679977932491818e-05, "epoch": 1151, "n_parameters": 20909385}

[ep 1152][lr 0.0001000][33.61s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.20860815420746803, "train_loss_ce_sp": 0.12717940456963875, "train_loss_points_sp": 0.0018302943821795084, "train_loss_ce_ds": 0.07816224245706925, "train_loss_points_ds": 0.0014164968412621198, "train_loss_split": 1.971737731348394e-05, "train_loss_ce_sp_unscaled": 0.12717940456963875, "train_loss_points_sp_unscaled": 0.00036605887627858365, "train_loss_ce_ds_unscaled": 0.07816224245706925, "train_loss_points_ds_unscaled": 0.00028329936833108296, "train_loss_split_unscaled": 0.00019717377585333747, "epoch": 1152, "n_parameters": 20909385}

[ep 1153][lr 0.0001000][34.25s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.22202433383948095, "train_loss_ce_sp": 0.13644243716388135, "train_loss_points_sp": 0.0018351712213778818, "train_loss_ce_ds": 0.08227713154377164, "train_loss_points_ds": 0.0014612032305071684, "train_loss_split": 8.38927346618417e-06, "train_loss_ce_sp_unscaled": 0.13644243716388135, "train_loss_points_sp_unscaled": 0.00036703424317435036, "train_loss_ce_ds_unscaled": 0.08227713154377164, "train_loss_points_ds_unscaled": 0.00029224064476423065, "train_loss_split_unscaled": 8.389273205318966e-05, "epoch": 1153, "n_parameters": 20909385}

[ep 1154][lr 0.0001000][34.32s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.23919424092447436, "train_loss_ce_sp": 0.1506406638670612, "train_loss_points_sp": 0.0018515667714480613, "train_loss_ce_ds": 0.08527221067531689, "train_loss_points_ds": 0.001426672808372894, "train_loss_split": 3.12714969806092e-06, "train_loss_ce_sp_unscaled": 0.1506406638670612, "train_loss_points_sp_unscaled": 0.0003703133568067003, "train_loss_ce_ds_unscaled": 0.08527221067531689, "train_loss_points_ds_unscaled": 0.00028533456143860176, "train_loss_split_unscaled": 3.127149633459143e-05, "epoch": 1154, "n_parameters": 20909385}

[ep 1155][lr 0.0001000][34.66s]{"train_lr": 9.999999999999994e-05, "train_loss": 0.2269019383836437, "train_loss_ce_sp": 0.13837270579628042, "train_loss_points_sp": 0.001710777438396739, "train_loss_ce_ds": 0.0853276233616713, "train_loss_points_ds": 0.0014882287312601064, "train_loss_split": 2.6029510128812553e-06, "train_loss_ce_sp_unscaled": 0.13837270579628042, "train_loss_points_sp_unscaled": 0.0003421554865781218, "train_loss_ce_ds_unscaled": 0.0853276233616713, "train_loss_points_ds_unscaled": 0.00029764574916240434, "train_loss_split_unscaled": 2.6029509467047614e-05, "epoch": 1155, "n_parameters": 20909385}

epoch:1155, mae:50.862637362637365, mse:84.92643875731514, time:39.40563941001892,

best mae:50.862637362637365, best epoch: 1155

I have not modified the model, and setting torch.backends.cudnn.deterministic = True ensures consistent results every time.

little-seasalt commented 11 months ago

There does indeed seem to be quite significant fluctuation in crowd counting models. Personally, I believe the results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs where it reached 52 or 53. In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the provided training parameter settings used directly, or have any parameters been changed?

Hello, I would like to know: at which epoch does the MAE reach around 50? Also, have you modified any relevant parameters?

epoch:1155, mae:50.862637362637365, mse:84.92643875731514, time:39.40563941001892,

best mae:50.862637362637365, best epoch: 1155

I have not modified the model, and setting torch.backends.cudnn.deterministic = True ensures consistent results every time.

Thank you for your answer.

cxliu0 commented 11 months ago

We have tested pytorch 1.7 and pytorch 1.12. Both versions achieve the reported results (or slightly better). You may also try other pytorch versions if you like.

I retrained on SHA in a pytorch=1.12.1 environment, but the final MAE was 53.36 and the MSE was 88.26, which are quite different from the results in the paper. What factors do you think may cause this? I have already added torch.backends.cudnn.deterministic = True in main.py.

Different environments may still have an impact on performance, since the environment can affect data processing and model optimization. You may try tuning the scale augmentation parameters to see how the performance changes. We will also test the code on different machines to see what happens.
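
As an illustration of what the scale augmentation knob could look like, here is a hedged sketch; random_rescale and scale_range are hypothetical names for this example, not PET's actual implementation:

```python
import random

import torch
import torch.nn.functional as F


def random_rescale(image: torch.Tensor, points: torch.Tensor,
                   scale_range=(0.7, 1.3)):
    """Rescale a CHW image and its (N, 2) head-point annotations together."""
    scale = random.uniform(*scale_range)
    _, h, w = image.shape
    new_size = (int(h * scale), int(w * scale))
    image = F.interpolate(image.unsqueeze(0), size=new_size,
                          mode='bilinear', align_corners=False).squeeze(0)
    points = points * scale  # pixel coordinates scale with the image
    return image, points
```

Widening or narrowing scale_range is the kind of tuning meant here; a wider range generally trades training stability for robustness to scale variation.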

In addition, the quality of SHA may also contribute to performance fluctuation. Compared with other datasets, SHA is a relatively small dataset.

cxliu0 commented 11 months ago

There does indeed seem to be quite significant fluctuation in crowd counting models. Personally, I believe the results on the SHA dataset should reach the level reported in the paper. Currently, my best result is around 50, but there have also been runs where it reached 52 or 53.

In addition, I would like to ask: for datasets like QNRF, SHB, and JHU, are the provided training parameter settings used directly, or have any parameters been changed?

SHA and SHB can share the same training parameters. Regarding other datasets, please refer to this issue.