Shimmer93 / MPCount

Official repo for CVPR2024 paper "Single Domain Generalization for Crowd Counting"
Apache License 2.0

Is there any bug that causes lower performance? #10

Open Glorainow opened 3 months ago

Glorainow commented 3 months ago

Test performance is not as good as claimed. For example, training on sta and testing on stb, the claimed performance is MAE=11.4 and MSE=19.7, but following the script yields MAE=11.375 and MSE=386.8510. The key difference is the MSE. Is there a bug in the MSE computation at test time?

Shimmer93 commented 3 months ago

The "mse" reported in most crowd counting papers is actually "rmse", r standing for rooted. In the repo I only calculated raw mse, so that's the difference.

D-VVY commented 3 months ago

Hello, could you provide the details of your training environment? For example, the GPU model, memory, epochs, batch size, and most importantly, the random seed.

Glorainow commented 3 months ago

The "mse" reported in most crowd counting papers is actually "rmse", r standing for rooted. In the repo I only calculated raw mse, so that's the difference.

Thanks a lot!

Glorainow commented 3 months ago

The GPU is a 2080; the other environment settings are the same as in this project.

D-VVY commented 3 months ago

Thank you for your response! May I ask if you only used the author's checkpoint for testing, or did you retrain the model yourself before testing? I have run it a few times myself, but the metrics still differ from those in the paper, which is quite troubling!

Shimmer93 commented 3 months ago

The hardware is an RTX 3090, and the hyperparameters should be the same as in the config file. Awkwardly, the model is not fully deterministic even when the seed is fixed, and a cluster crash destroyed all of my files, so I cannot check the logs. Sorry for the inconvenience.

I searched a little and the problem seems to be that F.interpolate is not deterministic; I didn't realize this in my implementation. In domain generalization (DG) the results can differ greatly between runs, especially in crowd counting, where the datasets are quite small.
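
For anyone who wants to confirm this, a minimal sketch (not code from this repo, and assuming a CUDA device): PyTorch's deterministic mode raises an error when an op without a deterministic implementation is used, and the CUDA backward pass of bilinear F.interpolate is one such op.

```python
import torch
import torch.nn.functional as F

# With deterministic mode on, PyTorch raises a RuntimeError for ops that
# have no deterministic implementation instead of silently running them.
torch.manual_seed(42)
torch.use_deterministic_algorithms(True)

x = torch.randn(1, 3, 64, 64, device="cuda", requires_grad=True)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

try:
    y.sum().backward()
except RuntimeError as e:
    # Expected: the bilinear upsampling backward kernel on CUDA is
    # documented as having no deterministic implementation.
    print(e)
```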

Glorainow commented 3 months ago

I tested the official models provided by the author; they produce results close to the claimed MAE but a much bigger MSE. The author has explained on this page why the MSE values differ. I also reproduced the training part; the MAE and MSE are a little higher than with the models provided by the author. I didn't change much in the code, just the data part, and I used only one GPU.

D-VVY commented 3 months ago

Alright, thank you very much for your work!!!

D-VVY commented 3 months ago

Alright, thank you for your response!

13732710091 commented 2 months ago

Hello author, when reproducing your code I found that the results do not match those reported in the paper. For example, for sta->stb, your result is MAE: 11.4, MSE: 19.7, but running your code I can only get an MAE of around 13 and an MSE of around 25; the same happens when testing on the qnrf dataset. My GPU is a 3090, and no other hyperparameters were changed; I used the ones given in your code. May I ask which step I need to adjust? I hope you can answer.

Shimmer93 commented 2 months ago

This is answered above in this thread.

susu-source commented 2 months ago

Hello author, the batch_size in the original paper is set to 16, but my laptop's 4060 GPU does not have enough memory, so I set batch_size to 8. The model trained this way only reaches an MAE of about 14, while the model you provide reaches the paper's level. Is this because I did not adjust the other parameters accordingly? What adjustments should be made to the other parameters? Do num_epochs and lr need to be changed?

Shimmer93 commented 2 months ago

The lr should be halved and steps_per_epoch doubled. Even after these changes there may still be a gap from the paper's numbers: for one thing, the batch size itself has an effect; for another, a mistake in the code means the model is not fully deterministic. See the discussion above in this thread.
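
A minimal sketch of this adjustment, using the field names mentioned in this thread (batch_size, lr, steps_per_epoch); the baseline lr and steps values below are placeholders, not the repo's actual config:

```python
# Placeholder baseline; only batch_size=16 is confirmed in this thread.
base = {"batch_size": 16, "lr": 1e-4, "steps_per_epoch": 100}

scale = base["batch_size"] // 8   # halving the batch size gives scale = 2
adjusted = {
    "batch_size": 8,
    "lr": base["lr"] / scale,                            # halve the learning rate
    "steps_per_epoch": base["steps_per_epoch"] * scale,  # double the steps per epoch
}
print(adjusted)  # {'batch_size': 8, 'lr': 5e-05, 'steps_per_epoch': 200}
```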

AayuGupt commented 3 weeks ago

Hi, can anyone here provide me with the preprocessed datasets along with the density maps (dmap_gen) for all three datasets, sta, stb, and qnrf? Basically in a ready-for-training state. My system faces heating issues while running the algorithm, and it takes too much time. This would be a great help. Thank you in advance.

twiQ1 commented 2 weeks ago

Hello author, running your cls model, the sta_test_stb result I get is MAE: 18.8, with an MSE close to 30 after taking the square root. Why does the MAE differ so much from the 14.4 in your paper? The MAE from the final model is between 13 and 14, which is also quite far from the 11.4 claimed in your paper. How should I close this gap? The hardware is a single 4090, and the other hyperparameters are kept consistent with your yaml file. If the upsampling is nondeterministic, should the metrics in your paper consider reporting the average value instead of the best one?

Shimmer93 commented 2 weeks ago

The metrics in the paper report the best value, consistent with related work. When I have time, I will revise the model and retrain it to ensure the results can be reproduced.
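
For context on that convention, a small sketch with made-up run results (purely illustrative, not MPCount numbers): when training is nondeterministic, the best value over seeds can sit a few MAE points below the mean.

```python
import statistics

# Hypothetical MAE values from several runs with different seeds.
maes = [11.4, 12.7, 13.1, 13.6, 12.9]

print(f"best MAE = {min(maes):.1f}")              # reporting convention used in the paper
print(f"mean MAE = {statistics.mean(maes):.1f}")  # the average suggested above
```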

twiQ1 commented 2 weeks ago

Alright, thank you for your work. In addition, I have one more question: is the instability of the results mainly attributable to the PC classification module?