Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting
https://verg-avesta.github.io/CounTR_Webpage/
MIT License

[Reproduce] Cannot reproduce the results of the pretrain stage. #6

Closed Xu3XiWang closed 1 year ago

Xu3XiWang commented 1 year ago

Hello. Thank you for your great work.

I used the pre-trained model you provided, ran fine-tuning with the second-stage parameters from the readme.md, and got results similar to those in the paper.

However, when I trained the first stage on FSC-147 from scratch using the first-stage parameters from the document, and then trained the second stage with the documented parameters, I ended up with the following results: MAE: 23.76, RMSE: 105.93.

I think there is a problem with my first-stage training. Was the first-stage pre-trained model in the paper obtained using the parameters in the readme.md, and is there anything else that needs to be modified?

Verg-Avesta commented 1 year ago

Thanks for your attention.

I may not have understood you correctly. Do you mean that you fine-tuned the pre-trained checkpoint provided in this issue and got similar results, but when you pre-trained a model from scratch and then fine-tuned it, you got a bad result?

If that's the problem, I may have an answer. In the MAE pre-training stage, I didn't pre-train the model from scratch; instead, I initialized it from the checkpoints pre-trained on ImageNet by MAE, linked in this issue. You can try whether this gives you a similar result.
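In PyTorch terms, initializing from the MAE ImageNet checkpoint usually means loading its state dict with `strict=False`, so that encoder weights present in the checkpoint are copied while counting-specific modules keep their random initialization. A minimal sketch of that pattern (the tiny `Encoder`/`Counter` modules here are stand-ins for illustration, not CounTR's actual architecture or checkpoint keys):

```python
# Hedged sketch: partial weight loading with load_state_dict(strict=False).
# Keys shared with the checkpoint are copied; keys missing from it
# (e.g. a counting head) stay randomly initialized.
import torch
import torch.nn as nn

class Encoder(nn.Module):            # stand-in for the MAE ViT encoder
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

class Counter(nn.Module):            # stand-in for the full counting model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)  # key shared with the MAE checkpoint
        self.head = nn.Linear(4, 1)  # counting head, trained from scratch

# In real use this would be torch.load(<checkpoint path>)["model"];
# here we simulate it with an in-memory state dict.
pretrained = Encoder()
checkpoint = pretrained.state_dict()

model = Counter()
# strict=False tolerates keys the checkpoint lacks (the counting head).
result = model.load_state_dict(checkpoint, strict=False)
# result.missing_keys lists the head parameters that were not loaded.
```

After loading, it is worth printing `result.missing_keys` and `result.unexpected_keys` once to confirm that only the expected modules were skipped.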

Besides, I strongly recommend visualizing the performance of the model pre-trained with MAE (i.e. after the first stage) and checking whether it has learned to reconstruct the masked images.
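One generic way to build that check, assuming the model follows the original MAE interface and returns a `(loss, pred, mask)` triple of patch tokens, is to compose an image that keeps visible patches from the input and takes masked patches from the reconstruction. A sketch with toy shapes (`pred` and `mask` are stand-ins for real model outputs; the patchify/unpatchify helpers mirror MAE's patch ordering):

```python
# Hedged sketch: composing a "visible patches + reconstructed patches" image
# to eyeball whether a stage-1 MAE model has learned to reconstruct inputs.
import torch

patch = 4                      # toy patch size (MAE typically uses 16)
img = torch.rand(1, 3, 8, 8)   # toy image: a 2x2 grid of patches

def patchify(imgs, p):
    # (N, 3, H, W) -> (N, L, p*p*3), same ordering as MAE's patchify
    n, c, h, w = imgs.shape
    x = imgs.reshape(n, c, h // p, p, w // p, p)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(n, (h // p) * (w // p), p * p * c)

def unpatchify(x, p):
    # Inverse of patchify for square grids of patches.
    n, l, _ = x.shape
    g = int(l ** 0.5)
    x = x.reshape(n, g, g, p, p, 3).permute(0, 5, 1, 3, 2, 4)
    return x.reshape(n, 3, g * p, g * p)

# Stand-ins for the model's outputs: 1 = masked patch, 0 = visible patch.
mask = torch.tensor([[1., 0., 0., 1.]])
pred = torch.rand_like(patchify(img, patch))

tokens = patchify(img, patch)
# Visible patches come from the input, masked patches from the reconstruction.
composed = tokens * (1 - mask.unsqueeze(-1)) + pred * mask.unsqueeze(-1)
recon_img = unpatchify(composed, patch)  # (1, 3, 8, 8), ready to plot
```

If the reconstructed regions look like random noise after stage 1, the pre-training did not converge and fine-tuning on top of it will give poor counting results.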

Xu3XiWang commented 1 year ago

Yes, that is what I mean. Thanks a lot!