Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting
https://verg-avesta.github.io/CounTR_Webpage/
MIT License

[Reproduce] Cannot reproduce the results with base MAE #26

Closed GioFic95 closed 1 year ago

GioFic95 commented 1 year ago

Hi @Verg-Avesta, I tried to reproduce your pre-training + fine-tuning process, but my results still differ when I use the base MAE model mae_vit_base_patch16, even with the pretrained weights mentioned in issue #6 and after the fixes suggested in issue #23: I get MAE 13.95 and RMSE 90.25. On the other hand, with the large MAE model mae_vit_large_patch16 I obtain MAE 12.58 and RMSE 87.25, which is closer to the results discussed in the aforementioned issue (MAE: 12.44, RMSE: 89.86), but as far as I know this isn't mentioned anywhere.

What makes me think this may be the cause of the difference, besides the fact that the other parameters seem to match those indicated in the paper and in the readme/issues, is that the fine-tuned weights you uploaded to Drive (FSC147.pth) are 1.2 GB, while my fine-tuned model is ~500 MB, as already noticed in issue #7 (as far as I can tell via Google Translate).
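For what it's worth, the size gap alone is roughly what you'd expect if one checkpoint stores only the model weights while the other also stores AdamW optimizer state (two extra fp32 buffers per parameter). A back-of-envelope sketch, assuming a hypothetical parameter count of ~110M for the full model:

```python
import math

def state_dict_megabytes(param_shapes, bytes_per_elem=4):
    """Rough fp32 footprint of a set of tensors, in MiB."""
    total_elems = sum(math.prod(shape) for shape in param_shapes)
    return total_elems * bytes_per_elem / 2**20

# Hypothetical round number: ~110M parameters for the full model.
model_only = state_dict_megabytes([(110_000_000,)])  # ~420 MiB
# AdamW keeps exp_avg and exp_avg_sq per parameter, tripling the footprint.
with_adamw = model_only * 3                          # ~1.2 GiB
print(f"model only: {model_only:.0f} MiB, model+optimizer: {with_adamw:.0f} MiB")
```

Under that assumption, ~500 MB vs ~1.2 GB is consistent with "model only" vs "model + optimizer state", not with a different architecture.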

Other combinations may work as well, e.g. base MAE for pre-training and large MAE for fine-tuning, but I haven't tried them yet.

Here are the parameters I used (from the wandb config export), in case I missed something. Pre-training:
```json
{
  "lr": 0.000005,
  "blr": 0.001,
  "seed": 0,
  "team": "wsense",
  "model": "mae_vit_base_patch16",
  "title": "CounTR_pretraining_paper",
  "wandb": "counting",
  "_wandb": {
    "framework": "torch",
    "start_time": 1674927257.097797,
    "cli_version": "0.13.9",
    "python_version": "3.9.15",
    "is_jupyter_run": false,
    "is_kaggle_kernel": false
  },
  "device": "cuda",
  "epochs": 300,
  "gt_dir": "gt_density_map_adaptive_384_VarV2",
  "im_dir": "images_384_VarV2",
  "min_lr": 0,
  "resume": "./weights/mae_pretrain_vit_base_full.pth",
  "log_dir": "None",
  "pin_mem": true,
  "dist_url": "env://",
  "wandb_id": null,
  "anno_file": "annotation_FSC147_384.json",
  "data_path": "./data/FSC147/",
  "accum_iter": 1,
  "batch_size": 16,
  "local_rank": -1,
  "mask_ratio": 0.5,
  "output_dir": "./data/out/pretrain",
  "world_size": 1,
  "dist_on_itp": false,
  "distributed": false,
  "num_workers": 10,
  "start_epoch": 0,
  "weight_decay": 0.05,
  "norm_pix_loss": false,
  "warmup_epochs": 10,
  "data_split_file": "Train_Test_Val_FSC_147.json"
}
```
And fine-tuning:
```json
{
  "lr": 0.00001,
  "blr": 0.001,
  "seed": 0,
  "team": "wsense",
  "model": "mae_vit_base_patch16",
  "title": "CounTR_finetuning_paper",
  "wandb": "counting",
  "_wandb": {
    "framework": "torch",
    "start_time": 1674944766.966494,
    "cli_version": "0.13.9",
    "python_version": "3.9.15",
    "is_jupyter_run": false,
    "is_kaggle_kernel": false
  },
  "device": "cuda",
  "epochs": 1000,
  "gt_dir": "gt_density_map_adaptive_384_VarV2",
  "im_dir": "images_384_VarV2",
  "min_lr": 0,
  "resume": "./data/out/pretrain/checkpoint__pretraining_299.pth",
  "log_dir": "None",
  "pin_mem": true,
  "dist_url": "env://",
  "wandb_id": null,
  "anno_file": "annotation_FSC147_384.json",
  "data_path": "./data/FSC147/",
  "accum_iter": 1,
  "batch_size": 8,
  "class_file": "ImageClasses_FSC147.txt",
  "local_rank": -1,
  "mask_ratio": 0.5,
  "output_dir": "./data/out/finetune",
  "world_size": 1,
  "dist_on_itp": false,
  "distributed": false,
  "num_workers": 10,
  "start_epoch": 0,
  "weight_decay": 0.05,
  "norm_pix_loss": false,
  "warmup_epochs": 10,
  "data_split_file": "Train_Test_Val_FSC_147.json"
}
```
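Since both runs log their config to wandb, a quick diff of the two exported dicts confirms which settings actually changed between pre-training and fine-tuning. A sketch (the dicts below are an illustrative subset of the values above, flattened to plain key/value pairs):

```python
def diff_configs(a, b, skip=("_wandb",)):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = (set(a) | set(b)) - set(skip)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

pretrain = {"lr": 0.000005, "epochs": 300, "batch_size": 16, "model": "mae_vit_base_patch16"}
finetune = {"lr": 0.00001, "epochs": 1000, "batch_size": 8, "model": "mae_vit_base_patch16"}
print(diff_configs(pretrain, finetune))
# {'batch_size': (16, 8), 'epochs': (300, 1000), 'lr': (5e-06, 1e-05)}
```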

Does this sound reasonable? Could it be that you ran the fine-tuning with the large MAE? Thanks in advance.

Verg-Avesta commented 1 year ago

Hello, I am also confused about why the checkpoints everyone gets are smaller than the ones I provided. I didn't use mae_vit_large_patch16 in fine-tuning (at least not knowingly). According to issue #7, he printed the value.size() and value.dtype of the model part of his checkpoint and mine, and found that they are the same. Therefore, I guess that because of different versions of some libraries, parameters such as the optimizer state may not be saved. You can try printing and comparing the parameters in the two checkpoints, and check whether they are consistent.
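A sketch of that comparison (hypothetical helper; the metadata dicts could be built with something like `{k: (tuple(v.shape), str(v.dtype)) for k, v in torch.load(path, map_location="cpu")["model"].items()}`):

```python
def compare_state_meta(meta_a, meta_b):
    """Compare two {param_name: (shape, dtype)} summaries of checkpoints.

    Returns (keys only in a, keys only in b, shared keys whose shape/dtype differ).
    """
    only_a = sorted(set(meta_a) - set(meta_b))
    only_b = sorted(set(meta_b) - set(meta_a))
    mismatched = sorted(k for k in set(meta_a) & set(meta_b) if meta_a[k] != meta_b[k])
    return only_a, only_b, mismatched

# Hypothetical entries for illustration:
mine = {"blocks.0.attn.qkv.weight": ((2304, 768), "torch.float32")}
theirs = {"blocks.0.attn.qkv.weight": ((2304, 768), "torch.float32"),
          "decoder_embed.weight": ((512, 768), "torch.float32")}
print(compare_state_meta(mine, theirs))
# ([], ['decoder_embed.weight'], [])
```

The same check on the checkpoint's top-level keys (model, optimizer, epoch, ...) would show whether the size gap comes from sections that were never saved.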

As for the results, the numbers in the paper are the best results I obtained, and I think an MAE between 12 and 13 is reasonable for mae_vit_base_patch16. The reproduced result in issue #7 is MAE: 13.89, RMSE: 82.74, and running my checkpoints he got MAE: 12.44, RMSE: 89.86. So the results are not very different.

Hope my description helps you find the problem.