[Reproduce] Cannot reproduce the results with base MAE

Hi @Verg-Avesta, I tried to reproduce your pre-training + fine-tuning process, but my results are still different if I use the base MAE model mae_vit_base_patch16, even using the pretrained weights mentioned in issue #6, and even after the fixes suggested in issue #23: I get MAE 13.95 and RMSE 90.25. On the other hand, if I use the large MAE model mae_vit_large_patch16 I obtain MAE 12.58 and RMSE 87.25, which are closer to the results discussed in the aforementioned issue (MAE: 12.44, RMSE: 89.86), but this isn't mentioned anywhere, as far as I know.

What lets me think that this may be the reason of the difference, besides the fact that the other parameters seems to be the same indicated in the paper or in readme/issues, is the observation that the size of the fine-tuned weights you uploaded on drive (FSC147.pth) is 1.2GB, while the size of my fine-tuned model is ~500MB, as already noticed in issue #7, as far as I can understand using Google Translate.

Other combinations may work as well, e.g. base MAE for pre-training and large MAE for fine-tuning, but I haven't still tried it.

Here are the parameters I used, in case I missed something

for pre-training: ```python { "lr": { "desc": null, "value": 0.000005 }, "blr": { "desc": null, "value": 0.001 }, "seed": { "desc": null, "value": 0 }, "team": { "desc": null, "value": "wsense" }, "model": { "desc": null, "value": "mae_vit_base_patch16" }, "title": { "desc": null, "value": "CounTR_pretraining_paper" }, "wandb": { "desc": null, "value": "counting" }, "_wandb": { "desc": null, "value": { "t": { "1": [ 1, 41, 55, 63 ], "2": [ 1, 41, 55, 63 ], "3": [ 2, 13, 15, 16, 23 ], "4": "3.9.15", "5": "0.13.9", "8": [ 5 ] }, "framework": "torch", "start_time": 1674927257.097797, "cli_version": "0.13.9", "is_jupyter_run": false, "python_version": "3.9.15", "is_kaggle_kernel": false } }, "device": { "desc": null, "value": "cuda" }, "epochs": { "desc": null, "value": 300 }, "gt_dir": { "desc": null, "value": "gt_density_map_adaptive_384_VarV2" }, "im_dir": { "desc": null, "value": "images_384_VarV2" }, "min_lr": { "desc": null, "value": 0 }, "resume": { "desc": null, "value": "./weights/mae_pretrain_vit_base_full.pth" }, "log_dir": { "desc": null, "value": "None" }, "pin_mem": { "desc": null, "value": true }, "dist_url": { "desc": null, "value": "env://" }, "wandb_id": { "desc": null, "value": null }, "anno_file": { "desc": null, "value": "annotation_FSC147_384.json" }, "data_path": { "desc": null, "value": "./data/FSC147/" }, "accum_iter": { "desc": null, "value": 1 }, "batch_size": { "desc": null, "value": 16 }, "local_rank": { "desc": null, "value": -1 }, "mask_ratio": { "desc": null, "value": 0.5 }, "output_dir": { "desc": null, "value": "./data/out/pretrain" }, "world_size": { "desc": null, "value": 1 }, "dist_on_itp": { "desc": null, "value": false }, "distributed": { "desc": null, "value": false }, "num_workers": { "desc": null, "value": 10 }, "start_epoch": { "desc": null, "value": 0 }, "weight_decay": { "desc": null, "value": 0.05 }, "norm_pix_loss": { "desc": null, "value": false }, "warmup_epochs": { "desc": null, "value": 10 }, "data_split_file": { "desc": null, "value": "Train_Test_Val_FSC_147.json" } } ``` and fine-tuning: ```python { "lr": { "desc": null, "value": 0.00001 }, "blr": { "desc": null, "value": 0.001 }, "seed": { "desc": null, "value": 0 }, "team": { "desc": null, "value": "wsense" }, "model": { "desc": null, "value": "mae_vit_base_patch16" }, "title": { "desc": null, "value": "CounTR_finetuning_paper" }, "wandb": { "desc": null, "value": "counting" }, "_wandb": { "desc": null, "value": { "t": { "1": [ 1, 41, 55, 63 ], "2": [ 1, 41, 55, 63 ], "3": [ 2, 13, 15, 16, 23 ], "4": "3.9.15", "5": "0.13.9", "8": [ 5 ] }, "framework": "torch", "start_time": 1674944766.966494, "cli_version": "0.13.9", "is_jupyter_run": false, "python_version": "3.9.15", "is_kaggle_kernel": false } }, "device": { "desc": null, "value": "cuda" }, "epochs": { "desc": null, "value": 1000 }, "gt_dir": { "desc": null, "value": "gt_density_map_adaptive_384_VarV2" }, "im_dir": { "desc": null, "value": "images_384_VarV2" }, "min_lr": { "desc": null, "value": 0 }, "resume": { "desc": null, "value": "./data/out/pretrain/checkpoint__pretraining_299.pth" }, "log_dir": { "desc": null, "value": "None" }, "pin_mem": { "desc": null, "value": true }, "dist_url": { "desc": null, "value": "env://" }, "wandb_id": { "desc": null, "value": null }, "anno_file": { "desc": null, "value": "annotation_FSC147_384.json" }, "data_path": { "desc": null, "value": "./data/FSC147/" }, "accum_iter": { "desc": null, "value": 1 }, "batch_size": { "desc": null, "value": 8 }, "class_file": { "desc": null, "value": "ImageClasses_FSC147.txt" }, "local_rank": { "desc": null, "value": -1 }, "mask_ratio": { "desc": null, "value": 0.5 }, "output_dir": { "desc": null, "value": "./data/out/finetune" }, "world_size": { "desc": null, "value": 1 }, "dist_on_itp": { "desc": null, "value": false }, "distributed": { "desc": null, "value": false }, "num_workers": { "desc": null, "value": 10 }, "start_epoch": { "desc": null, "value": 0 }, "weight_decay": { "desc": null, "value": 0.05 }, "norm_pix_loss": { "desc": null, "value": false }, "warmup_epochs": { "desc": null, "value": 10 }, "data_split_file": { "desc": null, "value": "Train_Test_Val_FSC_147.json" } } ```

Does it sound reasonable? Maybe you run a fine-tuning with the large MAE? Thanks in advance

Verg-Avesta / CounTR

[Reproduce] Cannot reproduce the results with base MAE #26