Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting
https://verg-avesta.github.io/CounTR_Webpage/
MIT License

About hyper parameters #27

Closed Jyerim closed 1 year ago

Jyerim commented 1 year ago

Are the batch_size and learning rate during the fine-tuning stage the same as stated in the paper? The script sets batch 26 and lr = 2e-4, whereas the paper states batch 8 and lr = 1e-5. These are my results when fine-tuning from the provided FSC147.pth:

- batch 26, lr = 2e-4: 12.79 MAE / 86.49 RMSE
- batch 8, lr = 1e-5: 13.77 MAE / 87.69 RMSE

Verg-Avesta commented 1 year ago

The code has been updated by @GioFic95, so the batch_size and learning rate should also be updated to achieve a better result.

GioFic95 commented 1 year ago

The results reported in issue #26 were obtained using the parameters described in the paper, i.e., I ran the fine-tuning with this command:

FSC_finetune_cross.py --epochs 1000 --batch_size 8 --lr 1e-5 --output_dir ./data/out/finetune --log_dir None --title CounTR_finetuning_paper --resume ./data/out/pretrain/checkpoint__pretraining_299.pth

Moreover, my changes didn't touch the default values for batch size, learning rate, or gradient accumulation, nor the training itself: I just added another logger, W&B, alongside TensorBoard. Thus, the hyperparameters shouldn't need any updates.

Verg-Avesta commented 1 year ago

Sorry for the inconsistency in the hyperparameters.

I modified the fine-tuning code to conduct ablation studies and forgot to fully restore it. So, for all hyperparameters except the learning rate, please refer to the original paper. For the learning rate, the code uses a batch learning rate, and the absolute learning rate reported in the paper was calculated from it by hand. So the batch learning rate in the code is the correct one.

Concretely, I use a batch learning rate of 2e-4, as mentioned in the readme.md, and a batch size of 8, though I think a larger batch size would yield a better result.
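For anyone confused by the two numbers, the mapping between a batch (base) learning rate and an absolute learning rate can be sketched as below. This is a minimal illustration assuming the MAE-style scaling rule (lr = blr × effective batch size / 256) that this codebase inherits from the MAE training scripts; the `absolute_lr` helper and its signature are hypothetical, not part of the repo.

```python
def absolute_lr(blr: float, batch_size: int, accum_iter: int = 1, num_gpus: int = 1) -> float:
    """Assumed MAE-style scaling: lr = blr * effective_batch_size / 256.

    The effective batch size folds in gradient accumulation steps and
    the number of GPUs, as in the MAE reference training code.
    """
    eff_batch_size = batch_size * accum_iter * num_gpus
    return blr * eff_batch_size / 256


# With blr = 2e-4 (as in the readme) and batch size 8 (as in the paper),
# the absolute learning rate comes out in the 1e-5 ballpark:
print(absolute_lr(2e-4, 8))   # 6.25e-06
```

So setting the batch learning rate in the script and quoting an absolute learning rate in the paper are two views of the same configuration, under this scaling assumption.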

Hope this helps.

Jyerim commented 1 year ago

Thank you for your answer.