Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting
https://verg-avesta.github.io/CounTR_Webpage/
MIT License

Loss scale factor #40

Closed: GioFic95 closed this issue 5 months ago

GioFic95 commented 8 months ago

Dear @Verg-Avesta, in your paper you mention that "We scale the loss by a factor of 60". Indeed, during training the ground-truth density map is multiplied by 60 in the data preprocessing (https://github.com/Verg-Avesta/CounTR/blob/main/util/FSC147.py#L265), and the predicted count is then divided back by 60 when computing the MAE (https://github.com/Verg-Avesta/CounTR/blob/main/FSC_finetune_cross.py#L299).
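For reference, the pattern being described is roughly the following (a minimal sketch; the function names, tensor shapes, and the plain per-pixel MSE are my own assumptions, not the exact code in `FSC147.py` or `FSC_finetune_cross.py`):

```python
import torch

SCALE = 60  # the loss scale factor quoted in the paper

# Training: the ground-truth density map is multiplied by SCALE in preprocessing.
def scale_target(density_map: torch.Tensor) -> torch.Tensor:
    # Before scaling, density_map.sum() equals the true object count.
    return density_map * SCALE

# The training loss is then computed against the scaled target
# (a plain per-pixel MSE is assumed here for illustration).
def training_loss(pred: torch.Tensor, scaled_target: torch.Tensor) -> torch.Tensor:
    return ((pred - scaled_target) ** 2).mean()

# Evaluation: the predicted count is divided back by SCALE before computing the MAE.
def predicted_count(pred: torch.Tensor) -> float:
    return (pred.sum() / SCALE).item()
```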

Could you explain how you found this number? How should it be adapted when finetuning on another dataset?

Thank you very much.

Verg-Avesta commented 8 months ago

Well, this is just a hand-crafted trick.

At first, we found that the scale of the loss was so small that the model tended to learn a trivial solution (an all-zero density map), so we decided to multiply the density map by 100. However, with a factor of 100 the scale of the loss becomes too large, and the model tends to predict small values even when the image contains many objects. This hurts performance on FSC147, since it has several images with a very large number of objects. We therefore decreased the scaling factor to 60, which alleviates this problem to a great extent.
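To make the "loss scale is too small" point concrete, here is a toy calculation (the image size, object count, and plain MSE loss are assumed for illustration and are not taken from the repo):

```python
import torch

# Assumed numbers for illustration: a 384x384 density map containing 50 objects.
h = w = 384
n_objects = 50
density = torch.full((h, w), n_objects / (h * w))  # uniform density map summing to 50

mse_vs_zeros_raw    = torch.mean(density ** 2)          # loss if the model outputs all zeros
mse_vs_zeros_scaled = torch.mean((60 * density) ** 2)   # same, with the target scaled by 60

print(f"unscaled: {mse_vs_zeros_raw.item():.2e}")    # ~1.1e-07, the all-zero output is barely penalised
print(f"scaled:   {mse_vs_zeros_scaled.item():.2e}") # ~4.1e-04, the trivial solution costs much more
```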

When fine-tuning on another dataset, I suggest choosing the scaling factor based on the statistics of the dataset. If the images tend to contain many objects, a smaller scaling factor would be better; if there are always only a few objects per image, you can try a larger scaling factor.
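A hedged sketch of how one might turn that advice into a heuristic (the annotation format, default values, and the inverse-proportionality rule are hypothetical, not something from CounTR):

```python
import json
import numpy as np

def suggest_scale_factor(annotation_file: str,
                         ref_factor: float = 60.0,
                         ref_median_count: float = 50.0) -> float:
    """Hypothetical heuristic: scale the reference factor inversely with how
    crowded the new dataset is compared to the reference dataset.
    More objects per image -> smaller factor; fewer objects -> larger factor.
    `ref_median_count` is the typical per-image count of the dataset the
    reference factor was tuned on (a placeholder value here, not a verified
    FSC147 statistic)."""
    with open(annotation_file) as f:
        annotations = json.load(f)  # assumed format: {image_id: {"points": [[x, y], ...]}}
    counts = np.array([len(a["points"]) for a in annotations.values()])
    return ref_factor * ref_median_count / max(float(np.median(counts)), 1.0)


# Example usage (hypothetical file path):
# factor = suggest_scale_factor("my_dataset/annotations.json")
# print(f"suggested loss scale factor: {factor:.0f}")
```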