Why not normalize the network outputs to (0-1)?

D-X-Y / landmark-detection

Four landmark detection algorithms, implemented in PyTorch.

https://xuanyidong.com/assets/projects/TPAMI-2020-SRT.html

MIT License

925 stars, 180 forks

Why not normalize the network outputs to (0-1)? #76

Closed: Dian-Yi closed this issue 4 years ago

Dian-Yi commented 4 years ago

I have been studying your code recently and am confused about the network outputs. In face detection, the network outputs (box coordinates) are normalized. Why do the regression model and the SBR model not apply a normalizing activation to their outputs, given that the ground-truth landmarks and heatmaps have values in (0-1)?

D-X-Y commented 4 years ago

Could you please indicate which line of code that you are referring to?

Dian-Yi commented 4 years ago

My description may not have been precise enough. For example, in SBR/lib/models/cpm_vgg16.py, you use batch_cpms to calculate the loss. Why not normalize batch_cpms to values in (0-1), given that the ground-truth heatmap labels lie in (0-1)? The same applies to the regression model: its output is the final predicted keypoint position. Why not normalize it, since the model's output may fall outside (0-1)? That may make the network harder to train.

Dian-Yi commented 4 years ago

In SRT/lib/models/ProCPM.py, line 137, I see that you use sigmoid(cpm) to normalize the heatmaps. I am confused about whether to use it or not. In other face landmark repositories I have read, the regression models do not use it. Is it unimportant, or simply not needed?

D-X-Y commented 4 years ago

For sigmoid, it is a hyperparameter (https://github.com/D-X-Y/landmark-detection/blob/master/SRT/lib/models/ProCPM.py#L137), and we did not use it in our experiments.

Following https://arxiv.org/pdf/1602.00134.pdf, we do not normalize the predicted values, and L2 loss on the unnormalized predictions works well.
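To make the two options concrete, here is a minimal sketch in plain Python. It is not the repository's PyTorch code: the helper names `sigmoid` and `l2_heatmap_loss`, the `use_sigmoid` flag, and the sample values are all hypothetical and chosen only for illustration. It computes an L2 (mean squared error) loss against targets in (0-1), either on the raw outputs or after an optional sigmoid.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function; the result lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l2_heatmap_loss(raw_outputs, targets, use_sigmoid=False):
    """Mean squared error between predictions and target heatmap values.

    raw_outputs: flat list of unnormalized network outputs (hypothetical).
    targets:     flat list of ground-truth heatmap values in [0, 1].
    use_sigmoid: if True, squash the outputs into (0, 1) before the loss.
    """
    if len(raw_outputs) != len(targets):
        raise ValueError("raw_outputs and targets must have the same length")
    if use_sigmoid:
        preds = [sigmoid(z) for z in raw_outputs]
    else:
        # Regression on raw values: predictions may fall outside (0, 1).
        preds = list(raw_outputs)
    squared_errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return sum(squared_errors) / len(targets)


# Illustrative values only: two outputs fall outside the (0, 1) range.
raw = [1.3, -0.2, 0.1, 0.9]
target = [1.0, 0.0, 0.0, 1.0]

loss_raw = l2_heatmap_loss(raw, target)                    # no normalization
loss_sig = l2_heatmap_loss(raw, target, use_sigmoid=True)  # with sigmoid
print(loss_raw, loss_sig)
```

In both cases the loss is well defined. Without the sigmoid, an output outside (0-1) is simply penalized by its squared distance to the target, which is the setting described as working well above.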

Dian-Yi commented 4 years ago

Thank you very much for your answer. Can you explain why you do not apply 'sigmoid' to the prediction? Is it because it has no effect on the prediction results?

D-X-Y commented 4 years ago

My intuition is that this is a regression problem, which does not need a sigmoid.