5uperpalo closed this issue 2 years ago.
QUESTION 1: in the focal_mse loss...
This is a good catch. Since there's already a hyper-parameter `beta` to control the "scaling" effect, we did not use the squared error; one could always try that (and tune the value of `beta` correspondingly).
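To make the trade-off concrete, here is a minimal plain-Python sketch standing in for the PyTorch code, which is not reproduced in this thread. It assumes the focal MSE weights each squared error $e_i^2$ by $(2\sigma(\beta d_i) - 1)^\gamma$ with $d_i = |e_i|$; the `squared_factor` flag (which swaps in $d_i = e_i^2$), the function names, and the `beta`/`gamma` defaults are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; callers below only pass x >= 0, so exp(-x) cannot overflow."""
    return 1.0 / (1.0 + math.exp(-x))


def focal_mse(preds, targets, beta=0.2, gamma=1.0, squared_factor=False):
    """Hypothetical focal MSE: mean of e_i**2 * (2*sigmoid(beta*d_i) - 1)**gamma.

    d_i is |e_i| by default, or e_i**2 when squared_factor=True
    (the alternative the reply says one could try). Defaults are illustrative.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        e = p - t
        d = e * e if squared_factor else abs(e)
        factor = (2.0 * sigmoid(beta * d) - 1.0) ** gamma  # lies in [0, 1)
        total += e * e * factor
    return total / len(preds)


preds, targets = [1.0, 3.0], [1.0, 1.0]
print(focal_mse(preds, targets))                       # |e| inside the factor
print(focal_mse(preds, targets, squared_factor=True))  # e**2 inside the factor
```

Because the factor is monotone in $\beta d_i$, using $e_i^2$ pushes it toward 1 faster when $|e_i| > 1$ and slower when $|e_i| < 1$, which is why `beta` would need retuning after the switch.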
QUESTION 2: why is there `2*torch.abs(...)-1`?
This is because sigmoid(x) lies in [0.5, 1) for all x >= 0, so we simply rescale it back to [0, 1) with 2*sigmoid(x) - 1. If you choose 'tanh' as the activation, you do not need to rescale it, since tanh(x) already lies in [0, 1) for x >= 0. Since it's an engineering detail, we left it out of the paper for cleanness.
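A quick numerical check of the rescaling, in plain Python with hypothetical helper names: for x >= 0, `2*sigmoid(x) - 1` starts at 0, stays below 1, and equals `tanh(x/2)` exactly, because $\frac{2}{1+e^{-x}} - 1 = \frac{1-e^{-x}}{1+e^{-x}} = \tanh(x/2)$.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rescaled_sigmoid(x: float) -> float:
    # For x >= 0, sigmoid(x) lies in [0.5, 1); 2*s - 1 maps that onto [0, 1).
    return 2.0 * sigmoid(x) - 1.0


# Compare against tanh(x/2): 2/(1 + e^-x) - 1 = (1 - e^-x)/(1 + e^-x) = tanh(x/2).
for x in [0.0, 0.5, 2.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.4f}  "
          f"2*sigmoid-1={rescaled_sigmoid(x):.4f}  tanh(x/2)={math.tanh(x / 2):.4f}")
```

So, assuming the 'tanh' option computes tanh(beta * |e|), the two activations should differ only by a factor of 2 in the effective `beta`.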
Thank you! Now it's clear to me :)
Hi authors,
on page 6 of your paper: "Precisely, Focal-R loss based on L1 distance can be written as $\frac{1}{n}\sum_{i=1}^{n} \sigma(|\beta e_i|)^{\gamma} e_i$, where $e_i$ is the L1 error for i-th sample, $\sigma(\cdot)$ is the Sigmoid function..."
Why does the code use `2*torch.abs(...)-1`? The function in your paper has no -1 or 2*.
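The sketch below contrasts the formula as printed with the rescaled factor described in the authors' reply, for the L1 version. It is plain Python rather than the repository's PyTorch code; `focal_l1`, its `rescale` flag, and the `beta`/`gamma` defaults are hypothetical and only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def focal_l1(preds, targets, beta=0.2, gamma=1.0, rescale=True):
    """Hypothetical Focal-R (L1): mean of w_i * e_i with e_i = |pred_i - target_i|.

    rescale=False: w_i = sigmoid(beta * e_i) ** gamma           (formula as printed)
    rescale=True:  w_i = (2 * sigmoid(beta * e_i) - 1) ** gamma  (rescaled factor)
    """
    total = 0.0
    for p, t in zip(preds, targets):
        e = abs(p - t)  # L1 error is >= 0, so |beta * e| == beta * e for beta > 0
        s = sigmoid(beta * e)
        w = (2.0 * s - 1.0) if rescale else s
        total += (w ** gamma) * e
    return total / len(preds)


# Per-sample weight w = loss / e for a few error sizes.
for err in [0.1, 1.0, 10.0]:
    printed = focal_l1([err], [0.0], rescale=False)
    rescaled = focal_l1([err], [0.0], rescale=True)
    print(f"e={err:5.1f}  printed weight={printed / err:.3f}  rescaled weight={rescaled / err:.3f}")
```

With γ = 1, the printed factor σ(βe_i) never drops below 0.5, so even a near-zero error keeps a weight of at least 0.5; the rescaled factor spans [0, 1) and down-weights easy samples much more strongly.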