Closed: SiyaoNickHu closed this issue 4 years ago.
Notice the definition in the code:

```python
self.xs_pos = torch.sigmoid(logits)  # p, the predicted probability
self.xs_neg = 1.0 - self.xs_pos      # 1 - p, its complement
```

So basically xs_pos = p and xs_neg = 1 - p. This is done to avoid recomputing (1 - p) again and again throughout the code. It also means the clipping acts on the complement: (xs_neg + m).clamp(max=1) = min(1 - p + m, 1) = 1 - max(p - m, 0), which is exactly the paper's shift p_m = max(p - m, 0), just expressed on 1 - p.
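To make the equivalence concrete, here is a small self-contained check. This is a sketch, not code from this repository; the margin value and tensor size are arbitrary choices:

```python
import torch

# Verify that clipping the complement, (1 - p + m).clamp(max=1),
# equals the paper's shift p_m = max(p - m, 0) expressed on 1 - p.
m = 0.05                                 # clip margin (arbitrary value)
p = torch.sigmoid(torch.randn(1000))     # p = sigmoid(logits)

clipped = ((1.0 - p) + m).clamp(max=1)   # what the implementation computes
shifted = 1.0 - (p - m).clamp(min=0)     # 1 - p_m, with p_m = max(p - m, 0)

assert torch.allclose(clipped, shifted)  # identical up to float rounding
```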
Oh I see. Nice trick!
Thanks for the quick response. I'll close the issue now.
Thanks for such an interesting paper 👍
In the paper's equation (4), the asymmetric probability shifting is p_m = max(p - m, 0), but in the implementation it is called asymmetric clipping and the code reads xs_neg = (xs_neg + self.clip).clamp(max=1), which is probably p_m = min(p + m, 1). Is there a reason for this difference?
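For reference, here is a condensed sketch of the whole asymmetric loss written directly from the paper's formulas (with the shift of equation (4) applied via the complement, as the answer above explains), not from this repository's code; the function name, signature, and default hyperparameters are illustrative assumptions:

```python
import torch

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    clip=0.05, eps=1e-8):
    """Sketch of ASL from the paper's formulas, not the repo's exact code.

    L+ = (1 - p)^gamma_pos * log(p)          for positive labels
    L- = (p_m)^gamma_neg  * log(1 - p_m)     for negative labels
    with p_m = max(p - clip, 0), applied here through 1 - p.
    """
    xs_pos = torch.sigmoid(logits)            # p
    xs_neg = 1.0 - xs_pos                     # 1 - p
    if clip > 0:
        # asymmetric clipping: 1 - p_m = min(1 - p + clip, 1)
        xs_neg = (xs_neg + clip).clamp(max=1)
    los_pos = targets * (1 - xs_pos) ** gamma_pos * torch.log(xs_pos.clamp(min=eps))
    los_neg = (1 - targets) * (1 - xs_neg) ** gamma_neg * torch.log(xs_neg.clamp(min=eps))
    return -(los_pos + los_neg).sum()
```

Note that clip > 0 only affects the negative branch, which is the asymmetry the thread is discussing: positives keep the plain probability p, while negatives use the shifted (clipped) complement.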