Sorry, I realised the original paper is slightly unusual: it outputs 2 channels for each mask (I initially thought the softmax was over pixels) and takes the softmax across channels. So this is not exactly equivalent, but probably better.
It's actually strictly equivalent: the softmax of a 2-channel tensor is the sigmoid of the difference between the two channels, since softmax(x0, x1)[0] = 1 / (1 + exp(-(x0 - x1))) = sigmoid(x0 - x1).
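A quick sanity check of that identity (a minimal sketch with made-up tensors, not code from either repo):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 2, 8, 8)  # (batch, 2 channels, H, W)

# Probability of channel 0 under a softmax across the channel dimension
p_softmax = F.softmax(x, dim=1)[:, 0]

# The same probability via a sigmoid of the channel difference
p_sigmoid = torch.sigmoid(x[:, 0] - x[:, 1])

assert torch.allclose(p_softmax, p_sigmoid, atol=1e-6)
```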
Besides, BCE is just cross entropy restricted to the 2-channel case. Since in that context knowing one channel is enough to determine the other, you only need to feed it one channel.
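Likewise, a sketch showing that BCE on a single logit matches 2-class cross entropy on two channels (hypothetical tensors again, not repo code):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2, 8, 8)         # two-channel logits
target = torch.randint(0, 2, (4, 8, 8))  # per-pixel class in {0, 1}

# Standard 2-class cross entropy over the channel dimension
ce = F.cross_entropy(logits, target)

# BCE on one channel: the logit difference, with class 1 as "positive"
bce = F.binary_cross_entropy_with_logits(
    logits[:, 1] - logits[:, 0], target.float())

assert torch.allclose(ce, bce, atol=1e-6)
```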
So this is intentional; it doesn't work better, but it's certainly much more readable that way.
In the original repo and the paper, the explainability mask is modeled with a softmax: https://github.com/tinghuiz/SfMLearner/blob/2a387b763bc2b6f95b095f929bf751797c9db68a/SfMLearner.py#L83 https://github.com/tinghuiz/SfMLearner/blob/2a387b763bc2b6f95b095f929bf751797c9db68a/SfMLearner.py#L151
Whereas here you use a sigmoid and BCE: https://github.com/ClementPinard/SfmLearner-Pytorch/blob/0f741cb06caa3fbc95f487187697e54d2e03c190/models/PoseExpNet.py#L83
Is this intentional? Did sigmoid work better?