Pilhyeon / WTAL-Uncertainty-Modeling

Official PyTorch Implementation of 'Weakly-supervised Temporal Action Localization by Uncertainty Modeling' (AAAI-21)
MIT License

Why choose softmax as the activation function instead of sigmoid? #23

Closed: yangjiangeyjg closed this issue 3 years ago

yangjiangeyjg commented 3 years ago

It is a multi-label classification problem. @Pilhyeon

Pilhyeon commented 3 years ago

Hi, thanks for your interest. We follow the convention in which the softmax function is preferred to sigmoid for video-level classification. I conjecture this is because cross-entropy with softmax is easier to optimize than binary cross-entropy with sigmoid when the dataset is small (e.g., 200 videos for THUMOS'14). In addition, the softmax function is sufficient to obtain accurate video-level predictions.
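
To make the comparison concrete, here is a minimal, library-free sketch of the two losses for a multi-label video. It is an illustration only, not code from this repository. The function names are hypothetical, and dividing the multi-hot label by its sum so that it can serve as a softmax target is an assumption about how the convention is usually applied.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic sigmoid for a single logit (fine for moderate values of z)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax_ce(logits, multi_hot):
    """Cross-entropy between softmax scores and a normalized multi-hot label.

    Assumption: the multi-hot label is divided by the number of positive
    classes so that it sums to 1 and forms a valid softmax target.
    """
    num_pos = sum(multi_hot)
    target = [y / num_pos for y in multi_hot]
    probs = softmax(logits)
    # Classes with zero target weight contribute nothing to the sum.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def sigmoid_bce(logits, multi_hot):
    """Mean binary cross-entropy with an independent sigmoid per class."""
    total = 0.0
    for z, y in zip(logits, multi_hot):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


if __name__ == "__main__":
    # Hypothetical video-level logits for 4 classes; classes 0 and 2 are present.
    video_logits = [2.0, -1.0, 1.5, -0.5]
    video_label = [1, 0, 1, 0]
    print("softmax CE :", softmax_ce(video_logits, video_label))
    print("sigmoid BCE:", sigmoid_bce(video_logits, video_label))
```

With softmax, the classes compete for a single probability mass, so a video with several actions is handled by spreading the target over its positive classes. With sigmoid, each class is scored independently. Which of the two trains better on a small dataset is an empirical question that this sketch does not settle.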

yangjiangeyjg commented 3 years ago

Thanks!