CasiaFan / tensorflow_retinanet

RetinaNet with Focal Loss implemented by Tensorflow

focal loss code #5

Open NoNamesLeft4Me opened 6 years ago

NoNamesLeft4Me commented 6 years ago

Hi, I have two questions about focal_loss() in loss.py:

  1. Why do you use tf.nn.sigmoid instead of tf.nn.softmax? The comment says "Compute softmax focal loss...". Is this a bug, or is it on purpose?

  2. It seems precise_logits is not used. Shouldn't it be predictions = tf.nn.sigmoid(precise_logits)?

CasiaFan commented 6 years ago

@NoNamesLeft4Me At first I simply used a softmax activation intuitively, but the classification subnet section of the focal loss paper says:

Finally sigmoid activations are attached to output the KA binary predictions per spatial location

So I switched to the sigmoid activation but forgot to update the comment 😅 ... Sorry for the confusion.

As for the second point, you are right, it's a typo: it should be precise_logits, not logits. Thanks for the report!
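
For reference, here is a rough sketch of what the corrected sigmoid version looks like (simplified, with assumed shapes; not the exact code in loss.py):

```python
import tensorflow as tf

def sigmoid_focal_loss(onehot_labels, logits, alpha=0.25, gamma=2.0):
    """Rough sketch of focal loss with a sigmoid activation.

    Assumes onehot_labels and logits both have shape
    [batch, num_anchors, num_classes]; simplified from loss.py.
    """
    # Cast logits to float32 for the loss computation (this is `precise_logits`).
    precise_logits = tf.cast(logits, tf.float32)
    onehot_labels = tf.cast(onehot_labels, precise_logits.dtype)

    # Per-class binary predictions, matching the RetinaNet classification subnet.
    predictions = tf.nn.sigmoid(precise_logits)

    # p_t is p for positive labels and 1 - p for negative labels.
    pos_mask = tf.equal(onehot_labels, 1.0)
    p_t = tf.where(pos_mask, predictions, 1.0 - predictions)
    alpha_t = tf.where(pos_mask,
                       alpha * tf.ones_like(onehot_labels),
                       (1.0 - alpha) * tf.ones_like(onehot_labels))

    # Numerically stable cross entropy computed from the raw logits;
    # for 0/1 targets this equals -log(p_t).
    ce = tf.nn.sigmoid_cross_entropy_with_logits(labels=onehot_labels,
                                                 logits=precise_logits)

    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
    return tf.reduce_sum(alpha_t * tf.pow(1.0 - p_t, gamma) * ce)
```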

NoNamesLeft4Me commented 6 years ago

@CasiaFan Thanks for clarifying the first issue. I am curious about the choice between sigmoid and softmax. In the focal loss paper the authors say they use sigmoid for greater numerical stability, while the original FPN paper, if it follows the Faster R-CNN design, uses softmax as far as I know. I have seen people discuss online that softmax can run into numerical stability issues when it is not implemented at a lower TF level (as a C++ op), and I wonder if that is why the focal loss authors chose sigmoid over softmax. Since you have tried softmax, do you have any thoughts on this?
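
For example, this is the kind of instability I mean (toy values, TF 1.x style; a hand-rolled Python-level softmax rather than your code):

```python
import tensorflow as tf

logits = tf.constant([[1000.0, 0.0]])
labels = tf.constant([[1.0, 0.0]])

# Naive Python-level softmax followed by log: exp(1000.0) overflows to inf,
# so the probabilities and the loss come out as nan.
exp = tf.exp(logits)
probs = exp / tf.reduce_sum(exp, axis=-1, keep_dims=True)
naive_ce = -tf.reduce_sum(labels * tf.log(probs), axis=-1)

# The fused op works on the logits directly (shifting by the max internally)
# and stays finite.
stable_ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
```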

CasiaFan commented 6 years ago

@NoNamesLeft4Me

The classification subnet predicts the probability of object presence at each spatial position for each of the A anchors and K object classes.

AFAIK, since it is just a binary classification per anchor for each class, using a sigmoid activation is more straightforward. As for the stability point you mention, I am not sure that is why the authors chose sigmoid, since they presumably used the Caffe C++ interface rather than TF. But at least in my test case, both choices work normally.
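
Roughly, the difference in the classification head looks like this (toy shapes for illustration, not the code in this repo):

```python
import tensorflow as tf

# Toy shapes: A anchors per location, K object classes.
num_anchors, num_classes = 9, 20
logits = tf.placeholder(tf.float32, [None, num_anchors, num_classes])

# Sigmoid: K independent binary "is class k present?" scores per anchor;
# the scores for one anchor need not sum to 1.
sigmoid_scores = tf.nn.sigmoid(logits)

# A softmax head in the Faster R-CNN / FPN style would instead need K+1 outputs
# (one extra for background) that compete and sum to 1 per anchor.
background_logits = tf.zeros_like(logits[..., :1])
softmax_scores = tf.nn.softmax(tf.concat([logits, background_logits], axis=-1))
```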

NoNamesLeft4Me commented 6 years ago

@CasiaFan Thanks for clarifying. I will try both ways. Another issue I want to bring up: tf.equal(onehot_labels, 1) and tf.equal(onehot_labels, 1.0) might not be robust. On TF 1.3 with Python 2.x on Linux, I found the comparison does not work unless I cast onehot_labels to an integer type. Since I only work with integer one-hot labels anyway, I cast them to int to get around it, as in the snippet below. Not sure if you have run into this, so just letting you know.
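
Roughly what I do (toy labels just for illustration):

```python
import tensorflow as tf

onehot_labels = tf.one_hot([0, 2, 1], depth=3)    # float32 by default
int_labels = tf.cast(onehot_labels, tf.int32)     # cast to an integer type first
pos_mask = tf.equal(int_labels, 1)                # then compare with an integer
```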

CasiaFan commented 6 years ago

@NoNamesLeft4Me Thanks for the heads-up. tf.one_hot() returns a float tensor by default. I tested both situations, and it seems that comparing an integer tensor with a float constant raises an error, e.g. tf.equal(tf.cast(tf.one_hot([0, 1], 2), dtype=tf.int64), 1.0), but there is no issue the other way around. BTW, my setup is TF 1.4 and Python 2.7 on Ubuntu 16.04.
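
A minimal version of what I tested (TF 1.4, graph mode; the exact exception type may differ, so treat it as indicative):

```python
import tensorflow as tf

onehot = tf.one_hot([0, 1], 2)                    # float32 by default

# Float tensor vs. integer constant: fine, 1 is converted to 1.0.
ok = tf.equal(onehot, 1)

# Integer tensor vs. float constant: fails at graph construction time,
# since 1.0 cannot be implicitly converted to an int64 constant.
try:
    bad = tf.equal(tf.cast(onehot, tf.int64), 1.0)
except (TypeError, ValueError) as e:
    print(e)
```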