broadinstitute / keras-rcnn

Keras package for region-based convolutional neural networks (RCNNs)

nans during RPN training #200

Open sbordt opened 6 years ago

sbordt commented 6 years ago

Hi guys,

I tried to train an RPN using this great package but failed due to inf/nan loss, and I now suspect there might be an issue with the Anchor layer. As far as I can tell, it does not generate enough positive examples during training: every anchor that has an IoU of over 0.7 with a ground-truth bounding box should be labelled as a positive example (as far as I understand Faster R-CNN).
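For reference, here is a minimal NumPy sketch of the labelling rule as described in the Faster R-CNN paper. It is not the keras-rcnn implementation: the function names `iou_matrix` and `label_anchors`, the `(x1, y1, x2, y2)` box layout, and the 0.3 negative threshold are assumptions made for illustration.

```python
import numpy as np


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between anchors (N, 4) and ground-truth boxes (M, 4).

    Boxes are assumed to be (x1, y1, x2, y2) with positive area.
    """
    anchors = np.asarray(anchors, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)

    # Corners of the intersection rectangle for every (anchor, box) pair.
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / union


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored."""
    iou = iou_matrix(anchors, gt_boxes)
    max_iou = iou.max(axis=1)

    labels = np.full(iou.shape[0], -1, dtype=int)
    labels[max_iou < neg_thresh] = 0
    labels[max_iou > pos_thresh] = 1

    # The paper also marks the best-overlapping anchor of each ground-truth
    # box as positive, so a box gets a positive even below pos_thresh.
    best_per_gt = iou.max(axis=0)
    is_best = (iou == best_per_gt[None, :]) & (best_per_gt[None, :] > 0)
    labels[is_best.any(axis=1)] = 1
    return labels
```

With one ground-truth box `[0, 0, 10, 10]` and anchors whose IoUs with it are 1.0, 0.8, 0.0 and 0.36, this gives the labels `[1, 1, 0, -1]`.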

I am not experienced with debugging Keras, so I can't look into it right now, but I think it is related to what happens in _label in the Anchor layer.

Cheers,

Sebastian

0x00b1 commented 6 years ago

Hi, @sbordt! Thanks for the report. Are you running from source?

0x00b1 commented 6 years ago

> As far as I can tell, it does not generate enough positive examples during training, where every Anchor that has an IoU with a ground-truth bounding box of over 0.7 should be labelled as a positive example (as far as I understand Faster R-CNN).

You can find the code here:

https://github.com/broadinstitute/keras-rcnn/blob/master/keras_rcnn/layers/object_detection/_anchor.py#L191-L197

sbordt commented 6 years ago

Thanks for the quick reply!

I understand that it happens somewhere within these lines, but I must admit that I don't fully understand the code.

However, I have created a small example that should work with the current master to illustrate the issue:

https://gist.github.com/sbordt/58cc34c29fce54ffb8f114f605ea9f37

Shouldn't there be more positive examples (in blue) versus ground-truth bounding boxes (in red)?
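A quick way to check this numerically is to count the positive labels and see what a loss averaged over positives does when there are none. The sketch below is only an illustration of how too few positives could turn into nan: the helper names and the label convention (1 positive, 0 negative, -1 ignored) are assumptions, and it has not been confirmed that keras-rcnn normalises its loss this way.

```python
import numpy as np


def count_positives(labels):
    """Number of anchors labelled positive (label == 1)."""
    return int((np.asarray(labels) == 1).sum())


def naive_mean_over_positives(per_anchor_loss, labels):
    """Average the loss over positive anchors with no guard.

    With zero positives this computes 0.0 / 0 and returns nan.
    """
    loss = np.asarray(per_anchor_loss, dtype=float)
    mask = np.asarray(labels) == 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return loss[mask].sum() / mask.sum()


def safe_mean_over_positives(per_anchor_loss, labels):
    """Same average, but the divisor is clamped to at least 1."""
    loss = np.asarray(per_anchor_loss, dtype=float)
    mask = np.asarray(labels) == 1
    return loss[mask].sum() / max(int(mask.sum()), 1)
```

If `count_positives` returns 0 for an image that has ground-truth boxes, the naive average is nan while the clamped one is 0.0, which would be consistent with the symptom reported here.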

Ostnie commented 6 years ago

@sbordt Hi, I'm very sorry to bother you, but I really can't see how this library is used. I've been learning about RCNN recently and hoped to learn from the source code, but I can't make out the structure of the library at all; even the questions you ask are unfamiliar to me. For example, where is the code for the RPN you are training? I couldn't find it in the repository.