Region proposal network (RPN) layer

0x00b1 commented 7 years ago

The region proposal network (RPN) should take two inputs, image features (i.e., features extracted by ResNet) and ground-truth bounding boxes, and produce object proposals and corresponding “objectness” scores. I’m envisioning something like:

x = keras.layers.Input((224, 224, 3))

a = keras_resnet.ResNet50(x)

b = keras.layers.Input((None, 4))

y = keras_rcnn.layers.RPN((14, 14))([a, b])

0x00b1 commented 7 years ago

Here are a few links with information about implementing layers with multiple inputs:

https://github.com/fchollet/keras/issues/148

https://github.com/fchollet/keras/issues/2364

https://github.com/fchollet/keras/issues/3037

0x00b1 commented 7 years ago

This issue has information about implementing a loss function for an intermediate layer.

https://github.com/fchollet/keras/issues/5563

JihongJu commented 7 years ago

@0x00b1 Why would the RPN need the ground truth as input? As far as I understand, the RPN takes the conv features as input and predicts labels and bounding box transforms for K anchors in each cell. I guess what you mean here is something like an AnchorTargetLayer, which produces anchor classification labels and bbox regression targets given the ground-truth bounding boxes?
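To make the anchor-target idea concrete, here is a minimal NumPy sketch of how labels might be assigned to anchors from ground-truth boxes. The names `iou_matrix` and `label_anchors` are hypothetical, and the 0.7 / 0.3 thresholds are assumed defaults taken from the Faster R-CNN paper, not from keras-rcnn.

```python
import numpy


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (N, 4) anchors and (M, 4) boxes given as (x1, y1, x2, y2)."""
    x1 = numpy.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = numpy.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = numpy.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = numpy.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    intersection = numpy.clip(x2 - x1, 0, None) * numpy.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return intersection / (area_a[:, None] + area_g[None, :] - intersection)


def label_anchors(anchors, gt_boxes, positive=0.7, negative=0.3):
    """Return one label per anchor: 1 (object), 0 (background), -1 (ignored)."""
    overlaps = iou_matrix(anchors, gt_boxes)
    best = overlaps.max(axis=1)
    labels = numpy.full(len(anchors), -1, dtype=int)
    labels[best < negative] = 0
    labels[best >= positive] = 1
    # Each ground-truth box also claims its best-matching anchor.
    labels[overlaps.argmax(axis=0)] = 1
    return labels


anchors = numpy.array([[0, 0, 10, 10], [0, 0, 10, 20], [50, 50, 60, 60]], dtype=float)
gt_boxes = numpy.array([[0, 0, 10, 10]], dtype=float)
labels = label_anchors(anchors, gt_boxes)
```

Anchors whose overlap falls between the two thresholds keep the label -1, so they would not contribute to the classification loss.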

0x00b1 commented 7 years ago

Yeah, exactly. I imagined a layer that’d encapsulate AnchorLayer, AnchorTargetLayer, and ProposalLayer into one layer to circumvent the awkward train-and-predict step in Ross’ implementation. Unfortunately, it’s still unclear how this should be implemented. That’s why I’ve been implementing the Anchor and Proposal layers in parallel. What do you think?

JihongJu commented 7 years ago

I do think it's a good idea to make the train-to-predict switching easier. But I'm not sure whether encapsulating the Proposal Layer and the Anchor Target Layer into one is the way to go. Feeding the ground truth as inputs, instead of labels, to the model could be problematic during testing. I can hardly imagine what to feed as b at test time.

How about keeping the anchor ground truth generation away from the model definition? What we want then becomes a loss function that calculates the losses directly from the "objectness" scores and the ground-truth bounding boxes. We can put the Anchor Target Layer inside such a loss function.

JihongJu commented 7 years ago

@0x00b1 I am thinking of structuring it as:

x = keras.layers.Input((224, 224, 3))
a = keras_resnet.ResNet50(x)
[rpn_cls, rpn_reg] = keras_rcnn.layers.RegionProposalNetwork()(a)
rpn_pred = keras.backend.concatenate([rpn_cls, rpn_reg])
proposals = keras_rcnn.layers.ObjectProposal()([rpn_cls, rpn_reg])
[rcnn_cls, rcnn_reg] = keras_rcnn.layers.ROI([7, 7])([x, proposals])
model = Model( inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg])
model.compile( loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss], optimizer="adam")

And we have

def rpn_pred_loss(weight, *args, **kwargs):
    # `weight` balances the regression term (`lambda` is a reserved word in Python)
    def f(y_true, y_pred):
        # separate y_pred into rpn_cls_pred and rpn_reg_pred
        rpn_cls_pred, rpn_reg_pred = separate_pred(y_pred)
        # convert y_true from gt_boxes to gt_anchors
        rpn_cls_gt, rpn_reg_gt = encode(y_true, rpn_cls_pred)
        # classification loss
        rpn_cls_loss = keras_rcnn.rpn.classification(anchors=9)(rpn_cls_gt, rpn_cls_pred)
        # regression loss
        rpn_reg_loss = keras_rcnn.rpn.regression(anchors=9)(rpn_reg_gt, rpn_reg_pred)

        return rpn_cls_loss + weight * rpn_reg_loss
    return f

Then we can also extend this to Mask R-CNN simply by adding a mask branch and another loss, rcnn_mask_loss. What do you think?
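The `separate_pred` helper used in the loss is not defined in this thread. A minimal NumPy sketch of what it might do, assuming the scores and box offsets are concatenated on the last axis with one objectness score and four offsets per anchor (the signature and the 14 x 14 grid are assumptions for illustration):

```python
import numpy


def separate_pred(y_pred, anchors=9):
    """Split a (batch, rows, cols, 5 * anchors) array into scores and box offsets."""
    rpn_cls_pred = y_pred[..., :anchors]  # first `anchors` channels: objectness
    rpn_reg_pred = y_pred[..., anchors:]  # remaining 4 * anchors channels: offsets
    return rpn_cls_pred, rpn_reg_pred


y_pred = numpy.zeros((1, 14, 14, 9 + 9 * 4))
rpn_cls_pred, rpn_reg_pred = separate_pred(y_pred)
```

Concatenating the two heads keeps the model at one output per loss, at the cost of having to undo the concatenation inside the loss.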

0x00b1 commented 7 years ago

@JihongJu I love this! Especially this:

x = keras.layers.Input((224, 224, 3))
a = keras_resnet.ResNet50(x)
[rpn_cls, rpn_reg] = keras_rcnn.layers.RegionProposalNetwork()(a)
rpn_pred = keras.backend.concatenate([rpn_cls, rpn_reg])
proposals = keras_rcnn.layers.ObjectProposal()([rpn_cls, rpn_reg])
[rcnn_cls, rcnn_reg] = keras_rcnn.layers.ROI([7, 7])([x, proposals])
model = Model( inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg])
model.compile( loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss], optimizer="adam")

0x00b1 commented 7 years ago

@JihongJu I started structuring this into code:

classes = 2

x = keras.layers.Input((224, 224, 3))

y = keras_resnet.ResNet50(x)

rpn_classification = keras.layers.Conv2D(9 * 1, (1, 1), activation="sigmoid")(y.layers[-2].output)

rpn_regression = keras.layers.Conv2D(9 * 4, (1, 1))(y.layers[-2].output)

rpn_prediction = keras.layers.concatenate([rpn_classification, rpn_regression])

proposals = keras_rcnn.layers.object_detection.ObjectProposal(300)([rpn_classification, rpn_regression])

y = keras_rcnn.layers.ROI((7, 7), 32)([x, proposals])
y = keras.layers.AveragePooling2D((7, 7))(y)
y = keras.layers.Dense(4096)(y)

score = keras.layers.Dense(classes, activation="softmax")(y)

boxes = keras.layers.Dense(4 * (classes - 1))(y)

model = keras.models.Model(x, [rpn_prediction, score, boxes])

model.compile(optimizer="adam", loss="mse")

jhung0 commented 7 years ago

I started working on the loss function:

https://github.com/broadinstitute/keras-rcnn/blob/master/keras_rcnn/losses/rpn.py

https://github.com/broadinstitute/keras-rcnn/blob/master/tests/losses/test_rpn.py

JihongJu commented 7 years ago

@0x00b1 Cool. Maybe a typo here

model.compile(optimizer="adam", loss="mse") # Should be rpn/rcnn losses

jhung0 commented 7 years ago

I think that y_true and y_pred should be the same shape. Right now in the tests, classification has

y_pred = keras.backend.variable(0.5 * numpy.ones((1, 4, 4, n_anchors)))
y_true = keras.backend.variable(numpy.ones((1, 4, 4, 2 * n_anchors)))

and regression has

y_pred = keras.backend.variable(0.5 * numpy.ones((1, 4, 4, 4 * n_anchors)))
y_true = keras.backend.variable(numpy.ones((1, 4, 4, 8 * n_anchors)))

jhung0 commented 7 years ago

https://github.com/mitmul/chainer-faster-rcnn/blob/v2/models/region_proposal_network.py has output space

2 * n_anchors

for classification and

4 * n_anchors

for regression.

So

rpn_classification = keras.layers.Conv2D(9 * 2, (1, 1), activation="softmax")(y.layers[-2].output)
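A quick shape check may help here. The sketch below treats a 1x1 convolution as a per-location matrix product in NumPy; `conv1x1`, `pairwise_softmax`, the 14 x 14 grid, and the 1024 input channels are assumptions for illustration, not values from keras-rcnn. One thing it suggests: with 2 * n_anchors channels, a softmax would presumably need to run over each (background, object) pair and not over all 18 channels at once, so the scores are reshaped first. The channel ordering assumed by that reshape is also hypothetical.

```python
import numpy


def conv1x1(features, kernel):
    """Apply a 1x1 convolution as a matrix product over the channel axis."""
    return features @ kernel


def pairwise_softmax(scores, n_anchors):
    """Softmax over each anchor's (background, object) pair of channels."""
    pairs = scores.reshape(scores.shape[:-1] + (n_anchors, 2))
    exponentials = numpy.exp(pairs - pairs.max(axis=-1, keepdims=True))
    return exponentials / exponentials.sum(axis=-1, keepdims=True)


n_anchors = 9
features = numpy.zeros((1, 14, 14, 1024))  # hypothetical backbone output

cls_scores = conv1x1(features, numpy.zeros((1024, 2 * n_anchors)))
box_deltas = conv1x1(features, numpy.zeros((1024, 4 * n_anchors)))
probabilities = pairwise_softmax(cls_scores, n_anchors)
```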

jhung0 commented 7 years ago

my edits:

classes = 2

x = keras.layers.Input((224, 224, 3))

y = keras_resnet.ResNet50(x, include_top=False)

rpn_classification = keras.layers.Conv2D(9 * 2, (1, 1), activation="softmax")(y)

rpn_regression = keras.layers.Conv2D(9 * 4, (1, 1))(y)

jhung0 commented 7 years ago

And encode would be https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/anchor_target_layer.py#L65 ?
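For reference, the box part of such an `encode` step usually follows the Faster R-CNN parameterization: center offsets scaled by the anchor size, plus log ratios of width and height. A minimal NumPy sketch, assuming anchors have already been matched one-to-one with ground-truth boxes; `encode_boxes` is a hypothetical name, and the pixel-coordinate details of the linked implementation are left out:

```python
import numpy


def encode_boxes(anchors, gt_boxes):
    """Regression targets (dx, dy, dw, dh) for matched (N, 4) anchors and boxes."""
    anchor_w = anchors[:, 2] - anchors[:, 0]
    anchor_h = anchors[:, 3] - anchors[:, 1]
    anchor_cx = anchors[:, 0] + 0.5 * anchor_w
    anchor_cy = anchors[:, 1] + 0.5 * anchor_h

    gt_w = gt_boxes[:, 2] - gt_boxes[:, 0]
    gt_h = gt_boxes[:, 3] - gt_boxes[:, 1]
    gt_cx = gt_boxes[:, 0] + 0.5 * gt_w
    gt_cy = gt_boxes[:, 1] + 0.5 * gt_h

    dx = (gt_cx - anchor_cx) / anchor_w  # center shift in units of anchor width
    dy = (gt_cy - anchor_cy) / anchor_h  # center shift in units of anchor height
    dw = numpy.log(gt_w / anchor_w)      # log scale change in width
    dh = numpy.log(gt_h / anchor_h)      # log scale change in height
    return numpy.stack([dx, dy, dw, dh], axis=1)


anchors = numpy.array([[0.0, 0.0, 10.0, 10.0]])
gt_boxes = numpy.array([[5.0, 0.0, 15.0, 20.0]])
targets = encode_boxes(anchors, gt_boxes)
```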

JihongJu commented 7 years ago

@jhung0 To answer your question about the y_true shape, the first n_anchors values indicate whether each anchor is taken into account (1) or not (0). I think this implementation originally came from keras-frcnn, which I think is quite ugly. You could refer to the Anchor Layer for how the anchor target is generated for us.
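To illustrate the layout described here, this is a minimal NumPy sketch of a classification loss where the first half of the y_true channels acts as a validity mask and the second half holds the labels. The function name and the epsilon value are assumptions; the shapes are the ones from the test quoted earlier in the thread.

```python
import numpy


def masked_classification_loss(y_true, y_pred, n_anchors, epsilon=1e-7):
    """Binary cross-entropy averaged over anchors flagged as valid in y_true."""
    valid = y_true[..., :n_anchors]   # 1 = anchor counts, 0 = anchor is skipped
    labels = y_true[..., n_anchors:]  # 1 = object, 0 = background
    p = numpy.clip(y_pred, epsilon, 1.0 - epsilon)
    cross_entropy = -(labels * numpy.log(p) + (1.0 - labels) * numpy.log(1.0 - p))
    return (valid * cross_entropy).sum() / (valid.sum() + epsilon)


n_anchors = 9
y_pred = 0.5 * numpy.ones((1, 4, 4, n_anchors))
y_true = numpy.ones((1, 4, 4, 2 * n_anchors))
loss = masked_classification_loss(y_true, y_pred, n_anchors)
```

This explains why y_true has twice as many channels as y_pred in that test, but it also means the two arrays cannot share a shape.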

JihongJu commented 7 years ago

@jhung0 Yes, indeed. And we already have some work done by @0x00b1: Anchor

JihongJu commented 7 years ago

@0x00b1 And we missed a pooling layer before the R-CNN FC layers, since ROI outputs fixed-size feature maps

y = keras_rcnn.layers.ROI((7, 7), 32)([x, proposals])
y = keras.layers.AveragePooling2D(pool_size=(7, 7))(y)
y = keras.layers.Dense(4096)(y)

jhung0 commented 7 years ago

@JihongJu I don't think we need that weird y_true shape...? The losses seem to just depend on the values with the anchor taken into account like in https://github.com/rbgirshick/py-faster-rcnn/blob/master/models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt#L465

JihongJu commented 7 years ago

@jhung0 For rpn_cls_score, I think what matters is whether we want to use the softmax loss or the logistic loss. Because rpn will have only one possible class, I don't see a particular reason why we should use softmax loss. What do you think @0x00b1
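As a side note on this choice: for two classes, the softmax probability of "object" equals a sigmoid applied to the difference of the two logits, so the two-channel softmax head carries one redundant channel per anchor. A small NumPy check (the logit values are arbitrary):

```python
import numpy


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exponentials = numpy.exp(shifted)
    return exponentials / exponentials.sum(axis=-1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + numpy.exp(-z))


# Each row holds (background, object) logits for one anchor.
logits = numpy.array([[2.0, -1.0], [0.3, 0.3], [-4.0, 1.5]])

p_object_softmax = softmax(logits)[:, 1]
p_object_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])
```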

JihongJu commented 7 years ago

@jhung0 I agree with you. y_true should have a shape as simple as (anchors,). But it will contain 0, 1, and -1, so we probably need a loss that can ignore the -1s in y_true.
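A loss of that kind could look roughly like the NumPy sketch below, which drops every entry labelled -1 before averaging the binary cross-entropy. The function name, the `ignore` parameter, and the epsilon value are assumptions for illustration.

```python
import numpy


def classification_loss_ignoring(y_true, y_pred, ignore=-1, epsilon=1e-7):
    """Mean binary cross-entropy over entries whose label is not `ignore`."""
    keep = y_true != ignore
    if not keep.any():
        return 0.0
    labels = y_true[keep].astype(float)
    p = numpy.clip(y_pred[keep], epsilon, 1.0 - epsilon)
    cross_entropy = -(labels * numpy.log(p) + (1.0 - labels) * numpy.log(1.0 - p))
    return cross_entropy.mean()


y_true = numpy.array([1, 0, -1, -1])
y_pred = numpy.array([0.9, 0.2, 0.99, 0.01])
loss = classification_loss_ignoring(y_true, y_pred)
```

With this layout y_true and y_pred share a shape, and predictions for ignored anchors have no effect on the result.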

0x00b1 commented 7 years ago

@0x00b1 And we missed a pooling layer before the R-CNN FC layers, since ROI outputs fixed-size feature maps

Nice catch. Updated my earlier comment! 😎

0x00b1 commented 7 years ago

@jhung0 For rpn_cls_score, I think what matters is whether we want to use the softmax loss or the logistic loss. Because rpn will have only one possible class, I don't see a particular reason why we should use softmax loss. What do you think @0x00b1

Totally. We shouldn’t use softmax.

0x00b1 commented 7 years ago

@jhung0 To answer your question about the y_true shape, the first n_anchors values indicate whether each anchor is taken into account (1) or not (0). I think this implementation originally came from keras-frcnn, which I think is quite ugly. You could refer to the Anchor Layer for how the anchor target is generated for us.

@jhung0 Yeah, I dislike the keras-frcnn implementation too. The Anchor layer should have more or less everything you need.

JihongJu commented 7 years ago

@0x00b1 Another thing we missed here is that these four layers

y = keras.layers.AveragePooling2D((7, 7))(y)
y = keras.layers.Dense(4096)(y)
score = keras.layers.Dense(classes, activation="softmax")(y)
boxes = keras.layers.Dense(4 * (classes - 1))(y)

should be applied per proposal. We will need the TimeDistributed layer from keras for this purpose.
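Conceptually, applying a layer "per proposal" means reusing the same weights for every slice along the proposal axis. The NumPy sketch below shows that idea for a dense layer; it is not Keras code, and the sizes (5 proposals, 8 pooled features) are arbitrary assumptions.

```python
import numpy

rng = numpy.random.default_rng(0)

batch, proposals, features, classes = 1, 5, 8, 2
pooled = rng.normal(size=(batch, proposals, features))

# One set of dense weights, shared by all proposals.
weights = rng.normal(size=(features, classes))
bias = numpy.zeros(classes)

# Matrix multiplication broadcasts over the batch and proposal axes.
scores = pooled @ weights + bias

# The same computation written as an explicit loop over proposals.
looped = numpy.stack(
    [pooled[:, i] @ weights + bias for i in range(proposals)], axis=1
)
```

The output keeps the proposal axis, so the model ends up with one score vector and one box vector per proposal.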

JihongJu commented 7 years ago

@0x00b1 I modified the code above for the ResNet and added to broadinstitute/keras-rcnn#27.

emedinac commented 7 years ago

Good night to everyone. I don't have the honor of being a contributor here and I'm not an expert in Keras programming, but I would like to suggest an idea about the RPN "layer".

I saw in the file "keras_rcnn/models.py" that the RPN is instantiated as a model and not as a layer, which was the original idea in this amazing group. I know the RPN is based on a CNN, but I think it would be better to instantiate this block as a layer to keep it modular. So I propose the following (of course, as I said, I'm not yet as good a programmer as the people working here). Thank you for reading this message.

class RPN(keras.engine.topology.Layer):
    def __init__(self, anchors=9, **kwargs):
        self.anchors_cls = anchors * 1
        self.anchors_reg = anchors * 4
        super(RPN, self).__init__(**kwargs)

    def build(self, input_shape):
        self.channels = self.anchors_cls + self.anchors_reg

    def call(self, inputs):
        # y = inputs.layers[-2].output
        y = inputs
        a = keras.layers.Conv2D(self.anchors_cls, (1, 1), activation="sigmoid")(y)
        b = keras.layers.Conv2D(self.anchors_reg, (1, 1))(y)

        y = keras.layers.concatenate([a, b]) 
        return y # [rpn_cls, rpn_reg]

    def compute_output_shape(self, input_shape):
        return None, input_shape[1], input_shape[2], self.channels  # shape=(?, 500, 375, 45) for VOC2012

For a non-concatenated output, it could look like this (I don't think this is elegant, but it worked in the cases I tested, such as concatenating the outputs again afterwards):

    def call(self, inputs):
        # y = inputs.layers[-2].output
        y = inputs

        a = keras.layers.Conv2D(self.anchors_cls, (1, 1), activation="sigmoid")(y)
        b = keras.layers.Conv2D(self.anchors_reg, (1, 1))(y)

        # y = keras.layers.concatenate([a, b]) 
        return [a,b] # [rpn_cls, rpn_reg]

    def compute_output_shape(self, input_shape):
        out1 = None, input_shape[1], input_shape[2], self.anchors_cls
        out2 = None, input_shape[1], input_shape[2], self.anchors_reg
        return [out1, out2]

    def compute_mask(self, inputs, mask=None):
        return 2 * [None]

broadinstitute / keras-rcnn

Region proposal network (RPN) layer #7