Closed 0x00b1 closed 7 years ago
Here are a few links with information about implementing layers with multiple inputs:
https://github.com/fchollet/keras/issues/148 https://github.com/fchollet/keras/issues/2364 https://github.com/fchollet/keras/issues/3037
This issue has information about implementing a loss function for an intermediate layer.
@0x00b1 Why would the RPN need the ground truth as input? As far as I understand, the RPN takes the conv features as input and predicts labels and bounding box transforms for K anchors in each cell. I guess what you mean here is something like an AnchorTargetLayer, which produces anchor classification labels and bbox regression targets given the bounding box ground truth?
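For reference, the "bounding box transforms" mentioned here are usually the offsets defined in the Faster R-CNN paper. A minimal sketch, assuming boxes in (x1, y1, x2, y2) form; the names `encode_box` and `decode_box` are hypothetical and not part of keras-rcnn:

```python
import math


def encode_box(anchor, gt):
    """Return (tx, ty, tw, th), the transform that maps `anchor` onto `gt`."""
    ax1, ay1, ax2, ay2 = anchor
    gx1, gy1, gx2, gy2 = gt
    aw, ah = ax2 - ax1, ay2 - ay1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Box centers
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    gcx, gcy = gx1 + 0.5 * gw, gy1 + 0.5 * gh
    # Center shifts are scaled by the anchor size; sizes are log-ratios.
    return ((gcx - acx) / aw, (gcy - acy) / ah,
            math.log(gw / aw), math.log(gh / ah))


def decode_box(anchor, t):
    """Invert encode_box: apply transform `t` to `anchor`."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    cx, cy = acx + t[0] * aw, acy + t[1] * ah
    w, h = aw * math.exp(t[2]), ah * math.exp(t[3])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
```

Encoding a ground-truth box against an anchor and decoding the result should recover the original box, which is a handy sanity check for whichever layer ends up producing the regression targets.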
Yeah, exactly. I imagined a layer that’d encapsulate AnchorLayer, AnchorTargetLayer, and ProposalLayer into one layer to circumvent the awkward train-and-predict step in Ross’ implementation. Unfortunately, it’s still unclear how this should be implemented. That’s why I’ve been implementing the Anchor and Proposal layers in parallel. What do you think?
I do think it's a good idea to make the train-to-predict switching easier. But I'm not sure whether encapsulating the Proposal Layer and the Anchor Target Layer into one is the way to go. It seems that feeding the ground truth to the model as inputs, instead of as labels, could be problematic during testing. I can hardly imagine what to feed as b during test time.
How about keeping the anchor ground truth generation away from the model definition? What we want then becomes a loss function that calculates the losses given the "objectness" scores and the bounding boxes GT directly. We can put the Anchor Target Layer inside such a loss function.
@0x00b1 I am thinking of structuring it as:
x = keras.layers.Input((223, 223, 3))
a = keras_resnet.ResNet50(x)
[rpn_cls, rpn_reg] = keras_rcnn.layers.RegionProposalNetwork()(a)
rpn_pred = keras.backend.concatenate([rpn_cls, rpn_reg])
proposals = keras_rcnn.layers.ObjectProposal()([rpn_cls, rpn_reg])
[rcnn_cls, rcnn_reg] = keras_rcnn.layers.ROI([7, 7])([x, proposals])
model = Model( inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg])
model.compile( loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss], optimizer="adam")
And we have
def rpn_pred_loss(lambda_, *args, **kwargs):
    # `lambda` is a reserved word in Python, so the weight is named `lambda_`
    def f(y_true, y_pred):
        # separate y_pred into rpn_cls_pred and rpn_reg_pred
        rpn_cls_pred, rpn_reg_pred = separate_pred(y_pred)
        # convert y_true from gt_boxes to gt_anchors
        rpn_cls_gt, rpn_reg_gt = encode(y_true, rpn_cls_pred)
        # classification loss
        rpn_cls_loss = keras_rcnn.rpn.classification(anchors=9)(rpn_cls_gt, rpn_cls_pred)
        # regression loss
        rpn_reg_loss = keras_rcnn.rpn.regression(anchors=9)(rpn_reg_gt, rpn_reg_pred)
        return rpn_cls_loss + lambda_ * rpn_reg_loss
    return f
Then we can also extend this to Mask R-CNN simply by adding a mask branch and another loss, rcnn_mask_loss. What do you think?
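The encode step in the loss above would have to turn ground-truth boxes into per-anchor labels. A rough sketch of that labeling, assuming (x1, y1, x2, y2) boxes and the 0.7 / 0.3 IoU thresholds from the Faster R-CNN paper; `iou` and `label_anchors` are hypothetical names, and the paper's extra rule (the best-matching anchor for each ground-truth box is always positive) is left out for brevity:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, positive=0.7, negative=0.3):
    """Label each anchor 1 (object), 0 (background) or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground-truth box
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= positive:
            labels.append(1)
        elif best < negative:
            labels.append(0)
        else:
            labels.append(-1)  # neither clearly object nor background
    return labels
```

For example, with a single ground-truth box (0, 0, 10, 10), an identical anchor is labeled 1, an anchor that overlaps it with IoU 0.5 is labeled -1, and a disjoint anchor is labeled 0.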
@JihongJu I love this! Especially this:
x = keras.layers.Input((223, 223, 3))
a = keras_resnet.ResNet50(x)
[rpn_cls, rpn_reg] = keras_rcnn.layers.RegionProposalNetwork()(a)
rpn_pred = keras.backend.concatenate([rpn_cls, rpn_reg])
proposals = keras_rcnn.layers.ObjectProposal()([rpn_cls, rpn_reg])
[rcnn_cls, rcnn_reg] = keras_rcnn.layers.ROI([7, 7])([x, proposals])
model = Model( inputs=x, outputs=[rpn_pred, rcnn_cls, rcnn_reg])
model.compile( loss=[rpn_pred_loss, rcnn_cls_loss, rcnn_reg_loss], optimizer="adam")
@JihongJu I started structuring this into code:
classes = 2
x = keras.layers.Input((224, 224, 3))
y = keras_resnet.ResNet50(x)
rpn_classification = keras.layers.Conv2D(9 * 1, (1, 1), activation="sigmoid")(y.layers[-2].output)
rpn_regression = keras.layers.Conv2D(9 * 4, (1, 1))(y.layers[-2].output)
rpn_prediction = keras.layers.concatenate([rpn_classification, rpn_regression])
proposals = keras_rcnn.layers.object_detection.ObjectProposal(300)([rpn_classification, rpn_regression])
y = keras_rcnn.layers.ROI((7, 7), 32)([x, proposals])
y = keras.layers.AveragePooling2D((7, 7))(y)
y = keras.layers.Dense(4096)(y)
score = keras.layers.Dense(classes, activation="softmax")(y)
boxes = keras.layers.Dense(4 * (classes - 1))(y)
model = keras.models.Model(x, [rpn_prediction, score, boxes])
model.compile(optimizer="adam", loss="mse")
I started working on the loss function:
https://github.com/broadinstitute/keras-rcnn/blob/master/keras_rcnn/losses/rpn.py
https://github.com/broadinstitute/keras-rcnn/blob/master/tests/losses/test_rpn.py
@0x00b1 Cool. Maybe a typo here
model.compile(optimizer="adam", loss="mse") # Should be rpn/rcnn losses
I think that y_true and y_pred should be the same shape. Right now in the tests, classification has
y_pred = keras.backend.variable(0.5 * numpy.ones((1, 4, 4, n_anchors)))
y_true = keras.backend.variable(numpy.ones((1, 4, 4, 2 * n_anchors)))
and regression has
y_pred = keras.backend.variable(0.5 * numpy.ones((1, 4, 4, 4 * n_anchors)))
y_true = keras.backend.variable(numpy.ones((1, 4, 4, 8 * n_anchors)))
https://github.com/mitmul/chainer-faster-rcnn/blob/v2/models/region_proposal_network.py has output space 2 * n_anchors for classification and 4 * n_anchors for regression.
So
rpn_classification = keras.layers.Conv2D(9 * 2, (1, 1), activation="softmax")(y.layers[-2].output)
my edits:
classes = 2
x = keras.layers.Input((224, 224, 3))
y = keras_resnet.ResNet50(x, include_top=False)
rpn_classification = keras.layers.Conv2D(9 * 2, (1, 1), activation="softmax")(y)
rpn_regression = keras.layers.Conv2D(9 * 4, (1, 1))(y)
@jhung0 To answer your question about the y_true shape: the first anchors values (one per anchor) indicate whether that anchor is taken into account (1) or not (0). I think this implementation originally came from keras-frcnn, which I think is quite ugly. You could refer to the Anchor Layer for how the anchor target is generated for us.
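The layout described here can be unpacked as follows. This is only a sketch under the assumption that each spatial cell of y_true holds 2 * n_anchors values, validity flags first and class labels second; `split_targets` and `to_ignore_labels` are hypothetical helpers:

```python
def split_targets(cell, n_anchors):
    """Split one spatial cell of y_true into (valid_flags, labels)."""
    if len(cell) != 2 * n_anchors:
        raise ValueError("expected 2 * n_anchors channels")
    return cell[:n_anchors], cell[n_anchors:]


def to_ignore_labels(cell, n_anchors):
    """Collapse (valid_flags, labels) into one vector; -1 marks ignored anchors."""
    valid, labels = split_targets(cell, n_anchors)
    return [label if flag == 1 else -1 for flag, label in zip(valid, labels)]
```

Collapsing the two halves into a single vector gives y_true the same number of channels as the classification output, at the cost of needing a loss that understands the ignore value.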
@jhung0 Yes, indeed. And we already have some work done by @0x00b1: Anchor
@0x00b1 And we missed a pooling layer before the R-CNN C layers, since ROI outputs fixed-size feature maps
y = keras_rcnn.layers.ROI((7, 7), 32)([x, proposals])
y = keras.layers.AveragePooling2D(pool_size=(7, 7))(y)
y = keras.layers.Dense(4096)(y)
@JihongJu I don't think we need that weird y_true shape...? The losses seem to depend only on the values for anchors that are taken into account, as in https://github.com/rbgirshick/py-faster-rcnn/blob/master/models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt#L465
@jhung0 For rpn_cls_score, I think what matters is whether we want to use the softmax loss or the logistic loss. Because the RPN has only one possible class, I don't see a particular reason why we should use the softmax loss. What do you think, @0x00b1?
@jhung0 I agree with you. y_true should have a shape as simple as (anchors,). But we will have 0, 1 and -1 in it. We probably need a loss that can ignore the -1s in y_true.
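A loss that skips the -1 entries could look roughly like this. It is a plain-Python sketch of the idea rather than a Keras implementation; the function name, the `ignore` argument and the clipping constant are all assumptions:

```python
import math


def masked_binary_crossentropy(y_true, y_pred, ignore=-1, eps=1e-7):
    """Mean binary cross-entropy over the entries whose label is not `ignore`."""
    total, count = 0.0, 0
    for t, p in zip(y_true, y_pred):
        if t == ignore:
            continue  # ignored anchors contribute nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        count += 1
    # Normalize by the number of anchors actually used
    return total / count if count else 0.0
```

In a Keras loss the same idea would typically be expressed with a mask over tensors instead of a Python loop, normalizing by the number of unmasked anchors.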
@0x00b1 And we missed a pooling layer before the R-CNN C layers, since ROI outputs fixed-size feature maps
Nice catch. Updated my earlier comment! 😎
@jhung0 For rpn_cls_score, I think what matters is whether we want to use the softmax loss or the logistic loss. Because the RPN has only one possible class, I don't see a particular reason why we should use the softmax loss. What do you think, @0x00b1?
Totally. We shouldn’t use softmax.
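The point about softmax can be checked numerically: a two-way softmax over (background, object) logits gives the same object probability as a sigmoid applied to the difference of the two logits, so a single sigmoid output per anchor loses nothing. A small sketch; the helper names are illustrative only:

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z0, z1):
    """Softmax over two logits, returned as (p0, p1)."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def object_probability_gap(z_background, z_object):
    """Absolute difference between the softmax and sigmoid formulations."""
    return abs(softmax2(z_background, z_object)[1]
               - sigmoid(z_object - z_background))
```

For any pair of finite logits the gap is zero up to floating-point error, which is why 9 * 1 sigmoid channels carry the same information as 9 * 2 softmax channels.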
@jhung0 To answer your question about the y_true shape: the first anchors values (one per anchor) indicate whether that anchor is taken into account (1) or not (0). I think this implementation originally came from keras-frcnn, which I think is quite ugly. You could refer to the Anchor Layer for how the anchor target is generated for us.
@jhung0 Yeah, I dislike the keras-frcnn implementation too. The Anchor layer should have more or less everything you need.
@0x00b1 Another thing we missed here is that these four layers
y = keras.layers.AveragePooling2D((7, 7))(y)
y = keras.layers.Dense(4096)(y)
score = keras.layers.Dense(classes, activation="softmax")(y)
boxes = keras.layers.Dense(4 * (classes - 1))(y)
should be applied per proposal. We will need the TimeDistributed layer from keras for this purpose.
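Keras's TimeDistributed wrapper applies one layer, with shared weights, to every slice along axis 1 of its input; here that axis would index proposals. The plain-Python sketch below only illustrates that behavior for a dense layer. The helpers `dense` and `per_proposal` are hypothetical and do not use Keras:

```python
def dense(weights, bias, features):
    """One fully connected layer; `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def per_proposal(layer, proposals):
    """Apply the same `layer` to each proposal's feature vector independently."""
    return [layer(features) for features in proposals]


# Two proposals with two features each, sharing one set of weights
weights = [[1.0, 0.0], [0.0, 2.0]]
bias = [0.0, 1.0]
proposals = [[1.0, 2.0], [3.0, 4.0]]
scores = per_proposal(lambda f: dense(weights, bias, f), proposals)
```

The output keeps one row per proposal, so the class scores and box regressions would both end up with a leading proposals dimension.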
@0x00b1 I modified the code above for the ResNet and added to broadinstitute/keras-rcnn#27.
Good night to everyone. I don't have the honor of being a contributor here and I'm not an expert in Keras programming, but I would like to suggest an idea about the RPN "layer".
I saw in the file keras_rcnn/models.py that the RPN is instantiated as a model and not as a layer, unlike the original idea in this amazing group. I know the RPN is based on a CNN, but I think it would be better to instantiate this block as a layer, to make it modular. So I propose the following (of course, as I said, I'm not yet as good a programmer as the people working here). Thank you for reading this message.
class RPN(keras.engine.topology.Layer):
    def __init__(self, anchors=9, **kwargs):
        self.anchors_cls = anchors * 1
        self.anchors_reg = anchors * 4
        super(RPN, self).__init__(**kwargs)

    def build(self, input_shape):
        self.channels = self.anchors_cls + self.anchors_reg

    def call(self, inputs):
        # y = inputs.layers[-2].output
        y = inputs
        a = keras.layers.Conv2D(self.anchors_cls, (1, 1), activation="sigmoid")(y)
        b = keras.layers.Conv2D(self.anchors_reg, (1, 1))(y)
        y = keras.layers.concatenate([a, b])
        return y  # [rpn_cls, rpn_reg]

    def compute_output_shape(self, input_shape):
        return None, input_shape[1], input_shape[2], self.channels  # shape=(?, 500, 375, 45) for VOC2012
For non-concatenated output it could look like this (I don't think this is elegant programming, but it worked in the cases I tested, such as concatenating the outputs again):
    def call(self, inputs):
        # y = inputs.layers[-2].output
        y = inputs
        a = keras.layers.Conv2D(self.anchors_cls, (1, 1), activation="sigmoid")(y)
        b = keras.layers.Conv2D(self.anchors_reg, (1, 1))(y)
        # y = keras.layers.concatenate([a, b])
        return [a, b]  # [rpn_cls, rpn_reg]

    def compute_output_shape(self, input_shape):
        out1 = (None, input_shape[1], input_shape[2], self.anchors_cls)
        out2 = (None, input_shape[1], input_shape[2], self.anchors_reg)
        return [out1, out2]

    def compute_mask(self, inputs, mask=None):
        return 2 * [None]
The region proposal network (RPN) should take two inputs, image features (i.e. features extracted by ResNet) and ground truth bounding boxes and produce object proposals and corresponding “objectness” scores. I’m envisioning something like: