I am trying to do person detection with Faster R-CNN, and I want to make it a bit faster by soft-skipping images. This is my idea:
Add a classification subnetwork that takes the conv5 features and outputs the probability that a person is present in the image (standard binary classification).
If `prob < threshold`, set `feature_map = 0 * feature_map`; otherwise leave `feature_map` unchanged.
Then I feed the feature map to the RPN, and the standard Faster R-CNN pipeline continues.
My hypothesis is that this will make the network faster on a dataset that contains many images without a person.
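A minimal sketch of the gating step described above, assuming batched conv5 features of shape (N, C, H, W) and one probability per image. The function name `soft_skip` and the default threshold of 0.5 are placeholders I made up for illustration, not part of any Faster R-CNN implementation.

```python
import torch


def soft_skip(feature_map: torch.Tensor, prob: torch.Tensor,
              threshold: float = 0.5) -> torch.Tensor:
    """Zero the feature maps of images whose person probability is below threshold.

    feature_map: (N, C, H, W) conv5 features.
    prob: (N,) person-presence probabilities from the classification subnetwork.
    """
    # 1.0 for images to keep, 0.0 for images to zero out.
    keep = (prob >= threshold).to(feature_map.dtype)
    # Broadcast the per-image flag over channels and spatial positions.
    return feature_map * keep.view(-1, 1, 1, 1)


# Toy example: first image is kept, second image is zeroed.
features = torch.randn(2, 8, 4, 4)
prob = torch.tensor([0.9, 0.1])
gated = soft_skip(features, prob, threshold=0.5)
```

Note that the gated tensor has the same shape as the input, so everything downstream still runs on the zeroed image; nothing is removed from the batch.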
These are my assumptions:
If feature_map == 0, the convolution layers in the RPN will be much faster, since I believe multiplication by zero is optimized in PyTorch/Python.
The objectness scores for all anchors corresponding to a zero feature map will be very small (not necessarily zero, because of the bias terms) and will therefore be suppressed even before NMS is applied to the RPN output.
Since NMS is expensive (please correct me if I am wrong), it will be relatively faster when no anchor from a particular image makes it to the post-RPN NMS.
If the above assumptions hold, we will have far fewer ROIs (over the dataset), so ROI pooling and the subsequent object classifier stage will also take less time.
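To make the second assumption concrete, here is a small sketch of what an RPN-style head produces for an all-zero feature map. The layer names `rpn_conv` and `rpn_cls`, the channel counts, and the randomly initialized (untrained) weights are all assumptions for illustration; a real RPN head would have learned weights and biases.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_anchors = 9  # hypothetical number of anchors per location
rpn_conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)   # shared 3x3 conv
rpn_cls = nn.Conv2d(16, num_anchors, kernel_size=1)      # objectness logits

zero_features = torch.zeros(1, 16, 5, 5)

with torch.no_grad():
    # With zero input, every conv output is just its bias at each location.
    hidden = torch.relu(rpn_conv(zero_features))
    objectness = torch.sigmoid(rpn_cls(hidden))

# Scores of the first location, to compare against all other locations.
first_location = objectness[:, :, :1, :1]
```

With a zero input, the scores are identical at every spatial location and are fixed by the bias terms alone, so they are not zero. Whether they end up below whatever cutoff the RPN applies depends on the learned biases, which is another implicit assumption in my idea.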
Please comment on whether my assumptions are correct. I don't know much about how the GPU internally parallelizes NMS and ROI pooling. It would also be great if you could point out any other implicit assumptions I might be making along the way.
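For reference, this is the greedy NMS I have in mind when I say its cost depends on how many candidates reach it. It is a plain-Python sketch for reasoning only, not how any GPU kernel is implemented; the helper names `iou` and `greedy_nms` and the example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)

# With no candidates, the loop body never runs.
empty = greedy_nms([], [], iou_threshold=0.5)
```

The pairwise IoU checks are where the work is, so an image that contributes no candidates adds essentially nothing here. Whether that still matters once NMS is batched on the GPU is exactly what I am unsure about.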
NOTE:
I call it soft skipping because I am not skipping the processing directly. Rather, the zero feature map of an image causes it to be skipped automatically.
I am asking these questions with respect to single-GPU processing (a K20 on my end).