airsplay / py-bottom-up-attention

PyTorch bottom-up attention with Detectron2
Apache License 2.0

Why are the feature dimensions different? #20

Open chenlin038 opened 3 years ago

chenlin038 commented 3 years ago

I use the following code to extract features with an FPN model. The config is "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml".

Feature extraction code:

```python
import cv2
import torch

# `predictor` is a DefaultPredictor built with
# "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"
img_path = "input.jpg"
img_ori = cv2.imread(img_path)
height, width = img_ori.shape[:2]
img = predictor.transform_gen.get_transform(img_ori).apply_image(img_ori)
img = torch.as_tensor(img.astype("float32").transpose(2, 0, 1))
inputs = [{"image": img, "height": height, "width": width}]

with torch.no_grad():
    img_lists = predictor.model.preprocess_image(inputs)  # don't forget to preprocess
    features = predictor.model.backbone(img_lists.tensor)  # set of CNN features
    proposals, _ = predictor.model.proposal_generator(img_lists, features, None)  # RPN

    proposal_boxes = [x.proposal_boxes for x in proposals]
    features_list = [features[f] for f in predictor.model.roi_heads.in_features]
    proposal_rois = predictor.model.roi_heads.box_pooler(features_list, proposal_boxes)
    box_features = predictor.model.roi_heads.box_head(proposal_rois)
```

I use `box_features` as the feature of each detected object, but its dimension is 1024, which is inconsistent with the original bottom-up-attention image features, whose dimension is 2048. Both use ResNet-101 as the backbone network, so why are the feature dimensions inconsistent?
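For reference, here is how I checked where the 1024 comes from (a quick sketch assuming the snippet above ran; the config keys are from detectron2's standard defaults, not something specific to this repo):

```python
# Inspect the box head that produced box_features.
print(box_features.shape)                       # e.g. torch.Size([1000, 1024])
print(predictor.cfg.MODEL.ROI_BOX_HEAD.NAME)    # "FastRCNNConvFCHead" in the FPN configs
print(predictor.cfg.MODEL.ROI_BOX_HEAD.FC_DIM)  # 1024: width of the box head's FC layers
```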

I apologize if the answer is obvious; I am very new to object detection.

Thank you!

airsplay commented 3 years ago

The GitHub version is converted from the original bottom-up-attention repo. It uses a specific detector trained on Visual Genome. Please use the configurations under the folder py-bottom-up-attention/configs/VG-Detection/. Hope this helps!
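Roughly, extraction with the VG detector looks like the sketch below (based on the C4/Res5 head used by those configs; the yaml name and weights path here are placeholders, so take the exact ones from configs/VG-Detection/ and the README, and reuse `inputs` from the snippet earlier in this issue):

```python
import torch
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# Placeholder file/weights: pick the actual config and checkpoint from the repo.
cfg.merge_from_file("configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml")
cfg.MODEL.WEIGHTS = "path/to/vg_faster_rcnn_weights.pkl"
predictor = DefaultPredictor(cfg)

# Same preprocessing as before; the ROI head is now a C4 (Res5) head,
# so pooled features come out of the res5 block with 2048 channels.
with torch.no_grad():
    images = predictor.model.preprocess_image(inputs)
    features = predictor.model.backbone(images.tensor)
    proposals, _ = predictor.model.proposal_generator(images, features, None)
    proposal_boxes = [x.proposal_boxes for x in proposals]
    features_list = [features[f] for f in predictor.model.roi_heads.in_features]
    box_features = predictor.model.roi_heads._shared_roi_transform(
        features_list, proposal_boxes
    )
    pooled_features = box_features.mean(dim=[2, 3])  # (num_proposals, 2048)
```

The 2048 comes from the res5 stage of ResNet-101, whereas the FPN model flattens 256-channel ROI features through two 1024-wide FC layers, which is why the two pipelines give different feature dimensions.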