facebookresearch / vilbert-multi-task

Multi Task Vision and Language
MIT License
797 stars 180 forks

Use the "extract_features" script with Detectron2? #38

Open kevkid opened 4 years ago

kevkid commented 4 years ago

NOTE: I have made some changes towards the bottom. Could someone take a look and let me know whether they look about right?

I have been trying to get this to work with Detectron2, as I have a model fine-tuned on some custom data.

In particular, I do not know how to implement this portion using Detectron2:

    def _process_feature_extraction(
        self, output, im_scales, im_infos, feature_name="fc6", conf_thresh=0
    ):
        batch_size = len(output[0]["proposals"])
        n_boxes_per_image = [len(boxes) for boxes in output[0]["proposals"]]
        score_list = output[0]["scores"].split(n_boxes_per_image)
        score_list = [torch.nn.functional.softmax(x, -1) for x in score_list]
        feats = output[0][feature_name].split(n_boxes_per_image)
        cur_device = score_list[0].device

I have tried implementing part of it, but I am stuck on what the scores are. What do they represent? Is it the full softmax vector?
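
For readers with the same question: judging from the snippet above, `output[0]["scores"]` appears to hold one row of raw class logits per proposal, concatenated over all images in the batch. After `split` and `softmax`, each row would then be the full per-class probability vector rather than a single confidence value. A minimal pure-Python sketch of that split-then-softmax step, using made-up logits and no torch dependency (`softmax` and `split_rows` are hypothetical helpers written for this illustration):

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of class logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def split_rows(rows, sizes):
    """Split a flat list of rows into per-image chunks, like Tensor.split(sizes)."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# Made-up logits: 3 proposals in total, 2 classes (background + 1 object class).
logits = [[2.0, 0.5], [0.1, 1.2], [0.3, 0.3]]
n_boxes_per_image = [2, 1]  # image 0 has 2 proposals, image 1 has 1

# One list of per-class probability rows for each image.
score_list = [
    [softmax(row) for row in chunk]
    for chunk in split_rows(logits, n_boxes_per_image)
]
```

Each inner row sums to 1, which is what distinguishes it from the single per-detection `scores` value that Detectron2 `Instances` carry after inference.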

This is what I have done so far:

import os

from detectron2 import model_zoo
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.structures import ImageList

images = ImageList.from_tensors(lst[:1], size_divisibility=32).to("cuda")  # preprocessed input tensor
# set up the config
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only one class (pneumonia)
# Just run these lines if you have the trained model in memory
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set the testing threshold for this model
# build the model
model = build_model(cfg)
DetectionCheckpointer(model).load("output/model_final.pth")
model.eval()  # make sure it is in eval mode

# run the model
features = model.backbone(images.tensor.float())
proposals, _ = model.proposal_generator(images, features)
instances = model.roi_heads._forward_box(features, proposals)
mask_features = [features[f] for f in model.roi_heads.in_features]
mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])

batch_size = len(proposals)
n_boxes_per_image = [len(boxes) for boxes in proposals]

EDIT

I have changed the feature-extraction methods to run with the Detectron2 model. I believe this is correct; could anyone take a quick look?

_process_feature_extraction:

    def _process_feature_extraction(
        self, output, im_scales, im_infos, feature_name="p6", conf_thresh=0
    ):
        feat_list = []
        info_list = []
        batch_size = len(output['instances'])
        #print(batch_size)
        for i in range(batch_size):
            feat_list.append(output['features'][feature_name][i])
            info_list.append(
                {
                    "bbox": output['instances'][i].pred_boxes.to('cpu').tensor.numpy() / im_scales[i],
                    "num_boxes": len(output['instances'][i]),
                    "objects": output['instances'][i].pred_classes.to('cpu').numpy(),
                    "image_width": im_infos[i]["width"],
                    "image_height": im_infos[i]["height"],
                    "cls_prob": output['instances'][i].scores.to('cpu').numpy(),
                }
            )

        return feat_list, info_list
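
The division by `im_scales[i]` maps boxes predicted on the resized network input back to original image coordinates. A minimal sketch of that arithmetic in plain Python, assuming `im_scale` is the resized size divided by the original size (`rescale_boxes` is a hypothetical helper, not part of either library):

```python
def rescale_boxes(boxes, im_scale):
    """Map [x1, y1, x2, y2] boxes from the resized image back to the original."""
    return [[coord / im_scale for coord in box] for box in boxes]


# Under the assumption above, an 800-pixel-wide network input produced from a
# 400-pixel-wide original corresponds to im_scale = 2.0.
boxes_resized = [[100.0, 50.0, 300.0, 200.0]]
boxes_original = rescale_boxes(boxes_resized, im_scale=2.0)
```

If the boxes have already been rescaled elsewhere (for example by a postprocessing step), dividing again would shrink them twice, so it is worth checking which coordinate frame the boxes are in before applying this.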

get_detectron_features:

    def get_detectron_features(self, image_paths):
        img_tensor, im_scales, im_infos = [], [], []

        for image_path in image_paths:
            im, im_scale, im_info = self._image_transform(image_path)
            img_tensor.append(im)
            im_scales.append(im_scale)
            im_infos.append(im_info)

        # Image dimensions should be divisible by 32, to allow convolutions
        # in detector to work
        current_img_list = ImageList.from_tensors(img_tensor, size_divisibility=32)
        current_img_list = current_img_list.to("cuda")
        #print(current_img_list.tensor)
        #print(np.shape(current_img_list.tensor))

        with torch.no_grad():
            # run the model
            features = self.detection_model.backbone(current_img_list.tensor)  # outputs features
            proposals, _ = self.detection_model.proposal_generator(current_img_list, features)
            # in eval mode _forward_box returns only the list of predicted Instances, so there is nothing to unpack
            instances = self.detection_model.roi_heads._forward_box(features, proposals)
            mask_features = [features[f] for f in self.detection_model.roi_heads.in_features]
            mask_features = self.detection_model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
            output = {'features': features, 'proposals': proposals, 'instances': instances, 'mask_features': mask_features}

        feat_list = self._process_feature_extraction(
            output,
            im_scales,
            im_infos,
            self.feature_name,
            self.confidence_threshold,
        )

        return feat_list
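
The comment about dimensions being divisible by 32 refers to the padding done by `ImageList.from_tensors`. As far as I understand it, every image in the batch is padded to the largest height and width in the batch, each rounded up to a multiple of `size_divisibility`. A sketch of that size computation in plain Python (`padded_size` is a hypothetical helper, not a Detectron2 function):

```python
def padded_size(sizes, size_divisibility=32):
    """Return the common (height, width) a batch of images would be padded to."""
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    if size_divisibility > 1:
        d = size_divisibility
        # Round each dimension up to the next multiple of d.
        max_h = (max_h + d - 1) // d * d
        max_w = (max_w + d - 1) // d * d
    return max_h, max_w


# Two made-up image sizes given as (height, width).
batch_hw = padded_size([(600, 901), (480, 640)])
```

Here `batch_hw` comes out as `(608, 928)`, so boxes stay in the coordinate frame of each resized image while the tensor itself is larger than any single image.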

_build_detection_model:

    def _build_detection_model(self):
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
        # cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
        cfg.SOLVER.IMS_PER_BATCH = 1
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only one class (pneumonia)
        # Just run these lines if you have the trained model in memory
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set the testing threshold for this model
        model = build_model(cfg)
        DetectionCheckpointer(model).load("output/model_final.pth")
        cfg.freeze()

        model.to("cuda")
        model.eval()
        return model

amogh112 commented 4 years ago

Hi @kevkid, were you able to make this run? I am looking at this option so that I can get features in the same format without having to set up maskrcnn_benchmark on a machine with a different CUDA version.

kevkid commented 4 years ago

This is what I did:

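import glob
import os
import traceback

import cv2
import numpy as np
import pandas as pd
import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
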
DEBUG = False

class feature_extractor:
    '''
    Feature Extractor for detectron2
    '''
    def __init__(self, path = None, output_folder='./output', model = None, pred_thresh = 0.5):
        self.pred_thresh = pred_thresh
        self.output_folder = output_folder
        assert path is not None, 'Path should not be none'
        self.path = path
        if model is None:
            self.model = self._build_detection_model()
        else:
            assert isinstance(model, DefaultPredictor), "model should be a 'detectron2.engine.defaults.DefaultPredictor'"
            self.model = model
            # DefaultPredictor wraps the network in .model and has no eval() of its own
            self.model.model.eval()

    def _build_detection_model(self):
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
        cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
        cfg.SOLVER.IMS_PER_BATCH = 1
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only one class (pneumonia)
        # Just run these lines if you have the trained model in memory
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = self.pred_thresh  # set the testing threshold for this model
        # build the predictor and return it
        return DefaultPredictor(cfg)

    def _process_feature_extraction(self, inputs):  # step 3
        '''
        #predictor.model.roi_heads.box_predictor.test_topk_per_image = 1000
        #predictor.model.roi_heads.box_predictor.test_nms_thresh = 0.99
        #predictor.model.roi_heads.box_predictor.test_score_thresh = 0.0
        #pred_boxes = [x.pred_boxes for x in instances]#can use prediction boxes
        '''
        torch.cuda.empty_cache()
        predictor = self.model
        with torch.no_grad():#https://detectron2.readthedocs.io/_modules/detectron2/modeling/roi_heads/roi_heads.html : _forward_box()
            images = predictor.model.preprocess_image(inputs)  # don't forget to preprocess, this is done in another step
            features = predictor.model.backbone(images.tensor)  # set of cnn features
            proposals, _ = predictor.model.proposal_generator(images, features, None)  # RPN

            features_ = [features[f] for f in predictor.model.roi_heads.box_in_features]
            box_features = predictor.model.roi_heads.box_pooler(features_, [x.proposal_boxes for x in proposals])
            box_features = predictor.model.roi_heads.box_head(box_features)  # features of all 1k candidates
            predictions = predictor.model.roi_heads.box_predictor(box_features)  # see https://detectron2.readthedocs.io/_modules/detectron2/modeling/roi_heads/roi_heads.html
            # inference() first applies the box deltas to readjust the proposal boxes. It then runs
            # non-maximum suppression (NMS) to remove heavily overlapping boxes, also applying other
            # settings such as the score threshold, and finally keeps the top-k boxes ranked by score.
            # See https://github.com/facebookresearch/detectron2/blob/master/detectron2/modeling/roi_heads/fast_rcnn.py#L460
            pred_instances, pred_inds = predictor.model.roi_heads.box_predictor.inference(predictions, proposals)
            pred_instances = predictor.model.roi_heads.forward_with_given_boxes(features, pred_instances)

            # output boxes, masks, scores, etc
            pred_instances = predictor.model._postprocess(pred_instances, inputs, images.image_sizes)  # scale box to orig size
            # features of the proposed boxes
            feats = box_features[pred_inds].to('cpu')
            #['bbox', 'num_boxes', 'objects', 'cls_prob', 'image_id', 'features']
            #img.image_sizes[0]#h, w
            instances = pred_instances[0]['instances'].to('cpu')#send to cpu
            num_instances = len(instances)
            assert num_instances > 0, 'No detected features!'
            result = {
                'bbox': instances.pred_boxes.tensor.numpy(),
                'num_boxes' : num_instances,#len(pred_instances[0]['instances'].pred_boxes[pred_inds].tensor.cpu().numpy()),
                'objects' : instances.pred_classes.numpy(),#pred_instances[0]['instances'].pred_classes.cpu().numpy(),
                'cls_prob': instances.scores.numpy(),#pred_instances[0]['instances'].scores.cpu().numpy(),
                'features': feats.to('cpu').numpy()
            }
        return result

    def _save_feature(self, file_name, feature, info):
        file_base_name = os.path.basename(file_name)
        file_base_name = file_base_name.split(".")[0]
        feature["image_id"] = file_base_name
        feature['image_height'] = info['height']
        feature['image_width'] = info['width']
        file_base_name = file_base_name + ".npy"
        np.save(os.path.join(self.output_folder, file_base_name), feature)

    def extract_features(self):#step 1
        #torch.cuda.empty_cache()
        image_dir = self.path
        #print(image_dir)
        if isinstance(image_dir, pd.DataFrame):  # iterate over a DataFrame with a 'path' column
            samples = []
            for idx, row in image_dir.iterrows():#get better name
                file = row['path']
                try:
                    features, infos = self.get_detectron2_features([file])
                    self._save_feature(file, features, infos[0])
                    samples.append(row)
                except Exception as e:#if no features were found!
                    print('An exception has occurred: {}'.format(e))
                    if DEBUG:
                        traceback.print_exc()
                    continue
            df = pd.DataFrame(samples)
            #save final csv containing image base names, reports and report locations
            df.to_csv(os.path.join(self.output_folder, 'img_infos.csv'))
        elif os.path.isfile(image_dir):  # a single file
            features, infos = self.get_detectron2_features([image_dir])
            # features is already the result dict for this image, so it is not indexed
            self._save_feature(image_dir, features, infos[0])
            return features, infos
        else:#if its a directory
            files = glob.glob(os.path.join(image_dir, "*"))
            for idx, file in enumerate(files):
                try:
                    features, infos = self.get_detectron2_features([file])
                    self._save_feature(file, features, infos[0])
                except Exception as e:#if no features were found!
                    print('An exception has occurred: {}'.format(e))
                    if DEBUG:
                        traceback.print_exc()
                    continue

    def get_detectron2_features(self, image_paths):#step 2
        #we have to PREPROCESS the tensor before partially executing it!
        #taken from https://github.com/facebookresearch/detectron2/blob/master/detectron2/engine/defaults.py
        predictor = self.model
        images = []
        image_info = []
        for image_path in image_paths:
            img = cv2.imread(image_path)
            height, width = img.shape[:2]
            # Assumption: newer Detectron2 releases name this attribute `aug` instead of `transform_gen`
            img = predictor.transform_gen.get_transform(img).apply_image(img)
            img = torch.as_tensor(img.astype("float32").transpose(2, 0, 1))
            images.append({"image": img, "height": height, "width": width})
            image_info.append({"image_id": os.path.basename(image_path), "height": height, "width": width})
        #returns features and infos
        return self._process_feature_extraction(images), image_info
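
One detail worth checking in `_save_feature`: the image id is derived with `split(".")[0]`, which keeps only the text before the first dot, so two files such as `scan.01.png` and `scan.02.png` would both be written to `scan.npy`. A small sketch of the difference, with `os.path.splitext` as one possible alternative (the file names and both helper functions are made up for illustration):

```python
import os


def id_by_split(file_name):
    """Image id as computed in _save_feature: text before the first dot."""
    return os.path.basename(file_name).split(".")[0]


def id_by_splitext(file_name):
    """Alternative: strip only the final extension."""
    return os.path.splitext(os.path.basename(file_name))[0]


paths = ["images/scan.01.png", "images/scan.02.png"]
ids_split = [id_by_split(p) for p in paths]        # both collapse to "scan"
ids_splitext = [id_by_splitext(p) for p in paths]  # "scan.01" and "scan.02"
```

Separately, because `np.save` is given a dict here, the file holds a pickled object array, so reading it back needs `np.load(path, allow_pickle=True).item()`.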

Yudezhi commented 3 years ago

mark for more details

enaserianhanzaei commented 3 years ago

@Yudezhi @kevkid @amogh112 Hi guys

I wrote a step-by-step tutorial on how to set up the environment and how to train and test this model. I also added a section on extracting the visiolinguistic embeddings from image-text data: https://naserian-elahe.medium.com/vilbert-a-model-for-learning-joint-representations-of-image-content-and-natural-language-47f56a313a79 I would very much appreciate any comments or suggestions.

TopCoder2K commented 3 years ago

@enaserianhanzaei, thank you for the article! I looked through the "Setting-up the environment" section but still have questions.

1) Did the installation of the required dependencies finish successfully? I get the following error:

Collecting python-prctl
  Downloading python-prctl-1.8.1.tar.gz (28 kB)
WARNING: Discarding https://files.pythonhosted.org/packages/c0/99/be5393cfe9c16376b4f515d90a68b11f1840143ac1890e9008bc176cf6a6/python-prctl-1.8.1.tar.gz#sha256=b4ca9a25a7d4f1ace4fffd1f3a2e64ef5208fe05f929f3edd5e27081ca7e67ce (from https://pypi.org/simple/python-prctl/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Downloading python-prctl-1.8.tar.gz (27 kB)
WARNING: Discarding https://files.pythonhosted.org/packages/0c/09/a03aa84131f7f699fc5e71ef8f9004adfe12d231d7b3e41e6864948fcc32/python-prctl-1.8.tar.gz#sha256=e73f74da9c104b69a690141fe41e67339297da91932460c00a78e8536b2caa61 (from https://pypi.org/simple/python-prctl/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Downloading python-prctl-1.7.tar.gz (24 kB)
WARNING: Discarding https://files.pythonhosted.org/packages/7a/90/61935e2530a76f41e9e4f8ba0fe073d4ad0a3e16c4953156253f939fb057/python-prctl-1.7.tar.gz#sha256=57ebd556616d6ffe1f794f514680e84a03737cb070de37722198d7ad6c8f4fda (from https://pypi.org/simple/python-prctl/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Downloading python-prctl-1.6.1.tar.gz (24 kB)
WARNING: Discarding https://files.pythonhosted.org/packages/2c/a6/a866caf122908583f5f5e27217ca7f956c616e48e35cdb2a4a60ab4c7ad8/python-prctl-1.6.1.tar.gz#sha256=c421350bfe64cb8dd05d7a5b657317e2e45daad573e6e2f0af3e7ca459768d9e (from https://pypi.org/simple/python-prctl/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Downloading python-prctl-1.6.0.tar.gz (24 kB)
WARNING: Discarding https://files.pythonhosted.org/packages/09/73/4874f8a927657378d7a20cbdf21692ee08fd641e97e7e401c3a0dca8f2f8/python-prctl-1.6.0.tar.gz#sha256=c7d4a290c0f2ec30b65051864c230063cd806d5d95239dd60b721109d1b9dc75 (from https://pypi.org/simple/python-prctl/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Downloading python-prctl-1.5.0.tar.gz (23 kB)
WARNING: Discarding https://files.pythonhosted.org/packages/e5/4b/3b22c7066e9ba49d0d22677b81d64e7a145b30637ab56976ac7c26301c38/python-prctl-1.5.0.tar.gz#sha256=57335a54a7d657c1407448305a56a13cead840802c743fcd915bc382f9ce4fff (from https://pypi.org/simple/python-prctl/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement python-prctl
ERROR: No matching distribution found for python-prctl

2) Did the installation of apex finish successfully? I had to add an extra export CUDA_HOME=/usr/local/cuda-10.0/ to make it work.

3) Also, conda install pytorch-nightly -c pytorch failed because conda couldn't find the package. I tried conda install pytorch=1.0 -c pytorch-nightly, but it required specific Python versions that did not include mine (I have Python 3.9). If I change the Python version when creating the environment, I seem to hit a problem with the torch version (see below).

4) Do we have to create two environments, one for vilbert-mt and one for maskrcnn_benchmark? If I run extract_features.py in the second, I get ModuleNotFoundError: No module named 'cv2'. If I run it in the first, I get the same error. As a workaround I tried pip install opencv-python in vilbert-mt, but then I got ModuleNotFoundError: No module named 'maskrcnn_benchmark'. When I did the same in the second environment, I got ImportError: /root/vilbert-multi-task/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs :(((((((
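
A quick way to see which of these modules the currently active environment can find is to ask the interpreter directly. This is only a diagnostic sketch; `module_status` is a hypothetical helper, and the module list is simply taken from the errors above:

```python
import importlib.util
import sys


def module_status(names):
    """Return {module name: True if this interpreter can locate the module}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Top-level modules mentioned in the errors above.
status = module_status(["cv2", "maskrcnn_benchmark", "torch"])

print(sys.executable)  # shows which environment's Python is actually running
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

This only checks that a module can be located. A compiled extension can still fail at import time, and an undefined-symbol error like the one above typically suggests that the extension was built against a different PyTorch build than the one installed in that environment.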

And I haven't even started to use the model yet...