facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Batched inference on images using DensePose? #2117

Open RSKothari opened 4 years ago

RSKothari commented 4 years ago

❓ How to do something using detectron2

Currently, DensePose reads in single images and infers dense annotations one at a time. This is very slow and quite wasteful. Does DensePose have the ability to read in batches of images to perform inference?

Describe what you want to do, including:

  1. what inputs you will provide, if any: A video filled with images

  2. what outputs you are expecting: A pickle file with dense pose annotations, except inferred a lot faster.


MathijsNL commented 4 years ago

Hi there,

This might be a duplicate of #282. I haven't used DensePose myself, but I suppose the usage should be the same as described in that issue.

You just need to call the model with a batch of inputs.

There is also #1986, which explains how to sort images before doing inference. You should be able to work it out with this info; let us know if anything is unclear.
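A minimal sketch of that pattern, assuming detectron2's standard inference API (models accept a list of {"image": tensor} dicts and return one output dict per image); `model` and the preprocessed image tensors are yours to supply:

```python
def batch_iter(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def run_batched(model, images, batch_size=8):
    """Run a detectron2 model (in eval mode) over images in batches.

    `images` is assumed to be a list of CHW float32 tensors in the
    channel order the model expects. This is a sketch of the batched
    call, not DensePose's own API.
    """
    import torch  # local import so batch_iter stays usable without torch

    results = []
    with torch.no_grad():
        for batch in batch_iter(images, batch_size):
            inputs = [{"image": img} for img in batch]  # one dict per image
            results.extend(model(inputs))  # model returns one dict per input
    return results
```

Larger batch sizes amortize per-call overhead but cost GPU memory, so the batch size is worth tuning per model and hardware.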

RSKothari commented 4 years ago

@MathijsNL Thanks, but my question is specific to the DensePose module within Detectron2. It seems to read in one image after the other to perform inference.

vkhalidov commented 4 years ago

Yes, currently DensePose doesn't provide an efficient reader that would batch video inputs. I've got a pending PR to torchvision that addresses this issue.
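Until such a reader lands, one workaround is to decode the frames yourself and group them into batches before feeding a batched predictor. A sketch using OpenCV's cv2.VideoCapture (OpenCV and a batched predictor are assumptions here, not part of DensePose):

```python
from itertools import islice

def group(iterable, n):
    """Group any iterable into lists of at most n items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

def iter_frames(video_path):
    """Yield BGR frames (np.ndarray) decoded from a video file."""
    import cv2  # local import: OpenCV may not be installed

    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode error
                return
            yield frame
    finally:
        cap.release()

# Hypothetical usage with a predictor that accepts a list of images:
# for batch in group(iter_frames("video.mp4"), 16):
#     outputs = predictor(batch)
```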

mdsrLab commented 3 months ago

For batched input inference, you can make the following change to apply_net.py (in the InferenceAction class):

@classmethod
def execute(cls: type, args: argparse.Namespace):
    batch_size = 16
    logger.info(f"Loading config from {args.cfg}")
    opts = []
    cfg = cls.setup_config(args.cfg, args.model, args, opts)
    logger.info(f"Loading model from {args.model}")
    predictor = DefaultPredictor(cfg)
    logger.info(f"Loading data from {args.input}")
    file_list = cls._get_input_file_list(args.input)
    if len(file_list) == 0:
        logger.warning(f"No input images for {args.input}")
        return
    context = cls.create_context(args, cfg)

    # Process the files in slices of batch_size images at a time.
    for start in range(0, len(file_list), batch_size):
        batch_files = file_list[start:start + batch_size]
        # predictor expects BGR images
        img_list = [read_image(f, format="BGR") for f in batch_files]
        with torch.no_grad():
            outputs = predictor(img_list)
        for i, file_name in enumerate(batch_files):
            cls.execute_on_outputs(
                context,
                {"file_name": file_name, "image": img_list[i]},
                outputs[i]["instances"],
            )
    cls.postexecute(context)

You would also need to change the __call__ method of the DefaultPredictor class in detectron2/engine/defaults.py:

def __call__(self, original_image_list):
        """
        Args:
            original_image_list (list[np.ndarray]): images of shape (H, W, C) (in BGR order).

        Returns:
            predictions (list[dict]):
                the output of the model, one dict per input image.
                See :doc:`/tutorials/models` for details about the format.
        """
        with torch.no_grad():  # https://github.com/sphinx-doc/sphinx/issues/4258
            input_list = []
            for original_image in original_image_list:
                # Apply pre-processing to each image.
                if self.input_format == "RGB":
                    # whether the model expects BGR inputs or RGB
                    original_image = original_image[:, :, ::-1]
                height, width = original_image.shape[:2]
                image = self.aug.get_transform(original_image).apply_image(original_image)
                image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
                # .to() is not in-place; assign the result
                image = image.to(self.cfg.MODEL.DEVICE)

                input_list.append({"image": image, "height": height, "width": width})

            predictions = self.model(input_list)
            return predictions

I have modified the predictor function to take in a list of images and dump the results in the same format as the sequential image processing.

matejsuchanek commented 2 months ago

#5330 is dealing with this.