facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

DensePose fine person segmentation #1419

Open Siset opened 4 years ago

Siset commented 4 years ago

❓ How can we use densepose/detectron2 to get only the fine person segmentation (faster than just using the apply_net.py provided as an example)?

We are using an already trained model (densepose_rcnn_R_50_FPN_s1x.yaml). We would like to build on top of densepose, but we require 5-6 FPS and we are only interested in the fine person segmentation. We are executing apply_net.py with CUDA 10.2 on a GTX 1080 Ti and the model execution takes ~2.2 seconds (the snippet below prints Time required: 2.2):

    with torch.no_grad():
        a = time.time()
        outputs = predictor(img)["instances"]
        print("Time required: ", time.time() - a)
        # This does not really matter for us now
        cls.execute_on_outputs(context, {"file_name": file_name, "image": img}, outputs)

We would like to know if there is a way to speed up predictor(img) by changing the configuration files (we use the default provided by the getting started guide, configs/densepose_rcnn_R_50_FPN_s1x.yaml).

Thanks in advance,

vkhalidov commented 4 years ago

Hello @Siset, there is currently no way to request a specific output from the predictor at the config level: it always computes all four outputs (coarse segmentation, fine segmentation, U and V). So for now the only way would be to patch the predictor.
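For context, a minimal sketch of what that means on the output side (predictor and img as in the snippet above; the S/I/U/V attribute names follow the DensePoseOutput constructor order and are my assumption, not a documented API):

    import torch

    with torch.no_grad():
        instances = predictor(img)["instances"]

    dp = instances.pred_densepose        # DensePoseOutput attached by densepose_inference
    fine_seg = dp.I.argmax(dim=1)        # fine segmentation: per-pixel part labels per box
    # dp.S (coarse segmentation), dp.U and dp.V were still computed to get here,
    # which is why the only real speed-up is patching the predictor itself.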

Siset commented 4 years ago

Hi @vkhalidov ,

Thanks for the answer and for the great work you and your team are doing in this repo! Really appreciate it.

How could we patch the predictor? Is there any tutorial/documentation on how to do so? I will come back with the solution in case of success.

Thanks in advance,

vkhalidov commented 4 years ago

@Siset you'll need to patch DensePosePredictor::forward so that it doesn't compute ann_index_lowres, u_lowres, v_lowres or ann_index, u, v, and sets the corresponding variables to stub values instead. Then you'll need to patch the logic in DensePoseOutput and DensePoseResult so that they can work with partial results.
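In rough terms, the patched forward could look something like this (a minimal sketch, not tested code: it keeps only the fine-segmentation branch, omits the confidence-estimation layers, and reuses the names from the existing forward such as head_outputs, self.index_uv_lowres and self.scale_factor):

    def forward(self, head_outputs):
        # Evaluate only the fine segmentation head
        index_uv_lowres = self.index_uv_lowres(head_outputs)
        index_uv = interpolate(
            index_uv_lowres, scale_factor=self.scale_factor,
            mode="bilinear", align_corners=False,
        )
        # Stub values stand in for the skipped outputs; DensePoseOutput and
        # DensePoseResult then need to tolerate these partial results
        stub = torch.zeros(1)
        ann_index = u = v = stub
        ann_index_lowres = u_lowres = v_lowres = stub
        return (
            (ann_index, index_uv, u, v),
            (ann_index_lowres, index_uv_lowres, u_lowres, v_lowres),
        )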

Siset commented 4 years ago

Hi @vkhalidov ,

Thank you for the answer! I will try it during this week and come back with the approach.

(I am closing this issue and will reopen it later with feedback.)

Thanks,

Siset commented 4 years ago

Hi @vkhalidov,

Thanks again for the input.

I have made several changes following your recommendations, and it runs without computing those values; the problem is that it takes the same amount of time as before. Is that possible, or must I be doing something wrong? My concern is with these lines of code:

    with torch.no_grad():
        a = time.time()
        outputs = predictor(img)["instances"]
        print("time", time.time() - a)

I have changed the forward to:

        # Stub tensors replace the outputs we no longer compute
        # (note: these are small CPU tensors, unlike the real head outputs)
        ann_index_lowres = torch.tensor([[1., -1.], [1., -1.]])
        u_lowres = torch.tensor([[1., -1.], [1., -1.]])
        v_lowres = torch.tensor([[1., -1.], [1., -1.]])
        # ann_index_lowres = self.ann_index_lowres(head_outputs)
        # Only the fine segmentation head is still evaluated
        index_uv_lowres = self.index_uv_lowres(head_outputs)
        # u_lowres = self.u_lowres(head_outputs)
        # v_lowres = self.v_lowres(head_outputs)

        def interp2d(input):
            return interpolate(
                input, scale_factor=self.scale_factor, mode="bilinear", align_corners=False
            )

        # Upsample only the fine segmentation; stub out the rest
        ann_index = torch.tensor([[1., -1.], [1., -1.]])
        # ann_index = interp2d(ann_index_lowres)
        index_uv = interp2d(index_uv_lowres)
        u = torch.tensor([[1., -1.], [1., -1.]])
        # u = interp2d(u_lowres)
        v = torch.tensor([[1., -1.], [1., -1.]])
        # v = interp2d(v_lowres)
        (
            (sigma_1, sigma_2, kappa_u, kappa_v),
            (sigma_1_lowres, sigma_2_lowres, kappa_u_lowres, kappa_v_lowres),
            (ann_index, index_uv),
        ) = self._forward_confidence_estimation_layers(
            self.confidence_model_cfg, head_outputs, interp2d, ann_index, index_uv
        )
        return (
            (ann_index, index_uv, u, v),
            (ann_index_lowres, index_uv_lowres, u_lowres, v_lowres),
            (sigma_1, sigma_2, kappa_u, kappa_v),
            (sigma_1_lowres, sigma_2_lowres, kappa_u_lowres, kappa_v_lowres),
        )

And densepose_inference (also in densepose_head.py) to:

    s, index_uv, u, v = densepose_outputs
    sigma_1, sigma_2, kappa_u, kappa_v = densepose_confidences
    k = 0
    for detection in detections:
        n_i = len(detection)
        # Stub tensors replace the per-detection coarse segmentation and U/V slices
        # s_i = s[k : k + n_i]
        s_i = torch.tensor([[1., -1.], [1., -1.]])
        # Only the fine segmentation is still sliced per detection
        index_uv_i = index_uv[k : k + n_i]
        # u_i = u[k : k + n_i]
        u_i = torch.tensor([[1., -1.], [1., -1.]])
        # v_i = v[k : k + n_i]
        v_i = torch.tensor([[1., -1.], [1., -1.]])
        _local_vars = locals()
        confidences = {
            name: _local_vars[name]
            for name in ("sigma_1", "sigma_2", "kappa_u", "kappa_v")
            if _local_vars.get(name) is not None
        }
        densepose_output_i = DensePoseOutput(s_i, index_uv_i, u_i, v_i, confidences)
        detection.pred_densepose = densepose_output_i
        k += n_i

I have also changed various asserts in the structures and so on.
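For what it's worth, a quick profile would show whether the remaining time is dominated by the backbone and ROI heads rather than the DensePose head's final layers (a sketch using torch.autograd.profiler; predictor and img as above):

    import torch

    with torch.no_grad(), torch.autograd.profiler.profile(use_cuda=True) as prof:
        outputs = predictor(img)["instances"]
    # Aggregate per-op timings; the heaviest ops reveal where the ~2.2 s go
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))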