facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0
29.3k stars · 7.32k forks

Fix DensePose vertex visualization. #5278

Open PieroV opened 1 month ago

PieroV commented 1 month ago

This commit fixes a RuntimeError by explicitly copying an index array to the CPU:

Traceback (most recent call last):
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 353, in <module>
    main()
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 349, in main
    args.func(args)
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 105, in execute
    cls.execute_on_outputs(context, {"file_name": file_name, "image": img}, outputs)
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 284, in execute_on_outputs
    image_vis = visualizer.visualize(image, data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/detectron2/projects/DensePose/densepose/vis/base.py", line 188, in visualize
    image = visualizer.visualize(image, data[i])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/detectron2/projects/DensePose/densepose/vis/densepose_outputs_vertex.py", line 93, in visualize
    vis = (embed_map[closest_vertices].clip(0, 1) * 255.0).cpu().numpy()
           ~~~~~~~~~^^^^^^^^^^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
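For context, the failing pattern and the fix can be sketched in isolation (hypothetical tensor shapes; the real code is in densepose_outputs_vertex.py). Indexing a CPU tensor with a GPU index tensor raises exactly this RuntimeError, so the patch moves the indices to the CPU before indexing:

```python
import torch

# Hypothetical stand-ins for the real DensePose tensors.
embed_map = torch.rand(1000, 3)                    # colormap, resident on the CPU
closest_vertices = torch.randint(0, 1000, (64,))   # index tensor (on the GPU in the bug)

# Fix applied in this PR: make sure the indices live on the CPU
# before using them to index the CPU-resident embed_map.
vis = (embed_map[closest_vertices.cpu()].clip(0, 1) * 255.0).cpu().numpy()
print(vis.shape)  # (64, 3)
```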

My command line was:

python apply_net.py show configs/cse/densepose_rcnn_R_50_FPN_s1x.yaml .../model_final_c4ea5f.pkl .../00000.jpg dp_vertex,bbox -v
PieroV commented 1 month ago

I think I tested this change with the GPU (I'm running an Nvidia GPU on my Linux machine), but I'm not sure; I'm really not an expert on PyTorch.

Do you have a suggestion on how I can force the script to run with a GPU? (Even though I think it already was, since the CPU load was low enough and the GPU was actually making noise.)

Programmer-RD-AI commented 1 month ago

You could use something like torch.cuda.is_available() to check whether the GPU or the CPU should be used, and act accordingly, so changing that in the PR would be much better in my opinion. Also, Detectron2 runs on the GPU by default if one is available; I have not run it on a CPU, though. Best regards,
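The device-selection pattern suggested above is a generic PyTorch idiom (not Detectron2-specific code) and is usually written as:

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors created later (e.g. index tensors) can then be placed explicitly.
indices = torch.arange(10, device=device)
```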

PieroV commented 1 month ago

Ok, I tried to revert my patch and added

print("Is CUDA available?", torch.cuda.is_available())

Result:

/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[05/28 14:51:37 apply_net]: Processing /media/edati/kinect/bosca1/rgb-orig/rgb_00000.jpg
Is CUDA available? True
Traceback (most recent call last):
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 353, in <module>
    main()
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 349, in main
    args.func(args)
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 105, in execute
    cls.execute_on_outputs(context, {"file_name": file_name, "image": img}, outputs)
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 284, in execute_on_outputs
    image_vis = visualizer.visualize(image, data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/detectron2/projects/DensePose/densepose/vis/base.py", line 188, in visualize
    image = visualizer.visualize(image, data[i])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/detectron2/projects/DensePose/densepose/vis/densepose_outputs_vertex.py", line 94, in visualize
    vis = (embed_map[closest_vertices].clip(0, 1) * 255.0).cpu().numpy()
           ~~~~~~~~~^^^^^^^^^^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

The script doesn't run without CUDA at all:

CUDA_VISIBLE_DEVICES="" python apply_net.py show configs/cse/densepose_rcnn_R_50_FPN_s1x.yaml https://dl.fbaipublicfiles.com/densepose/cse/densepose_rcnn_R_50_FPN_s1x/251155172/model_final_c4ea5f.pkl ../rgb_00000.jpg --output .../out.png dp_vertex,bbox -v
[05/28 14:52:44 apply_net]: Loading config from configs/cse/densepose_rcnn_R_50_FPN_s1x.yaml
[05/28 14:52:44 apply_net]: Loading model from https://dl.fbaipublicfiles.com/densepose/cse/densepose_rcnn_R_50_FPN_s1x/251155172/model_final_c4ea5f.pkl
Traceback (most recent call last):
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 353, in <module>
    main()
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 349, in main
    args.func(args)
  File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 94, in execute
    predictor = DefaultPredictor(cfg)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/detectron2-0.6-py3.11-linux-x86_64.egg/detectron2/engine/defaults.py", line 282, in __init__
    self.model = build_model(self.cfg)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/detectron2-0.6-py3.11-linux-x86_64.egg/detectron2/modeling/meta_arch/build.py", line 23, in build_model
    model.to(torch.device(cfg.MODEL.DEVICE))
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

Have you tried running the command I wrote? If so, are you having problems reproducing the error? Could it be due to a mismatch in PyTorch versions?

Programmer-RD-AI commented 1 month ago

Hi, in the solution that you gave, the data is moved to the CPU whether or not the GPU is available. The error was originally caused by closest_vertices being on the CPU, so if you could convert or transfer it to the GPU instead, it would have better performance (not sure exactly how much, but a significant amount). Best regards,
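The alternative being suggested, resolving the device mismatch by sending closest_vertices to wherever embed_map lives rather than hard-coding .cpu(), might look like this (hypothetical shapes; a sketch, not the actual DensePose code):

```python
import torch

embed_map = torch.rand(1000, 3)                    # may be on CPU or GPU in practice
closest_vertices = torch.randint(0, 1000, (64,))

# Device-agnostic fix: index on whichever device embed_map lives on.
closest_vertices = closest_vertices.to(embed_map.device)
vis = (embed_map[closest_vertices].clip(0, 1) * 255.0).cpu().numpy()
```

Whether this buys any performance depends on where embed_map actually lives at this point, which is exactly what the following comments establish.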

PieroV commented 1 month ago

it would have better performance (not sure exactly how much, but a significant amount)

Would it, though? It seems to me this is only the visualization phase; it's fine if it's on the CPU. Hasn't the inference already happened by now?

Programmer-RD-AI commented 1 month ago

By ensuring closest_vertices is on the same device as embed_map, you avoid the RuntimeError without compromising performance significantly. This approach provides a balanced solution, maintaining the efficiency of GPU operations during inference while ensuring compatibility and simplicity during visualization.

PieroV commented 1 month ago

print(embed_map.device)

--> cpu

PieroV commented 1 month ago

From what I can see, embed_map is already on the CPU, and vis eventually ends up on the CPU as well (there's a .cpu() after the clipping and multiplication). So the .cpu() is redundant (the code still works after removing it), but maybe it could be moved before applying the mask: even if the tensor were on the GPU, there's probably no real advantage in doing the clipping and multiplication there.
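The redundancy claim is easy to check: for a tensor that is already in CPU memory, .cpu() returns the original tensor object without copying, so the trailing call is harmless but does nothing. A sketch under the assumption (supported by the print above) that embed_map is CPU-resident:

```python
import torch

embed_map = torch.rand(5, 3)           # CPU tensor, as in the observed run
out = embed_map.clip(0, 1) * 255.0     # clipping + multiplication on the CPU
same = out.cpu()                       # no-op: the tensor is already on the CPU

# PyTorch returns the very same storage instead of making a copy.
assert same.data_ptr() == out.data_ptr()
```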

I'm not going to do any refactoring in this PR to make sure embed_map stays on the GPU in the callers of the failing method. If needed, I'll let someone with better knowledge of how DensePose works than me do it, and I can open an issue instead (but in that case I'd still like this PR to be merged, as the current code doesn't work for me).

Programmer-RD-AI commented 1 month ago

@PieroV, thank you for your detailed clarification regarding the device placement of embed_map and closest_vertices. We can follow up on "I'm not going to do any refactors to make sure embed_map stays on the GPU in the callers of the failing method" in a future PR. Great contribution :)

Best regards, Ranuga Disansa