PieroV opened 1 month ago
I think I tested this change with the GPU (I'm running an Nvidia GPU on my Linux machine), but I'm not sure; I'm really not an expert in PyTorch etc.
Do you have a suggestion on how I can force the script to run with a GPU? (Even though I think it already was, since the CPU load was low enough and the GPU was actually making noise.)
So you could use something like torch.cuda.is_available()
to check whether the GPU or the CPU should be used, and act accordingly, so changing that in the PR would be much better imo...
Also, Detectron2 by default runs on the GPU if it is available; I have not run it on a CPU myself though...
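A minimal sketch of the runtime check being suggested (the variable names here are illustrative, not Detectron2's actual config code):

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors created afterwards can then follow that choice, e.g.:
x = torch.zeros(4, device=device)
```
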
Best regards,
Ok, I tried to revert my patch and added
print("Is CUDA available?", torch.cuda.is_available())
Result:
/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[05/28 14:51:37 apply_net]: Processing /media/edati/kinect/bosca1/rgb-orig/rgb_00000.jpg
Is CUDA available? True
Traceback (most recent call last):
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 353, in <module>
main()
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 349, in main
args.func(args)
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 105, in execute
cls.execute_on_outputs(context, {"file_name": file_name, "image": img}, outputs)
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 284, in execute_on_outputs
image_vis = visualizer.visualize(image, data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/piero/tmp/detectron2/projects/DensePose/densepose/vis/base.py", line 188, in visualize
image = visualizer.visualize(image, data[i])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/piero/tmp/detectron2/projects/DensePose/densepose/vis/densepose_outputs_vertex.py", line 94, in visualize
vis = (embed_map[closest_vertices].clip(0, 1) * 255.0).cpu().numpy()
~~~~~~~~~^^^^^^^^^^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
The script doesn't run without CUDA at all:
CUDA_VISIBLE_DEVICES="" python apply_net.py show configs/cse/densepose_rcnn_R_50_FPN_s1x.yaml https://dl.fbaipublicfiles.com/densepose/cse/densepose_rcnn_R_50_FPN_s1x/251155172/model_final_c4ea5f.pkl ../rgb_00000.jpg --output .../out.png dp_vertex,bbox -v
[05/28 14:52:44 apply_net]: Loading config from configs/cse/densepose_rcnn_R_50_FPN_s1x.yaml
[05/28 14:52:44 apply_net]: Loading model from https://dl.fbaipublicfiles.com/densepose/cse/densepose_rcnn_R_50_FPN_s1x/251155172/model_final_c4ea5f.pkl
Traceback (most recent call last):
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 353, in <module>
main()
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 349, in main
args.func(args)
File "/home/piero/tmp/detectron2/projects/DensePose/apply_net.py", line 94, in execute
predictor = DefaultPredictor(cfg)
^^^^^^^^^^^^^^^^^^^^^
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/detectron2-0.6-py3.11-linux-x86_64.egg/detectron2/engine/defaults.py", line 282, in __init__
self.model = build_model(self.cfg)
^^^^^^^^^^^^^^^^^^^^^
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/detectron2-0.6-py3.11-linux-x86_64.egg/detectron2/modeling/meta_arch/build.py", line 23, in build_model
model.to(torch.device(cfg.MODEL.DEVICE))
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1152, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 825, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1150, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/piero/tmp/venv-cv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Have you tried running the command I wrote? If so, are you having problems reproducing the error? Could it be due to a PyTorch version mismatch?
Hi,
In the solution that you gave, the data is kept on the CPU no matter whether the GPU is available or not. The error was originally caused by closest_vertices
being on the CPU, so if you could convert or transfer it to the GPU, it would have better performance (not sure exactly how much, but a significant amount).
Best regards,
it would have better performance (not sure exactly how much, but a significant amount)
Would it though? It seems to me this is only the visualization phase; it's fine if it's on the CPU. Hasn't the inference already happened by this point?
By ensuring closest_vertices is on the same device as embed_map, you avoid the RuntimeError without compromising performance significantly. This approach provides a balanced solution, maintaining the efficiency of GPU operations during inference while ensuring compatibility and simplicity during visualization.
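A minimal, CPU-runnable sketch of that device-matching approach (the shapes and values are illustrative, not DensePose's real data):

```python
import torch

embed_map = torch.rand(10, 3)               # indexed tensor (CPU here, possibly GPU in DensePose)
closest_vertices = torch.tensor([0, 3, 7])  # indices, possibly from another device

# Move the indices to wherever the indexed tensor lives before indexing,
# which avoids "indices should be either on cpu or on the same device".
closest_vertices = closest_vertices.to(embed_map.device)
vis = (embed_map[closest_vertices].clip(0, 1) * 255.0).cpu().numpy()
```
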
print(embed_map.device)
--> cpu
From what I can see, embed_map
is already on the CPU, and vis
is eventually going to be on the CPU as well (there's a .cpu()
after the clipping + multiplication).
So, the .cpu()
is redundant (after removing it the code still works), but maybe it could be moved before applying the mask (even if it were on the GPU, there's probably no real advantage in doing the clipping and multiplication on the GPU).
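In other words, one option being discussed is to bring everything to the CPU before the indexing, so the visualization math never touches the GPU at all (a hedged sketch; the real method has more context than this):

```python
import torch

embed_map = torch.rand(10, 3)        # illustrative shapes, not DensePose's real ones
closest_vertices = torch.tensor([2, 5])

# Move both tensors to the CPU first; .cpu() is a no-op when they already
# live there, and the clip + multiply then run on the CPU as well.
vis = (embed_map.cpu()[closest_vertices.cpu()].clip(0, 1) * 255.0).numpy()
```
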
In this PR, I'm not going to do any refactoring to make sure embed_map
stays on the GPU in the callers of the failing method. If needed, I'll let someone with better knowledge than me of how DensePose works do it, and I can open an issue instead (but in that case I'd still like this PR to be merged, as the current status is non-working for me).
@PieroV,
Thank you for your detailed clarification regarding the device placement of embed_map
and closest_vertices.
We can follow up on "I'm not going to do any refactors to make sure embed_map stays on the GPU in the callers of the failing method in this PR"
in a future PR.
Great Contributions :)
Best regards, Ranuga Disansa
This commit fixes a RuntimeError by explicitly copying an index array to the CPU:
My command line was: