Closed · Huxwell closed this 3 weeks ago
@Huxwell I see, it really is weird. It hasn't been long since I ran a keypoint ONNX model with onnxruntime-gpu. The error mentioned above comes from code in the onnx package; you can try using one of the following.
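If it helps, the export-type options in question are presumably the `torch.onnx.OperatorExportTypes` values; a quick sketch (assuming a recent PyTorch) to list what your build offers:

```python
import torch

# Enumerate the operator export modes accepted by the
# operator_export_type argument of torch.onnx.export.
modes = list(torch.onnx.OperatorExportTypes.__members__)
print(modes)
```

On current PyTorch versions this includes `ONNX`, `ONNX_ATEN`, `ONNX_ATEN_FALLBACK`, and `ONNX_FALLTHROUGH`.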
I do not remember much, but I think I had success with ONNX_ATEN_FALLBACK.
No luck so far: ONNX_ATEN_FALLBACK does not change anything for me, and ONNX_ATEN results in an export crash:

```
aten::unsqueeze(Tensor(a) self, int dim) -> Tensor(a):
Expected a value of type 'Tensor' for argument 'self' but instead found type 'List[Tensor]'.
Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2)
```
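For reference, the annotation that error message asks for looks like this (a minimal sketch, unrelated to detectron2's actual code):

```python
import torch
from typing import List

@torch.jit.script
def stack_rows(xs: torch.Tensor) -> torch.Tensor:
    # Empty lists default to List[Tensor] in TorchScript; an explicit
    # torch.jit.annotate fixes the element type when it differs.
    out = torch.jit.annotate(List[torch.Tensor], [])
    for i in range(xs.size(0)):
        out.append(xs[i])
    return torch.stack(out)
```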
I will read the comments in my '/Apr2024Detectron_venv/lib/python3.8/site-packages/torch/onnx/utils.py', run detectron2's export unit tests, and analyze the logs, hoping to find a clue there.
In the meantime, do you happen to have a converted .onnx file from the vanilla keypoint detector that you could share? It would allow me to verify whether my problem is with the exporter (the loops in keypoint-related code) or rather with my onnxruntime usage/version.
Just to clarify: ONNX_FALLTHROUGH successfully generates an .onnx file, but onnxruntime crashes when reading it with:

```
Type parameter (T) of Optype (SequenceConstruct) bound to different types (tensor(int64) and tensor(float) in node (SequenceConstruct_2862)
```
Managed to successfully bump STABLE_ONNX_OPSET_VERSION from 11 to 16 and 17 (eliminating the warning about RoIAlign). The silly issue was that I had been changing the version in my cloned copy of detectron2, rather than in the one installed by pip in the venv.
I have mocked out the keypoint computation and succeeded in running the model in onnxruntime. At first I thought the problem lay in keypoint_rcnn_inference() in detectron2/modeling/roi_heads/keypoint_head.py; now I am certain it is in the heatmaps_to_keypoints() it calls (https://github.com/facebookresearch/detectron2/blob/main/detectron2/structures/keypoints.py). It seems I must rewrite a loop inside heatmaps_to_keypoints():
```python
for i in range(num_rois):
    outsize = (int(heights_ceil[i]), int(widths_ceil[i]))
    roi_map = F.interpolate(
        maps[i].unsqueeze(0), size=outsize, mode="bicubic", align_corners=False
    ).squeeze(0)
    max_score, _ = roi_map.view(num_keypoints, -1).max(1)
    max_score = max_score.view(num_keypoints, 1, 1)
    roi_map = roi_map - max_score  # Normalize heatmap for stability
    exp_map = roi_map.exp_()
    score = exp_map / exp_map.view(num_keypoints, -1).sum(1).view(num_keypoints, 1, 1)
    pos = exp_map.view(num_keypoints, -1).argmax(1)
    x_int = pos % outsize[1]
    y_int = pos // outsize[1]
    x = (x_int.float() + 0.5) * width_corrections[i]
    y = (y_int.float() + 0.5) * height_corrections[i]
    xy_preds[i, :, 0] = x + offset_x[i]
    xy_preds[i, :, 1] = y + offset_y[i]
    xy_preds[i, :, 2] = roi_map.view(num_keypoints, -1).gather(1, pos.unsqueeze(1)).squeeze(1)
    xy_preds[i, :, 3] = score.view(num_keypoints, -1).gather(1, pos.unsqueeze(1)).squeeze(1)
```
so that it is ONNX-compatible; I am working on that.
@Huxwell Glad to hear you could solve it! The number of version dependencies between Detectron2 and TensorRT is insane.
I tried to run Detectron2 as-is on my Ubuntu 22.04; it runs on CPU but won't run with CUDA, haha.
Let me know if anything else comes up.
OK, now I am able to run correctly in onnxruntime, with reasonable predictions (even with custom models: a custom number of keypoints instead of 17, an r18/r34 backbone instead of r50, my own weights rather than pretrained ones, etc.). The only changes are in the aforementioned loop (apart from ONNX -> ONNX_FALLTHROUGH and STABLE_ONNX_OPSET_VERSION 11 -> 16): https://github.com/facebookresearch/detectron2/blob/main/detectron2/structures/keypoints.py
```python
@torch.jit.script_if_tracing
def heatmaps_to_keypoints(maps: torch.Tensor, rois: torch.Tensor) -> torch.Tensor:
    """
    Extract predicted keypoint locations from heatmaps.

    Args:
        maps (Tensor): (#ROIs, #keypoints, POOL_H, POOL_W). The predicted heatmap of
            logits for each ROI and each keypoint.
        rois (Tensor): (#ROIs, 4). The box of each ROI.

    Returns:
        Tensor of shape (#ROIs, #keypoints, 4) with the last dimension corresponding to
        (x, y, logit, score) for each keypoint.

    When converting discrete pixel indices in an NxN image to a continuous keypoint
    coordinate, we maintain consistency with :meth:`Keypoints.to_heatmap` by using the
    conversion from Heckbert 1990: c = d + 0.5, where d is a discrete coordinate and c
    is a continuous coordinate.
    """
    offset_x = rois[:, 0]
    offset_y = rois[:, 1]
    widths = (rois[:, 2] - rois[:, 0]).clamp(min=1)
    heights = (rois[:, 3] - rois[:, 1]).clamp(min=1)
    widths_ceil = widths.ceil()
    heights_ceil = heights.ceil()
    num_rois, num_keypoints = maps.shape[:2]
    xy_preds = maps.new_zeros(rois.shape[0], num_keypoints, 4)
    width_corrections = widths / widths_ceil
    height_corrections = heights / heights_ceil
    keypoints_idx = torch.arange(num_keypoints, device=maps.device)
    for i in range(num_rois):
        outsize = (int(heights_ceil[i]), int(widths_ceil[i]))
        roi_map = F.interpolate(maps[[i]], size=outsize, mode="bicubic", align_corners=False)[0]
        max_score, _ = roi_map.view(num_keypoints, -1).max(1)
        max_score = max_score.view(num_keypoints, 1, 1)
        roi_map = roi_map - max_score
        exp_map = roi_map.exp()
        total_exp = exp_map.view(num_keypoints, -1).sum(dim=1).view(num_keypoints, 1, 1)
        roi_map_scores = exp_map / total_exp
        w = roi_map.shape[2]
        pos = roi_map.view(num_keypoints, -1).argmax(1)
        x_int = pos % w
        y_int = (pos - x_int) // w
        assert (
            roi_map_scores[keypoints_idx, y_int, x_int]
            == roi_map_scores.view(num_keypoints, -1).max(1)[0]
        ).all()
        x = (x_int.float() + 0.5) * width_corrections[i]
        y = (y_int.float() + 0.5) * height_corrections[i]
        xy_preds[i, :, 0] = x + offset_x[i]
        xy_preds[i, :, 1] = y + offset_y[i]
        xy_preds[i, :, 2] = roi_map.view(num_keypoints, -1).gather(1, pos.unsqueeze(1)).squeeze(1)
        xy_preds[i, :, 3] = roi_map_scores.view(num_keypoints, -1).gather(1, pos.unsqueeze(1)).squeeze(1)
    return xy_preds
```
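A toy check of the flattened-argmax unraveling used above (`pos % w`, `(pos - x_int) // w`), which replaces indexing patterns that exported poorly:

```python
import torch

# One keypoint on a 2x3 map; the max sits at row 0, column 1.
roi_map = torch.tensor([[[0.1, 0.9, 0.2],
                         [0.3, 0.0, 0.5]]])
num_keypoints, h, w = roi_map.shape
pos = roi_map.view(num_keypoints, -1).argmax(1)  # flat index into h*w
x_int = pos % w                                  # column
y_int = (pos - x_int) // w                       # row
print(x_int.item(), y_int.item())  # 1 0
```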
@RajUpadhyay FYI, I am now able to run keypoint prediction in TensorRT (heatmaps -> keypoints + repositioning happens in NumPy postprocessing); I describe my process in a bit more detail in the TensorRT issue: https://github.com/NVIDIA/TensorRT/issues/3792
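For reference, that NumPy postprocessing can be sketched roughly as below. This is a reconstruction under stated assumptions, not the actual code: the function name is made up, and the per-ROI bicubic upsampling is omitted for brevity (argmax is taken on the pooled map directly), so coordinates are coarser than detectron2's.

```python
import numpy as np

def heatmaps_to_keypoints_np(maps: np.ndarray, rois: np.ndarray) -> np.ndarray:
    """Map (N, K, H, W) heatmap logits + (N, 4) boxes to (N, K, 4) keypoints
    as (x, y, logit, score). Simplified: no bicubic upsampling per ROI."""
    n, k, h, w = maps.shape
    out = np.zeros((n, k, 4), dtype=np.float32)
    widths = np.clip(rois[:, 2] - rois[:, 0], 1, None)
    heights = np.clip(rois[:, 3] - rois[:, 1], 1, None)
    flat = maps.reshape(n, k, -1)
    pos = flat.argmax(axis=2)            # (N, K) flat indices into H*W
    x_int = pos % w                      # column of the max cell
    y_int = pos // w                     # row of the max cell
    logits = np.take_along_axis(flat, pos[..., None], axis=2)[..., 0]
    # Softmax probability of the argmax cell, as the keypoint score.
    exp = np.exp(flat - flat.max(axis=2, keepdims=True))
    probs = exp / exp.sum(axis=2, keepdims=True)
    scores = np.take_along_axis(probs, pos[..., None], axis=2)[..., 0]
    # Heckbert 1990 (c = d + 0.5) pixel-center convention, scaled into the box.
    out[..., 0] = rois[:, None, 0] + (x_int + 0.5) * (widths[:, None] / w)
    out[..., 1] = rois[:, None, 1] + (y_int + 0.5) * (heights[:, None] / h)
    out[..., 2] = logits
    out[..., 3] = scores
    return out
```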
@Huxwell Wow, congrats! So glad you could do it! Thanks for letting me know. I wonder if this is too much to ask but could I request the create_onnx.py you modified? Thanks!
Sorry, I asked and apparently my company policy doesn't allow me to, but I think the snippets from the questions I asked in these issues (mostly the roi_head() function) should be enough for you to reproduce the effect relatively easily.
EDIT: I am discussing export_model.py issues with keypoints in : https://github.com/facebookresearch/detectron2/issues/5143 since it receives more attention.
Instructions To Reproduce the 🐛 Bug:
```
Traceback (most recent call last):
  File "detectron2/tools/deploy/export_model.py", line 225, in <module>
    exported_model = export_tracing(torch_model, sample_inputs)
  File "detectron2/tools/deploy/export_model.py", line 132, in export_tracing
    torch.onnx.export(traceable_model, (image,), f, opset_version=STABLE_ONNX_OPSET_VERSION)
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/utils.py", line 665, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/utils.py", line 1891, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py", line 6709, in prim_loop
    torch._C._jit_pass_onnx_block(
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/utils.py", line 1891, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/symbolic_opset11.py", line 1063, in index
    return opset9.index(g, self, index)
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py", line 5580, in index
    return symbolic_helper._unimplemented(
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 607, in _unimplemented
    _onnx_unsupported(f"{op}, {msg}", value)
  File "/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 618, in _onnx_unsupported
    raise errors.SymbolicValueError(
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator aten::index, operator of advanced indexing on tensor of unknown rank. Try turning on shape inference during export: torch.onnx._export(..., onnx_shape_inference=True).. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues [Caused by the value 'roi_map.3 defined in (%roi_map.3 : Tensor = onnx::Reshape(%roi_map, %2727) # /home/ubuntu/detectron2/detectron2/detectron2/structures/keypoints.py:205:18
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Reshape'.]
(node defined in File "/home/ubuntu/detectron2/detectron2/detectron2/structures/keypoints.py", line 205
    Although semantically equivalent,
    reshape
    is used instead of squeeze due)
```
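The unsupported `aten::index` here likely comes from advanced indexing (e.g. the `roi_map_scores[keypoints_idx, y_int, x_int]` assert in keypoints.py); `gather` expresses the same lookup with ops that export cleanly. A minimal illustration:

```python
import torch

# Two keypoints, each over a flattened 1x2 map.
scores = torch.tensor([[0.1, 0.9],
                       [0.8, 0.2]])
pos = scores.argmax(1)                                         # (K,)
picked_index = scores[torch.arange(2), pos]                    # advanced indexing -> aten::index
picked_gather = scores.gather(1, pos.unsqueeze(1)).squeeze(1)  # ONNX-friendly equivalent
assert torch.equal(picked_index, picked_gather)
```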
```shell
wget -nc -q https://github.com/facebookresearch/detectron2/raw/main/detectron2/utils/collect_env.py && python collect_env.py
```
```
/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
sys.platform            linux
Python                  3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0]
numpy                   1.24.4
detectron2              0.6 @/home/ubuntu/detectron2/detectron2/detectron2
detectron2._C           not built correctly: No module named 'detectron2._C'
Compiler ($CXX)         c++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
DETECTRON2_ENV_MODULE
PyTorch                 2.0.0+cu117 @/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torch
PyTorch debug build     False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available           Yes
GPU 0                   NVIDIA T600 Laptop GPU (arch=7.5)
Driver version
CUDA_HOME               None - invalid!
Pillow                  10.0.0
torchvision             0.15.1+cu117 @/home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torchvision
torchvision arch flags  /home/ubuntu/detectron2/env_perception/lib/python3.8/site-packages/torchvision/_C.so
fvcore                  0.1.5.post20221221
iopath                  0.1.10
cv2                     4.7.0
PyTorch built with:
```