WildChlamydia / MiVOLO

MiVOLO age & gender transformer neural network

Cannot export MiVOLO model into `onnx` format using `torch.onnx.export` #14

Closed MasterHM-ml closed 1 year ago

MasterHM-ml commented 1 year ago

I am adding a few lines here to convert `self.model` into ONNX format. Here is my code snippet:

    random_input = torch.randn(1, 6, 224, 224, device=self.device)
    onnx_model_name = "mi_volo.onnx"
    # PyTorch to ONNX
    torch.onnx.export(self.model, random_input, onnx_model_name, verbose=True, opset_version=18)

but I am getting the following error:

============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Traceback (most recent call last):
  File "/home/master/.local/lib/python3.10/site-packages/torch/onnx/symbolic_opset18.py", line 52, in col2im
    num_dimensional_axis = symbolic_helper._get_tensor_sizes(output_size)[0]
TypeError: 'NoneType' object is not subscriptable

I tried debugging the error but couldn't understand it due to my limited familiarity with the conversion process. The line that raises the error is `num_dimensional_axis = symbolic_helper._get_tensor_sizes(output_size)[0]`. Inspecting the `output_size` variable shows it is `%1072`, defined in `%1072 : int[] = prim::ListConstruct(%958, %963), scope: mivolo.model.mivolo_model.MiVOLOModel::/torch.nn.modules.container.Sequential::network.0/timm.models.volo.Outlooker::network.0.0/timm.models.volo.OutlookAttention::attn`.

Environment:

- MiVOLO: latest pull
- PyTorch: 2.0.1
- onnx: 1.14.1
- OS: Ubuntu 22.04.3 LTS

Any help/direction/discussion will be highly appreciated, thank you.

WildChlamydia commented 1 year ago

Hello! It won't be easy.

  1. You need the timm library directly from source: remove the pip-installed module, clone the repo from GitHub, and add it to PYTHONPATH. This is necessary for the upcoming steps.

  2. Explicitly convert the variables `H` and `W` with `int()` here; it's a timm bug (a hedged sketch of the kind of change follows this item).
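
A sketch of the change step 2 describes. The traceback above points at `timm.models.volo.OutlookAttention`, so the conversion likely belongs where that module derives `H` and `W` from the input shape; the exact lines are an assumption:

    # Hypothetical illustration of step 2 -- the precise location is an
    # assumption; the fix goes wherever timm derives H and W from the
    # input shape before the col2im/fold computation.
    B, H, W, C = x.shape
    H, W = int(H), int(W)  # plain Python ints keep the ONNX trace from
                           # treating the spatial sizes as dynamic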

  3. During the ONNX conversion process, you will encounter the following error:

torch.onnx.errors.CheckerError: Unrecognized attribute: axes for operator ReduceMax

==> Context: Bad node spec for node. Name: /ReduceMax OpType: ReduceMax

To address this issue, you need to modify the Torch sources. Open the file `torch/onnx/utils.py` and locate the `_export` function. Comment out the line that checks the ONNX proto, as in the following diff:

if (operator_export_type is _C_onnx.OperatorExportTypes.ONNX) and (
    not val_use_external_data_format
):
    try:
-       _C._check_onnx_proto(proto)
+       pass  # _C._check_onnx_proto(proto)
    except RuntimeError as e:
        raise errors.CheckerError(e) from e
  4. After that, the model will be saved, but it won't work because of a torch.onnx bug. You have to rewrite the graph:
    
    import numpy as np
    import onnx
    from onnx import numpy_helper

    onnx_model = onnx.load(output_file)  # path of the exported .onnx file
    # Get the graph from the model
    graph = onnx_model.graph

    # Iterate through all nodes in the graph
    for node in graph.node:
        if "ReduceMax" in node.op_type:
            for index in range(len(node.attribute)):
                if node.attribute[index].name == "axes":
                    del node.attribute[index]
                    axes_input = onnx.helper.make_tensor_value_info("axes", onnx.TensorProto.INT64, [1])
                    axes_value = numpy_helper.from_array(np.array([1]), "axes")
                    onnx_model.graph.input.extend([axes_input])
                    onnx_model.graph.initializer.extend([axes_value])
                    node.input.append("axes")
                    break


Now, save this model. It will work.
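
For the save step, a one-liner suffices (assuming the same `output_file` path used when loading):

    # Persist the patched graph back to disk
    onnx.save(onnx_model, output_file)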

And all of this is simply not worth it: the **ONNX model performs poorly with batch processing**, and TensorRT is currently not an option due to its lack of support for col2im.
The best way for now is TorchScript.
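
For reference, a minimal TorchScript sketch under the same assumptions as the snippet at the top of this issue (the loaded MiVOLO `model` and a 6-channel 224x224 input; names here are illustrative, not the repo's official export path):

    import torch

    # Trace-based TorchScript export -- `model` stands for the loaded
    # MiVOLO network; the input shape mirrors the ONNX attempt above.
    example_input = torch.randn(1, 6, 224, 224, device="cuda")  # or "cpu"
    traced = torch.jit.trace(model, example_input)
    traced.save("mi_volo.torchscript.pt")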

Good luck and thank you for your star.

MasterHM-ml commented 1 year ago

> And all of this is simply not worth it: the ONNX model performs poorly with batch processing, and TensorRT is currently not an option due to its lack of support for col2im. The best way for now is TorchScript.

Thank you for your detailed reply. Through documentation and GitHub issues, I completed the first 3 steps and was ready to convert the model into OpenVINO IR, when I found out that the OpenVINO runtime does not support the col2im operation. I was not aware of the issue you described in step 4. Massive thanks for providing the guide; I will do that.

And, yeah, as you said:

> Hello! It won't be easy.

This was really not easy. I've spent 3 days and still see no chance of reaching the finish line today.

Hab2Verer commented 5 months ago

@MasterHM-ml @WildChlamydia Hello guys, good work.

Did you find a definitive way to convert the model to ONNX?