OpenGVLab / InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
https://arxiv.org/abs/2211.05778

Export Model to ONNX as FP16 Not Working #305

Open RhinoInani opened 1 month ago

RhinoInani commented 1 month ago

Hello, I have been trying to export a classification model to ONNX as FP16, but I keep running into issues because certain layers are not being converted to FP16.

Here are the steps I have taken so far:

1. I followed issue #245.

Changed the files (dcnv3.py and dcnv3_func.py) to force dtype=torch.float16 rather than torch.float, as sketched below.
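
The snippet below is only an illustrative sketch with a simplified signature, not the actual helper from dcnv3_func.py; it just shows the style of change (hard-coded torch.float replaced by torch.float16) that I applied throughout both files.

import torch

def _get_reference_points_sketch(H_, W_, device):
    # Illustrative only: the real reference-point/grid helpers in dcnv3_func.py
    # take more arguments; the relevant change is dtype=torch.float -> torch.float16.
    ref_y, ref_x = torch.meshgrid(
        torch.linspace(0.5, H_ - 0.5, H_, dtype=torch.float16, device=device),  # was torch.float
        torch.linspace(0.5, W_ - 0.5, W_, dtype=torch.float16, device=device),  # was torch.float
        indexing='ij')
    return torch.stack((ref_x / W_, ref_y / H_), -1)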

2. I have also converted the model with .half() in the export.py file in the classification folder.

Here is the torch2onnx function after my changes:

def torch2onnx(args, cfg):
    model = get_model(args, cfg).eval().cuda()

    # speed_test(model)

    onnx_name = f'{args.model_name}_half.onnx'
    # cast both the model and the dummy input to fp16 before exporting
    torch.onnx.export(model.half(),
                      torch.rand(1, 3, args.size, args.size).cuda().half(),
                      onnx_name,
                      opset_version=16,
                      do_constant_folding=False,
                      input_names=['input'],
                      output_names=['output'])

    return model
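
For what it's worth, the exported file can be sanity-checked with the standard onnx checker (a minimal sketch, not part of export.py; the file name matches the onnx_name written above):

import onnx

# Load the exported graph and run the structural checker.
onnx_model = onnx.load('intern_image_b_1k_224_half.onnx')
onnx.checker.check_model(onnx_model)
print('onnx.checker passed')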

While the export itself completes without errors, the later steps for testing the ONNX model fail because of mixed-precision layers, as described below.

3. Changed core_op in the corresponding YAML file to 'DCNv3_pytorch', as shown below:

File: classification/configs/internimage_b_1k_224.yaml

DATA:
  IMG_ON_MEMORY: True
MODEL:
  TYPE: intern_image
  DROP_PATH_RATE: 0.5
  INTERN_IMAGE:
    CORE_OP: 'DCNv3_pytorch'
    DEPTHS: [4, 4, 21, 4]
    GROUPS: [7, 14, 28, 56]
    CHANNELS: 112
    LAYER_SCALE: 1e-5
    OFFSET_SCALE: 1.0
    MLP_RATIO: 4.0
    POST_NORM: True
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.9999
  BASE_LR: 5e-4
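
Before involving ONNX Runtime, the DCNv3_pytorch path can also be exercised directly in half precision as a quick check; a minimal sketch, reusing get_model/args/cfg exactly as they are set up in classification/export.py:

import torch

# Sketch: one FP16 forward pass through the DCNv3_pytorch model
# (get_model, args and cfg as in classification/export.py).
model = get_model(args, cfg).eval().cuda().half()
dummy = torch.rand(1, 3, args.size, args.size).cuda().half()
with torch.no_grad():
    out = model(dummy)
print(out.dtype)  # should be torch.float16 if every layer stays in half precision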

4. Created an onnxruntime InferenceSession from the ONNX file:

inference_sess = ort.InferenceSession(onnx_file, providers=['CUDAExecutionProvider'], core_op='DCNv3_pytorch', sess_options=ort.SessionOptions())

This is the error I am running into when the line above is run:

Traceback (most recent call last):                                                                                                                                                                                   
  File "onnx_intern_image_test.py", line 119, in <module>                                                                                                                                                            
    inference_sess = ort.InferenceSession(onnx_file, providers=['CUDAExecutionProvider'], core_op='DCNv3_pytorch', sess_options=ort.SessionOptions())                                                                
  File "~*******/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__                                                          
    self._create_inference_session(providers, provider_options, disabled_optimizers)                                                                                                                                 
  File "~*****/miniconda3/envs/internimage/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 397, in _create_inference_session                                         
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)                                                                                                                 
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from ~*******InternImage/classification/intern_image_b_1k_224_half.onnx failed:Type Error: Type parameter (T) of Optype
 (Div) bound to different types (tensor(float) and tensor(float16) in node (Div_209).  

As you can see, the node "Div_209" receives inputs of different types (float32 and float16).
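
To narrow down where the float32 input comes from, the inputs of Div_209 can be inspected with the onnx Python API; a rough sketch (generic graph inspection only; value_info may be incomplete if shape inference cannot cover the whole graph):

import onnx
from onnx import TensorProto, shape_inference

model = onnx.load('intern_image_b_1k_224_half.onnx')
inferred = shape_inference.infer_shapes(model)

# Map every known tensor name to its element type.
elem_types = {}
for vi in list(inferred.graph.value_info) + list(inferred.graph.input) + list(inferred.graph.output):
    elem_types[vi.name] = vi.type.tensor_type.elem_type
for init in inferred.graph.initializer:
    elem_types[init.name] = init.data_type

# Print the dtype of each input feeding the offending Div node.
for node in inferred.graph.node:
    if node.name == 'Div_209':
        for inp in node.input:
            dtype = elem_types.get(inp)
            print(inp, '->', TensorProto.DataType.Name(dtype) if dtype is not None else 'unknown')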

Please let me know as soon as possible if there are any fixes or any known ways to convert the classification model to FP16 for ONNX.

Thanks in advance!