onnx inference problem - Githubissues

blueFeather111 commented 2 years ago

I converted the model to ONNX using model_to_onnx.py with the args: --dataset nyuv2 --last_ckpt ./trained_models/nyuv2/r34_NBt1D.pth. The conversion succeeds and produces model.onnx.

The input shapes are: NodeArg(name='rgb', type='tensor(float)', shape=[1, 3, 480, 640]) NodeArg(name='depth', type='tensor(float)', shape=[1, 1, 480, 640])

However, when I run inference with the ONNX model on the CPU (device = "cpu", no CUDA), an error occurs: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'' Status Message: Input channels C is not equal to kernel channels * group. C: 514 kernel channels: 1024 group: 1

I cannot understand why the channels do not match. Can you give me a hint? Here is my inference code:

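import os

import cv2
import onnxruntime


def to_numpy(tensor):
    # Convert a torch tensor to a NumPy array; detach first if it tracks gradients.
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()


if __name__ == '__main__':
    # cv2.imread does not expand "~", so expand it explicitly.
    img_path = os.path.expanduser("~/dataset/nyuv2/test/rgb/0028.png")
    depth_path = os.path.expanduser("~/dataset/nyuv2/test/depth_raw/0028.png")
    img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
    if img.ndim == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    depth = depth.astype('float32') * 0.1

    # Preprocessing omitted for brevity. to_numpy() below expects torch
    # tensors, so img and depth are presumably converted to tensors here.

    # Run on the CPU only, as described above.
    session = onnxruntime.InferenceSession("./onnx_models/model.onnx", providers=["CPUExecutionProvider"])
    inputs = {session.get_inputs()[0].name: to_numpy(img), session.get_inputs()[1].name: to_numpy(depth)}
    outs = session.run(None, inputs)[0]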

Input shapes: 'rgb': ndarray (1, 3, 480, 640), 'depth': ndarray (1, 1, 480, 640)
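To rule out the feed itself before looking at the graph, the shapes reported by the session can be compared with the arrays being passed in. The following is a minimal sketch, not part of the original post: `check_feed` is a hypothetical helper, and the `expected` list is hard-coded from the NodeArg output quoted above rather than read from a live session.

```python
def check_feed(expected, feed_shapes):
    """Compare expected input specs with the shapes actually fed.

    expected:    list of (name, shape) pairs, e.g. built from session.get_inputs()
    feed_shapes: dict mapping input name -> shape of the array being fed
    Returns a list of human-readable problems (empty if everything matches).
    """
    problems = []
    expected_names = {name for name, _ in expected}
    for name, shape in expected:
        if name not in feed_shapes:
            problems.append(f"missing input '{name}'")
            continue
        got = tuple(feed_shapes[name])
        # Symbolic or unknown dimensions (strings / None) match any size.
        dims_ok = len(got) == len(shape) and all(
            not isinstance(e, int) or e == g for e, g in zip(shape, got)
        )
        if not dims_ok:
            problems.append(f"'{name}': expected {list(shape)}, got {list(got)}")
    for name in feed_shapes:
        if name not in expected_names:
            problems.append(f"unexpected input '{name}'")
    return problems


# With onnxruntime, the two arguments could be built like this:
#   expected = [(i.name, i.shape) for i in session.get_inputs()]
#   feed_shapes = {k: v.shape for k, v in inputs.items()}
# Here they are hard-coded from the shapes quoted in the post.
expected = [("rgb", [1, 3, 480, 640]), ("depth", [1, 1, 480, 640])]
print(check_feed(expected, {"rgb": (1, 3, 480, 640), "depth": (1, 1, 480, 640)}))
print(check_feed(expected, {"rgb": (1, 480, 640, 3), "depth": (1, 1, 480, 640)}))
```

An empty list for the real feed would suggest that the channel mismatch originates inside the exported graph rather than in the inputs.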

danielS91 commented 2 years ago

It is difficult to give you a specific hint. With the ResNet34 backbone, 1024 channels should only occur after the context module. 514 channels is quite strange and should not occur at all. Did you modify the code for the context module?

If not, note that all related libraries and frameworks have progressed a lot, so we would need more information on your environment. While developing the approach, we faced several problems related to grouped convolutions, interpolation modes, and shape inference. We had to carefully set the opset version for different execution providers. Luckily, there was an update to TensorRT right before we finished the development, which enabled direct inference using TensorRT with correct results. Since then, we have not spent any time on ONNXRuntime. I would recommend first checking the ONNX model with Netron. After that, you may have a look at the "--upsampling" argument to test different upsampling modes. Last but not least, a good debugging approach is to modify the main forward function so that individual network parts are excluded from the ONNX model.
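As a programmatic complement to inspecting the graph in Netron, the rule behind the error message (the input channel count C must equal the kernel's second dimension times `group`) can be checked for every Conv node. This is only a sketch under assumptions: `conv_channel_mismatches` and `collect_convs` are hypothetical helper names, `collect_convs` relies on the `onnx` package and its shape inference (which may leave some shapes unknown), and the example record reuses the 514 and 1024 from the error while its remaining dimensions are made up for illustration.

```python
def conv_channel_mismatches(convs):
    """Return the Conv records that violate C == kernel_channels * group.

    Each record is a dict with the keys
      'name'         : label for the node (ONNX node names may be empty)
      'input_shape'  : (N, C, H, W) of the activation, or None if unknown
      'weight_shape' : (M, C / group, kH, kW) of the kernel
      'group'        : value of the Conv 'group' attribute
    """
    bad = []
    for conv in convs:
        shape = conv["input_shape"]
        # Skip nodes whose input channel count could not be inferred.
        if not shape or len(shape) < 2 or not isinstance(shape[1], int):
            continue
        c, kernel_channels = shape[1], conv["weight_shape"][1]
        if c != kernel_channels * conv["group"]:
            bad.append((conv["name"], c, kernel_channels, conv["group"]))
    return bad


def collect_convs(model_path):
    """Build Conv records from an ONNX file (requires the `onnx` package)."""
    import onnx
    from onnx import shape_inference

    graph = shape_inference.infer_shapes(onnx.load(model_path)).graph
    # Kernel shapes come from the initializers (the stored weights).
    weights = {init.name: tuple(init.dims) for init in graph.initializer}
    # Activation shapes come from graph inputs/outputs and inferred value_info.
    shapes = {}
    for vi in list(graph.input) + list(graph.value_info) + list(graph.output):
        dims = vi.type.tensor_type.shape.dim
        shapes[vi.name] = tuple(
            d.dim_value if d.HasField("dim_value") else None for d in dims
        )
    convs = []
    for idx, node in enumerate(graph.node):
        if node.op_type != "Conv" or node.input[1] not in weights:
            continue
        group = next((a.i for a in node.attribute if a.name == "group"), 1)
        convs.append({
            "name": node.name or f"Conv#{idx} -> {node.output[0]}",
            "input_shape": shapes.get(node.input[0]),
            "weight_shape": weights[node.input[1]],
            "group": group,
        })
    return convs


# Hypothetical record: only C=514, kernel channels=1024 and group=1 are taken
# from the error message; the other dimensions are placeholders.
example = [{
    "name": "suspect",
    "input_shape": (1, 514, 15, 20),
    "weight_shape": (512, 1024, 1, 1),
    "group": 1,
}]
print(conv_channel_mismatches(example))
# For a real model: conv_channel_mismatches(collect_convs("./onnx_models/model.onnx"))
```

Because the failing node has an empty name, the fallback label includes the node's output tensor name, which can then be searched for in Netron.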

gurbain commented 1 year ago

Hello @blueFeather111,

I trained ESANet on a custom dataset with custom classes, converted it to ONNX, and I get the same error during execution:

Non-zero status code returned while running FusedConv node. Name:'' Status Message: Input channels C is not equal to kernel channels * group. C: 514 kernel channels: 1024 group: 1

My input shapes are also the same. I inspected the network with Netron, but I don't really understand what the error relates to. Did you make any progress on this? Thank you!

TUI-NICR / ESANet

onnx inference problem #41