TUI-NICR / ESANet

ESANet: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis

Make ESANet work with arbitrary input size/shape #42

Closed mszuyx closed 2 years ago

mszuyx commented 2 years ago

Hi ESANet team!

Fantastic job!

We are trying to change the model to work with input dimension of 256x256.

We have successfully done that and trained the model with training samples with size 256x256.

But when we try to convert the model to ONNX using the provided "model_to_onnx.py" script, it keeps raising the error "ONNX export failed on adaptive_avg_pool2d because output size that are not factor of input size not supported".

We have no issue converting the model with the default dimensions (1,3,480,640) and (1,1,480,640). 480x480 and 640x640 appear to work too.

Since the script has no problem converting the model with the default dimensions, we don't think it is an ONNX support issue. And since the model can be trained with samples of size 256x256, we don't think it is a model issue either. The fact that 480x480 and 640x640 work means the model and the ONNX export can deal with square inputs, so it is not an aspect-ratio problem either.

We wonder why the conversion only works with these magic numbers and how to make it work with other input sizes. (We are working on an FPS benchmark, so it is important that all tested models share the same input size.)

Thanks! : )

danielS91 commented 2 years ago

As far as I know, ONNX still does not support adaptive pooling (pooling with a fixed output size instead of a fixed window size). This must be considered during network design and is one of the points we mean by "carefully designed".

By default, our model uses an input size (HxW) of 480x640, a downsampling of 32, and fixed output sizes of 1x1 and 5x5 in the context module (see: here), i.e., 480x640 -> 15x20 -> [1x1, 5x5] -> 15x20 -> ... . As both 15 and 20 are multiples of (1 and) 5, we do not need adaptive pooling in our network design: during export to ONNX, the pooling operations are converted to a global average pooling and an average pooling with a fixed window size of 3x4.
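The arithmetic above can be sketched as follows (the helper name is mine, not ESANet code): adaptive pooling reduces to a fixed-window pooling exactly when the feature-map size is a multiple of the target output size.

```python
# Hypothetical helper, not part of ESANet: when does adaptive average
# pooling reduce to pooling with a fixed window size?
def fixed_window(in_size, out_size):
    """Return the uniform kernel/stride size if in_size is a multiple
    of out_size (equal windows possible); otherwise None."""
    if in_size % out_size == 0:
        return in_size // out_size
    return None

# Default ESANet setup: 480x640 input, downsampling 32 -> 15x20 features
h, w = 480 // 32, 640 // 32
print(fixed_window(h, 5), fixed_window(w, 5))  # -> 3 4 (the 3x4 window above)
print(fixed_window(256 // 32, 5))              # -> None (8 is not a multiple of 5)
```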

With an input size of 256x256, the input to the context module will be 8x8, which is not a multiple of 5, so PyTorch is forced to use pooling windows of different sizes to achieve an output size of 5x5.
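To make the uneven windows concrete, here is a sketch of the index arithmetic PyTorch uses internally for adaptive pooling (start = floor(i*in/out), end = ceil((i+1)*in/out)); the function name is mine:

```python
# Sketch of adaptive_avg_pool2d window boundaries along one dimension.
def window_sizes(in_size, out_size):
    sizes = []
    for i in range(out_size):
        start = (i * in_size) // out_size           # floor(i * in / out)
        end = -((-(i + 1) * in_size) // out_size)   # ceil((i+1) * in / out)
        sizes.append(end - start)
    return sizes

print(window_sizes(15, 5))  # [3, 3, 3, 3, 3] -> uniform windows, exportable
print(window_sizes(8, 5))   # [2, 3, 2, 3, 2] -> uneven windows, export fails
```

The uneven [2, 3, 2, 3, 2] pattern for 8 -> 5 is exactly what the ONNX exporter cannot express as a fixed-window average pooling.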

Changing the bins to (1, 4) instead of (1, 5) will probably fix your problem.
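The fix boils down to a divisibility check; a minimal sketch (hypothetical helper, ESANet's actual config names may differ):

```python
# Check that every pyramid-pooling bin size divides the context-module
# input size, so ONNX export can use fixed pooling windows.
def bins_exportable(feat_size, bins):
    return all(feat_size % b == 0 for b in bins)

feat = 256 // 32                      # 8x8 context-module input for 256x256
print(bins_exportable(feat, (1, 5)))  # False -> the reported export error
print(bins_exportable(feat, (1, 4)))  # True  -> the suggested fix
```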

mszuyx commented 2 years ago

@danielS91 It works! Thanks for that! Very solid answer!