anilsathyan7 / Portrait-Segmentation

Real-time portrait segmentation for mobile devices

MODNet input and output sizes #42

Closed ldenoue closed 3 years ago

ldenoue commented 3 years ago

I've downloaded the mlmodel in https://github.com/anilsathyan7/Portrait-Segmentation/tree/master/MODNet and noticed that the model expects a 512x512 input, but the output is a 1252 MultiArray (Float32). How would I use the output to create a 2D mask of the original image?

anilsathyan7 commented 3 years ago

The output dimension is 1x1x512x512. So you can either use indexing and convert the array to an image (code), or (maybe) make the output an image within the model itself, using coremltools.
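
If you go the coremltools route, a rough sketch of the spec edit could look like this (the file names are assumptions, and I couldn't test it on a Mac system):

import coremltools as ct
import coremltools.proto.FeatureTypes_pb2 as ft

# "modnet.mlmodel" is an assumed path; use the actual downloaded model
spec = ct.utils.load_spec("modnet.mlmodel")

# Turn the 1x1x512x512 MultiArray output into a grayscale image output
output = spec.description.output[0]
output.type.imageType.colorSpace = ft.ImageFeatureType.GRAYSCALE
output.type.imageType.width = 512
output.type.imageType.height = 512

ct.utils.save_spec(spec, "modnet_image_output.mlmodel")

Note that an image output holds 8-bit pixel values, so the model's 0-1 floats may still need to be scaled by 255 somewhere, or the mask will look almost black.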

ldenoue commented 3 years ago

I was able to use coremltools to change the output into an image [1,512,512], but unfortunately the pixel data is all black. I think it's because the floats were all in the [-1,1] range, so extra post-processing would be required. Ideally this would happen in the Core ML model itself instead of in Swift code. I'm stuck now.

On a related topic, which model would you advise I use for fast inference on mobile? One with a good trade-off between speed and quality.

anilsathyan7 commented 3 years ago

The outputs will be in the range 0-1, since we are using sigmoid activations.
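
So the matte only needs to be scaled up to 8-bit pixel values before it is displayed. A minimal sketch in Python (the same idea applies in Swift; matte here stands for the raw output array):

import numpy as np

# matte: the 1x1x512x512 Float32 output of the model, values in [0, 1]
mask = (np.squeeze(matte) * 255.0).clip(0, 255).astype(np.uint8)  # 512x512 grayscale mask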

For the inputs, you can refer to the ONNX inference code for normalization:

import numpy as np

# Preprocess images based on the original training/inference code
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# inp: the input image as a float array scaled to [0, 1]
img = (inp - mean) / std
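
For reference, a rough end-to-end sketch with onnxruntime (the file name and input layout are assumptions, so check them against the actual export in the repo):

import numpy as np
import onnxruntime as ort

# "modnet.onnx" is an assumed file name; use the actual ONNX export
session = ort.InferenceSession("modnet.onnx")
input_name = session.get_inputs()[0].name

# img: 512x512x3 float image already scaled to [0, 1]
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)
x = ((img - mean) / std).transpose(2, 0, 1)[np.newaxis].astype(np.float32)  # 1x3x512x512

matte = session.run(None, {input_name: x})[0]  # 1x1x512x512, values in [0, 1]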

Maybe you can make the inputs a MultiArray and do the preprocessing before feeding them to the model. Unfortunately, I couldn't test the model on a Mac system.

You may try out the MobileNetV2-based models for a good trade-off between speed and quality. Additionally, you can refer to the following links:

  1. https://github.com/cainxx/image-segmenter-ios
  2. https://machinethink.net/blog/mobile-architectures/