The output dimension is 1x1x512x512. So you can either use indexing and convert the array to an image (code), or (maybe) make the output an image within the model itself, using coremltools.
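A minimal sketch of the indexing approach in Python (the Swift side would be analogous); the array `pred` below is just a hypothetical stand-in for the model output, assumed to have shape (1, 1, 512, 512) with values in [0, 1]:

```python
import numpy as np
from PIL import Image

# Stand-in for the raw model output (replace with the real prediction).
pred = np.random.rand(1, 1, 512, 512).astype(np.float32)

# Drop the batch/channel dimensions to get the 2D matte.
matte = pred.reshape(512, 512)

# Scale [0, 1] floats to 0-255 and save as an 8-bit grayscale mask.
mask = Image.fromarray((matte * 255).astype(np.uint8), mode="L")
mask.save("mask.png")
```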
I was able to use coremltools to change the output into an image [1,512,512], but unfortunately the pixel data is all black. I think it's because the floats were all in the [-1,1] range, so an extra post-processing step would be required. Ideally this would happen in the CoreML model itself instead of in Swift code. I'm stuck now.
On a related topic, which model would you advise I use for fast inference on mobile? One with a good trade-off between speed and quality.
The outputs will be in the range 0-1, since we are using sigmoid activations.
For the inputs you can refer to the ONNX inference code for normalization:
```python
import numpy as np

# Preprocess images based on the original training/inference code;
# `inp` is assumed to already be scaled to [0, 1]
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.5, 0.5, 0.5])
img = (inp - mean) / std  # result is in [-1, 1]
```
Maybe you can make the inputs a MultiArray and do the preprocessing before feeding the inputs to the model. Unfortunately, I couldn't test the model on a Mac system.
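Alternatively, if you re-convert the original PyTorch model yourself, coremltools can bake this normalization into the model as image preprocessing, so the Core ML input stays an image and no Swift-side math is needed. A rough sketch, assuming a traced MODNet TorchScript module and coremltools' unified converter (the model variable, input name, and shape here are assumptions, not the repo's exact conversion script):

```python
import torch
import coremltools as ct

# Assumption: `modnet` is the PyTorch MODNet model, already in eval mode.
example = torch.rand(1, 3, 512, 512)
traced = torch.jit.trace(modnet, example)

# scale/bias implement (x/255 - 0.5) / 0.5, i.e. pixels are mapped to [-1, 1]
mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(name="image", shape=(1, 3, 512, 512),
                         scale=1 / 127.5, bias=[-1.0, -1.0, -1.0])],
)
mlmodel.save("modnet.mlmodel")
```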
You may try out MobileNetV2-based models for a good trade-off between speed and quality. Additionally, you can refer to the following links:
I've downloaded the mlmodel in https://github.com/anilsathyan7/Portrait-Segmentation/tree/master/MODNet and noticed that the input expects a 512x512 image, but the output is a 1252 MultiArray (Float32). How would I use the output to create a 2D mask of the original image?