himanshunaidu opened this issue 1 month ago
Experimented in the following PR: https://github.com/himanshunaidu/EdgeNets/pull/1
It seems to work on a machine with better specs (i9 + 4090).
However, the conversion still takes quite long even on that hardware.
Also, we should remove the minimum_deployment_target=ct.target.iOS13 argument.
Next steps: confirm that the new mlmodel works and test it in the app.
@NaturalStupidlty no need to do anything from your side at the moment. This will be taken care of later.
It seems that using the ESPNetv2 model from the EdgeNets model_zoo will not work directly, because the ESPNetv2 Cityscapes model's output has shape (1, 20, 256, 512). The output needs to be post-processed with the following code:
img_out = model(img)
img_out = img_out.squeeze(0) # remove the batch dimension
img_out = img_out.max(0)[1].byte() # get the label map
img_out = img_out.to(device='cpu').numpy()
If we want to create a Core ML model out of this, the logical way would be to create a new model that uses the ESPNetv2 model and post-processes the output accordingly. This is, in theory, possible, because the code is PyTorch code.
However, it may be prudent to contact the authors of the repository to find the final solution, as we keep running into roadblocks.
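As a rough illustration, such a wrapper could look like the sketch below. This is only a sketch of the idea, not code from the repository: the class name SegmentationWrapper and the way the pretrained model is passed in are assumptions, and the argmax step mirrors the post-processing snippet above.

import torch
import torch.nn as nn

class SegmentationWrapper(nn.Module):
    """Hypothetical wrapper that bakes the label-map post-processing into forward()."""

    def __init__(self, espnetv2_model: nn.Module):
        super().__init__()
        self.model = espnetv2_model  # the pretrained ESPNetv2 Cityscapes model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.model(x)         # (1, 20, 256, 512) per-class scores
        out = out.squeeze(0)        # drop the batch dimension -> (20, 256, 512)
        out = out.argmax(dim=0)     # per-pixel class index -> (256, 512)
        return out.to(torch.uint8)  # byte label map, equivalent to .max(0)[1].byte()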
Ideally, the conversion we should be doing is along the following lines:
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input", shape=img.shape)],
    outputs=[ct.ImageType(name="output", color_layout=ct.colorlayout.GRAYSCALE)],
    convert_to='neuralnetwork',
    compute_units=ct.ComputeUnit.ALL,
    # minimum_deployment_target=ct.target.iOS13
)
(We may want to watch out for ct.ComputeUnit.ALL causing issues in the output based on the conversation here: https://forums.developer.apple.com/forums/thread/95775)
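For reference, the traced_model above would come from tracing the wrapped model with torch.jit.trace. A minimal sketch, assuming the hypothetical SegmentationWrapper from above and a 256x512 input resolution (the variable names and input size are assumptions, not values from the repository):

import torch

# espnetv2_model is assumed to be the pretrained ESPNetv2 Cityscapes model loaded from the model_zoo
wrapped = SegmentationWrapper(espnetv2_model).eval()
example_input = torch.rand(1, 3, 256, 512)  # dummy input matching the assumed input resolution
traced_model = torch.jit.trace(wrapped, example_input)
# traced_model can then be passed to the ct.convert(...) call above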
You can create a wrapper for the PyTorch model that does argmax in the forward() method after extracting raw features (as I did to solve this problem) or do this in the Swift code during postprocessing.
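If the argmax is instead left to the app side, the postprocessing just reduces the raw (1, 20, 256, 512) scores to a label map. A Python sketch of the equivalent operation (the function name is made up; the Swift code would perform the same reduction over the MLMultiArray):

import numpy as np

def scores_to_label_map(raw_scores: np.ndarray) -> np.ndarray:
    """Reduce (1, 20, 256, 512) per-class scores to a (256, 512) uint8 label map."""
    return raw_scores.squeeze(0).argmax(axis=0).astype(np.uint8)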
Would it be possible for you to create a wrapper model? I unfortunately couldn't get access to a better instance. The conversion takes around a day for me, which means that debugging would be impractical.
Yes, I can do this. However, it is still a mystery to me why the conversion takes so long.
Well, inference with this model also takes very long. Considering that there are better options out there, some of which we are already able to use, I am not sure we should keep trying to fix this issue. I am therefore going to track this at lower priority, and will probably close it as 'Not planned' if we don't reach a resolution.
Need to convert the PyTorch model into a Core ML version. Not working with ESPNetv2 for some reason.