TaskarCenterAtUW / iOSPointMapper

Get the ESPNetv2 Cityscapes model running #42

Open himanshunaidu opened 4 weeks ago

himanshunaidu commented 4 weeks ago

Need to convert the PyTorch model into a Core ML version. The conversion is not working with ESPNetv2 for some reason.

himanshunaidu commented 1 week ago

Experimented in the following PR: https://github.com/himanshunaidu/EdgeNets/pull/1

Seems to be working on better specs (i9 + 4090). However, it takes quite long even with better specs. Also, we should remove minimum_deployment_target=ct.target.iOS13.

Next steps: Confirm the new working mlmodel and test it on the app.

himanshunaidu commented 1 week ago

@NaturalStupidlty no need to do anything from your side at the moment. This will be taken care of later.

himanshunaidu commented 1 week ago

It seems like using the ESPNetv2 model from the EdgeNets model_zoo will not work as-is, because the ESPNetv2 Cityscapes model's output shape is (1, 20, 256, 512). The output needs to be post-processed with the following code:

img_out = model(img)
img_out = img_out.squeeze(0)  # remove the batch dimension
img_out = img_out.max(0)[1].byte()  # get the label map
img_out = img_out.to(device='cpu').numpy()

If we want to create a Core ML model out of this, the logical way would be to create a new model that wraps the ESPNetv2 model and post-processes the output accordingly. This is, in theory, possible because the post-processing is plain PyTorch code.
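
In practice, such a wrapper might look like the sketch below (the class name SegmentationWrapper is illustrative, not from the EdgeNets code, and the float cast is an assumption to keep the output compatible with a grayscale image output):

import torch
import torch.nn as nn

class SegmentationWrapper(nn.Module):
    """Wraps the segmentation model and folds the argmax post-processing into forward()."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)              # (1, 20, 256, 512) per-class scores
        labels = out.argmax(dim=1)       # (1, 256, 512) label map
        return labels.to(torch.float32)  # cast to float (assumption: needed for a grayscale image output)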

However, it may be prudent to contact the authors of the repository to find the final solution, as we keep running into roadblocks.

himanshunaidu commented 1 week ago

Ideally, the conversion we should be doing is along the following lines:

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input", shape=img.shape)],
    outputs=[ct.ImageType(name="output", color_layout=ct.colorlayout.GRAYSCALE)],
    convert_to='neuralnetwork',
    compute_units=ct.ComputeUnit.ALL,
    # minimum_deployment_target=ct.target.iOS13
)

(We may want to watch out for ct.ComputeUnit.ALL causing issues in the output based on the conversation here: https://forums.developer.apple.com/forums/thread/95775)
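
For completeness, the tracing step that produces traced_model might look like the sketch below (assuming the ESPNetv2 model has already been loaded from the EdgeNets model_zoo into model, and using the illustrative SegmentationWrapper from the earlier comment; the dummy input shape should match the preprocessing used in EdgeNets):

import torch

wrapper = SegmentationWrapper(model).eval()   # illustrative wrapper around the loaded ESPNetv2 model
img = torch.rand(1, 3, 256, 512)              # dummy input; shape must match what the model expects
traced_model = torch.jit.trace(wrapper, img)  # TorchScript trace that ct.convert consumes
# traced_model can then be passed to the ct.convert call above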

NaturalStupidlty commented 1 week ago

It seems like using the ESPNetv2 model from the EdgeNets model_zoo will not work as-is, because the ESPNetv2 Cityscapes model's output shape is (1, 20, 256, 512). [...]

You can create a wrapper for the PyTorch model that does argmax in the forward() method after extracting raw features (as I did to solve this problem) or do this in the Swift code during postprocessing.
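
As a quick sanity check before conversion, the wrapper's output could be compared against the manual post-processing from the earlier comment (a sketch, assuming model is the loaded ESPNetv2 model, img is a preprocessed input tensor, and SegmentationWrapper is the illustrative wrapper sketched above):

import torch

with torch.no_grad():
    raw = model(img)                           # (1, 20, 256, 512) per-class scores
    manual = raw.squeeze(0).max(0)[1]          # label map via manual post-processing
    wrapped = SegmentationWrapper(model)(img)  # label map via the wrapper's forward()

assert torch.equal(manual, wrapped.squeeze(0).long())  # both paths should agree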

himanshunaidu commented 1 week ago

Would it be possible for you to create a wrapper model? I unfortunately couldn't get access to a better instance. The conversion takes around a day for me, which means that debugging would be impractical.

NaturalStupidlty commented 1 week ago

Would it be possible for you to create a wrapper model? [...]

Yes, I can do this. However, it is still a mystery to me why the conversion takes so long.