TaskarCenterAtUW / iOSPointMapper

Get the ESPNetv2 Cityscapes model running #42

Open himanshunaidu opened 4 weeks ago

himanshunaidu commented 4 weeks ago

Need to convert the PyTorch model into a Core ML version. The conversion is not working with ESPNetv2 for some reason.

himanshunaidu commented 1 week ago

Experimented in the following PR: https://github.com/himanshunaidu/EdgeNets/pull/1

Seems to be working on better specs (i9 + 4090). However, it takes quite long even with better specs. Also, we should remove minimum_deployment_target=ct.target.iOS13.

Next steps: Confirm the new working mlmodel and test it on the app.

himanshunaidu commented 1 week ago

@NaturalStupidlty no need to do anything from your side at the moment. This will be taken care of later.

himanshunaidu commented 1 week ago

It seems like using the ESPNetv2 model from the EdgeNets model_zoo will not work as-is, because the ESPNetv2 Cityscapes model's output shape is (1, 20, 256, 512). The output needs to be post-processed with the following code:

img_out = model(img)
img_out = img_out.squeeze(0)  # remove the batch dimension
img_out = img_out.max(0)[1].byte()  # get the label map
img_out = img_out.to(device='cpu').numpy()

If we want to create a Core ML model out of this, the logical way would be to create a new model that wraps the ESPNetv2 model and post-processes the output accordingly. This is, in theory, possible because the post-processing is plain PyTorch code.
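
In practice, such a wrapper might look like the sketch below (the class name SegmentationWrapper is illustrative, not from the EdgeNets code, and the float cast is an assumption to keep the output compatible with a grayscale image output):

import torch
import torch.nn as nn

class SegmentationWrapper(nn.Module):
    """Wraps the segmentation model and folds the argmax post-processing into forward()."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(x)              # (1, 20, 256, 512) per-class scores
        labels = out.argmax(dim=1)       # (1, 256, 512) label map
        return labels.to(torch.float32)  # cast to float (assumption: needed for a grayscale image output)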

However, it may be prudent to contact the authors of the repository to find the final solution, as we keep running into roadblocks.

himanshunaidu commented 1 week ago

Ideally, the conversion we should be doing is along the following lines:

mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input", shape=img.shape)],
    outputs=[ct.ImageType(name="output", color_layout=ct.colorlayout.GRAYSCALE)],
    convert_to='neuralnetwork',
    compute_units=ct.ComputeUnit.ALL,
    # minimum_deployment_target=ct.target.iOS13
)

(We may want to watch out for ct.ComputeUnit.ALL causing issues in the output based on the conversation here: https://forums.developer.apple.com/forums/thread/95775)
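
For completeness, the tracing step that produces traced_model might look like the sketch below (assuming the ESPNetv2 model has already been loaded from the EdgeNets model_zoo into model, and using the illustrative SegmentationWrapper from the earlier comment; the dummy input shape should match the preprocessing used in EdgeNets):

import torch

wrapper = SegmentationWrapper(model).eval()   # illustrative wrapper around the loaded ESPNetv2 model
img = torch.rand(1, 3, 256, 512)              # dummy input; shape must match what the model expects
traced_model = torch.jit.trace(wrapper, img)  # TorchScript trace that ct.convert consumes
# traced_model can then be passed to the ct.convert call above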

NaturalStupidlty commented 1 week ago

It seems like using the ESPNetv2 model from the EdgeNets model_zoo will not work as-is, because the ESPNetv2 Cityscapes model's output shape is (1, 20, 256, 512). [...]

You can create a wrapper for the PyTorch model that does argmax in the forward() method after extracting raw features (as I did to solve this problem) or do this in the Swift code during postprocessing.
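
As a quick sanity check before conversion, the wrapper's output could be compared against the manual post-processing from the earlier comment (a sketch, assuming model is the loaded ESPNetv2 model, img is a preprocessed input tensor, and SegmentationWrapper is the illustrative wrapper sketched above):

import torch

with torch.no_grad():
    raw = model(img)                           # (1, 20, 256, 512) per-class scores
    manual = raw.squeeze(0).max(0)[1]          # label map via manual post-processing
    wrapped = SegmentationWrapper(model)(img)  # label map via the wrapper's forward()

assert torch.equal(manual, wrapped.squeeze(0).long())  # both paths should agree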

himanshunaidu commented 1 week ago

Would it be possible for you to create a wrapper model? I unfortunately couldn't get access to a better instance. The conversion takes around a day for me, which means that debugging would be impractical.

NaturalStupidlty commented 1 week ago

Would it be possible for you to create a wrapper model? [...]

Yes, I can do this. However, it is still a mystery to me why the conversion takes so long.