apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

PyTorch flexible input working but not flexible output shapes with 5.0b1 #1244

Open 3DTOPO opened 3 years ago

3DTOPO commented 3 years ago

Even though the produced MLModel shows support for flexible input and output shapes, the output shape is fixed for a PyTorch model converted using the Unified Converter.

It appears that flexible input shapes work properly, but flexible output shapes do not.

If I set the image dimension to 256 and give it an input image of 512, the output is cropped to 256.

If I set the image dimension to 2048 and give it an input image of 256, the output is 2048 with mostly blank pixels (just the first 256 pixels are imaged).

An Apple engineer I spoke with at WWDC informed me that flexible images should be properly working with Monterey.

Reproducible with steps: https://github.com/apple/coremltools/issues/992

System environment

This is a critical issue for my work.

Fixed size is 2048: [screenshot]

Input 256 pixel image: [screenshot]

Input 1024 pixel image: [screenshot]

3DTOPO commented 3 years ago

My workaround of using a flexible-shape multi-array input to get flexible shape output no longer works with 5.0b1.

The flexible input array works, but the output is the fixed image size, despite the model showing support for flexible image-size output.

3DTOPO commented 3 years ago

Ugh: I created a new coremltools 4.1 environment with PyTorch 1.8.1, and flexible output shapes no longer work there either. Now I don't even know how to get the workaround working again. It seems like it must be the OS. What a nightmare.

3DTOPO commented 3 years ago

It seems that as of 4.1, flexible image output shapes no longer work, and I can't get 4.0 installed because of a numpy float32 error.

I was using 4.0 with my workaround, but I can't be certain that version would fix it, since I can't get it running.

aseemw commented 3 years ago

I think this is probably not related to flexible output shapes. It could also be an issue with the preview feature. Have you verified the Core ML model output by comparing it with the Torch output, when both models are invoked with the same input, and checking whether they match?

It would help to debug if you can evaluate the following scenarios and see which of these work, in terms of matching the torch output (let the input be of type image):

I looked at the source code in #992 and verified that the torch output and the coreml output match when the output is of type multiarray (flexible or not) (input is image type, macOS 12, coremltools 5.0b2). When the output is of type image, I could not compare properly, since for a torch model with random weights the output was all 0s in the uint8 domain.

3DTOPO commented 3 years ago

Yes, my model works properly with PyTorch.

It works if the input/output are not flexible, with both images and arrays. I don't understand how that is not related to flexible output shapes.

Note that the models are trained on Ubuntu, and I am converting to an MLModel on macOS 12. I get totally different results when I convert on Ubuntu (but I cannot get flexible shapes working there either; just more bugs, too many for me to document). Both are using conda with identical environments and versions.

Pretty much everything I try (and I have tried hundreds of different combinations) compiles, but the model does not work on iOS. It's enough to make a grown man feel like crying.

aseemw commented 3 years ago

Can you please share the code that you are using to compare torch and coreml predictions, which passes when input/output are not flexible, but fails when they are?

Conversion code is purely Python, so it will be the same on Ubuntu/macOS when using the same wheel version.

3DTOPO commented 3 years ago

I'll work on sharing reproducibles with you, but I did not save scripts of everything I tried (and I tried countless configurations), so I first have to go back and reproduce the issue myself.

If it should be identical, that is possibly what is at the heart of the issue here, because I get completely different results if I convert on Ubuntu on the machine where the models are trained versus converting on my MacBook running Monterey. I wonder if there might be some PyTorch settings that vary by platform? For one, I use GPUs on Ubuntu and don't on the Mac.

Just so I am clear: for flexible results to work, the model has to be compiled with Xcode 13 on macOS 12, is that correct?

aseemw commented 3 years ago

> Just so I am clear, for flexible results to work, the model has to be compiled with Xcode 13 on macOS 12, is that correct?

Yes, if you are using the combination of "image input/output + rangeDim". If you are using image inputs/outputs with enumerated shapes, or multiarray inputs/outputs with range/enumerated shapes, that will all work with the Xcode 12 / macOS 11 combo as well.

> If it should be identical, that is possibly what is at the heart of the issue here, because I get completely different results if I convert on Ubuntu on the machine where the models are trained versus converting on my MacBook running Monterey. I wonder if there might be some PyTorch settings that vary by platform? For one, I use GPUs on Ubuntu and don't on the Mac.

If the PyTorch models are identical (with .eval() mode) on Ubuntu and Mac, then the Core ML models should be too.
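One quick way to see why the .eval() caveat matters: modules such as Dropout behave differently in train vs. eval mode, so a model traced without .eval() is not deterministic across runs or platforms. A minimal illustration, not the reporter's model:

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Dropout(p=0.5))
x = torch.ones(1, 8)

net.eval()            # inference mode: dropout becomes a no-op
out_eval = net(x)

net.train()           # training mode: dropout randomly zeroes elements
out_train = net(x)    # surviving elements are scaled by 1/(1-p) = 2

# In eval mode the output is exactly the input.
assert torch.equal(out_eval, x)
```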

3DTOPO commented 3 years ago


Here is a full reproducible (model class, conversion script, PyTorch model, and MLModel): demoModelColor.zip

Note that this is a slightly different architecture, but I was getting the same results with the model from https://github.com/apple/coremltools/issues/992

coremltools 5.0b2, macOS 12.0 beta (21A5268h), Xcode 13.0 beta (13A5155e). I am currently updating the OS and Xcode and will report back if it behaves differently.

dragen1860 commented 2 years ago

@3DTOPO any updates on this? thank you.

3DTOPO commented 2 years ago

Sadly no.

dragen1860 commented 2 years ago

@3DTOPO from my own experience, I think setting the input as a flexible Image and the output as a TensorType will probably work.