linear_quantize_weights make wrong result in GPU / ANE

TimYao18 commented 10 months ago

🐞Describing the bug

The MiDaS model is used for monocular depth estimation. A model quantized with linear_quantize_weights produces wrong results when the compute unit is CPU_AND_GPU or CPU_AND_NE. The OpLinearQuantizerConfig parameters affect the result as follows:

op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric"), prediction output: (1) CPU_AND_GPU: max value = min value = 0.5849609375 (2) CPU_AND_NE: max value = inf, min value = 0.0
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype=np.uint8) or op_config = cto.OpLinearQuantizerConfig(mode="linear"), prediction output: CPU_AND_GPU: max value = min value = 0.5849609375
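
For context, the two modes differ mainly in how the zero point is chosen. The sketch below is a generic, plain-Python illustration of symmetric versus affine 8-bit weight quantization. It is not coremltools' implementation, and the function names are hypothetical.

```python
def quantize_symmetric(weights, n_bits=8):
    """Symmetric scheme: zero point fixed at 0, signed integer range."""
    q_max = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return q, scale


def quantize_affine(weights, n_bits=8):
    """Affine scheme: unsigned integer range plus a zero point."""
    q_max = 2 ** n_bits - 1  # 255 for 8 bits
    # Include 0.0 in the range so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / q_max if w_max > w_min else 1.0
    zero_point = round(-w_min / scale)
    q = [max(0, min(q_max, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integer codes back to approximate float weights."""
    return [(v - zero_point) * scale for v in q]
```

In both schemes the round-trip error per weight is at most about half of `scale`, so a correct backend should give outputs close to the float16 model regardless of the mode chosen.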

To Reproduce

I wrote an ipynb notebook that uses a COCO image. Please download it to test the code.

Setting the compute unit to CPU_ONLY gives the correct result; changing it to CPU_AND_GPU or CPU_AND_NE reproduces the wrong result:

mlmodel = ct.models.MLModel(quantized_model_path, compute_units=ct.ComputeUnit.CPU_ONLY) # CPU_ONLY, CPU_AND_GPU, CPU_AND_NE, ALL

System environment (please complete the following information):

coremltools version: 7.1
OS (e.g. MacOS version or Linux type): macOS 14.2.1
Any other relevant version information (e.g. PyTorch or TensorFlow version): PyTorch 2.1.2

TobyRoseman commented 10 months ago

I can't open your ipynb file. Can you give us a minimal example to reproduce this issue?

TimYao18 commented 10 months ago

I copied the notebook contents as Python code:

import requests
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import torch
import coremltools as ct
import coremltools.optimize.coreml as cto
import os

torchscript_path = 'model-small-traced.pt'
saved_model_path = 'model-small.mlpackage'
quantized_model_path = 'model-small-quant.mlpackage'
image_path = 'COCO_val2014_000000000761.jpg'

url = 'https://github.com/isl-org/MiDaS/releases/download/v2_1/model-small-traced.pt'

if not os.path.exists(torchscript_path):
    response = requests.get(url)

    with open(torchscript_path, 'wb') as file:
        file.write(response.content)

def normalize_and_show_image(output_dict):
    # Normalize the image to the range [0, 1]
    print(f"max value: {np.max(output_dict)}, min value: {np.min(output_dict)}")
    normalized_image = (output_dict - np.min(output_dict)) / (np.max(output_dict) - np.min(output_dict))
    normalized_image_pil = Image.fromarray((normalized_image[0] * 255).astype(np.uint8), mode='L')
    depth_map = normalized_image_pil.resize(image_size, Image.LANCZOS)
    depth_map.save('depth_map.jpg')

img = Image.open(image_path)
image_size = img.size
img = img.resize([256, 256], Image.LANCZOS)

# load TorchScript model
traced_model = torch.jit.load(torchscript_path)
traced_model.eval()

# create dummy inputs
input_size = (1, 3, 256, 256)
dummy_input = torch.randn(input_size)

# input preprocessing (scale and bias) is required, or details will disappear
scale = 1 / (0.226 * 255.0)
bias = [-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225]
image_input = ct.ImageType(name="input_1", shape=dummy_input.shape, scale=scale, bias=bias)

# convert
mlmodel = ct.convert(traced_model, inputs=[image_input])

# save Core ML model. The default model will be float16
mlmodel.save(saved_model_path)

op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric") # this setting triggers the error more readily
# op_config = cto.OpLinearQuantizerConfig(mode="linear", dtype=np.uint8)
config = cto.OptimizationConfig(global_config=op_config)

mlmodel = ct.models.MLModel(saved_model_path)
compressed_8_bit_model = cto.linear_quantize_weights(mlmodel, config=config)
compressed_8_bit_model.save(quantized_model_path)

# Make a prediction with the Core ML version of the model.
mlmodel = ct.models.MLModel(quantized_model_path, compute_units=ct.ComputeUnit.CPU_AND_GPU) # CPU_ONLY, CPU_AND_GPU, CPU_AND_NE, ALL
coreml_out_dict = mlmodel.predict({"input_1" : img})

# normalize value and show result image
normalize_and_show_image(coreml_out_dict['var_1186'])

The used image: COCO_val2014_000000000761

TimYao18 commented 10 months ago

Please change the compute unit to get different results.

# CPU_ONLY, CPU_AND_GPU, CPU_AND_NE, ALL
mlmodel = ct.models.MLModel(quantized_model_path, compute_units=ct.ComputeUnit.CPU_AND_GPU)

TobyRoseman commented 10 months ago

Loading a PyTorch model from an untrusted source is a security risk, since it allows arbitrary code execution.

Can you create a simpler, self-contained example (e.g. using a toy model that is fully defined in the code)?

TimYao18 commented 10 months ago

I suspect this issue is related to the model itself, so the same behavior may not occur with a toy model. If you consider the PyTorch model downloaded from the official MiDaS site insecure, would it be safer if I converted it into a Core ML mlpackage?

TimYao18 commented 10 months ago

BTW, I've tried both the Quantized model and the Palettized model. Only the Quantized model encounters this issue, and it behaves normally when using the CPU.

The images generated using the Quantized model are as follows: CPU_ONLY (normal),

CPU_AND_GPU (all predicted values are identical), CPU_AND_NE (the maximum value is infinite). So these two either produce no image or produce a black image. A custom-trained model based on the official MiDaS model, run with CPU_AND_GPU, gives the result below:
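
A plausible reason these two cases yield no image or a black one is that the normalization in the script divides by max - min, which is zero when every value is identical and non-finite when the maximum is inf. Below is a minimal guard, written in plain Python over a flat list of values. The helper name `safe_normalize` is hypothetical; for a NumPy array, pass `arr.ravel().tolist()`.

```python
import math


def safe_normalize(values):
    """Scale values to [0, 1]; return None when the output is degenerate."""
    if not values or not all(math.isfinite(v) for v in values):
        return None  # inf or NaN present (the CPU_AND_NE symptom)
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # constant output (the CPU_AND_GPU symptom)
    span = hi - lo
    return [(v - lo) / span for v in values]
```

Returning None lets the caller skip writing the depth map and report the degenerate output instead of saving a blank image.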

TobyRoseman commented 10 months ago

If a Core ML model loaded with different compute units gives significantly different results, that is an issue with the Core ML Framework, not an issue with conversion.

Please use the Feedback Assistant to submit this framework bug. Don't worry about including the PyTorch model. Just include the Core ML model along with code to demonstrate the difference.
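
A minimal sketch of such a demonstration follows, assuming coremltools is installed on macOS. The helper names `summarize` and `compare_compute_units` are hypothetical; the model path, input dictionary, and output name are whatever your model uses.

```python
import math


def summarize(values):
    """Report min/max and flag non-finite or constant outputs."""
    values = list(values)
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "all_finite": all(math.isfinite(v) for v in values),
        "constant": lo == hi,
    }


def compare_compute_units(model_path, inputs, output_name):
    """Run the same Core ML model under each compute unit."""
    # Imported lazily so that summarize() works without coremltools.
    import coremltools as ct

    results = {}
    for unit in (
        ct.ComputeUnit.CPU_ONLY,
        ct.ComputeUnit.CPU_AND_GPU,
        ct.ComputeUnit.CPU_AND_NE,
        ct.ComputeUnit.ALL,
    ):
        model = ct.models.MLModel(model_path, compute_units=unit)
        out = model.predict(inputs)[output_name]  # NumPy array
        results[unit.name] = summarize(out.ravel().tolist())
    return results
```

With the names from the script above, the call would be `compare_compute_units(quantized_model_path, {"input_1": img}, "var_1186")`; printing the returned dictionary shows the per-unit differences side by side.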

Since this issue cannot be fixed in the coremltools GitHub repository, I'm going to close it.
