apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Quantized Models Chunking into unequal sizes #2320

Open nighting0le01 opened 3 months ago

nighting0le01 commented 3 months ago

🐞Describing the bug

With reference to https://github.com/apple/ml-stable-diffusion/issues/353: I used the bisect_model() function to split a quantized model into 2 chunks. I tried versions 7.1 and 7.0, following this file: https://github.com/apple/ml-stable-diffusion/blob/cf16df8207dfcba685a9391bad04f7402ea87b73/python_coreml_stable_diffusion/chunk_mlprogram.py#L123 , but hit the same issue in both: the two chunks come out very unequal in size.

prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
    prog)
main_block = prog.functions["main"]
print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")  # 152.67 MB
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")  # 0.42 MB
print(f"index={op_idx}/{len(main_block.operations)}")  # index=587/2720
# Build the first chunk (modifies prog in place), then the second from a fresh load
prog_chunk1 = _make_first_chunk_prog(prog, op_idx)  # 587/3000
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model), op_idx)
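
To make the expected behavior concrete, here is a minimal sketch of bisecting by cumulative weight size. It is a simplification and not the `chunk_mlprogram.py` implementation: `find_split_index` and the per-op size lists are hypothetical stand-ins for the program's ops and their weight sizes. It only illustrates that the split lands near the middle when sizes are spread evenly, and becomes lopsided when the sizes being counted are concentrated late in the graph.

```python
from typing import List, Tuple


def find_split_index(weight_sizes_mb: List[float]) -> Tuple[int, float, float]:
    """Return (index, first_chunk_mb, total_mb) for the first op at which the
    cumulative weight size exceeds half of the total."""
    total = sum(weight_sizes_mb)
    half = total / 2
    cumulative = 0.0
    for idx, size in enumerate(weight_sizes_mb):
        cumulative += size
        if cumulative > half:
            return idx, cumulative, total
    raise ValueError("no split point found (are all sizes zero?)")


# Hypothetical per-op weight sizes in MB, spread evenly across the graph.
even = [1.0] * 10
idx, first, total = find_split_index(even)
print(f"even:   idx={idx}, first={first:.2f} MB, second={total - first:.2f} MB")

# Hypothetical case where almost all counted weight sits in the last op.
skewed = [0.1] * 9 + [10.0]
idx, first, total = find_split_index(skewed)
print(f"skewed: idx={idx}, first={first:.2f} MB, second={total - first:.2f} MB")
```

Whether something like the skewed case is what happens for quantized models is an open question here; the sketch only shows what kind of size distribution would produce the numbers reported above.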

System environment (please complete the following information):

cc: @aseemw

jakesabathia2 commented 3 months ago

@nighting0le01 would you mind providing a standalone script for us to reproduce?

nighting0le01 commented 3 months ago

Hi @jakesabathia2! Here is the code to reproduce, using coremltools version 7.01. I know that with 8.0b2 the chunking has moved into coremltools, but I think it has the same issue when chunking a quantized or palettized model.

The model is a simple MobileNet that can be downloaded from the coremltools tutorial: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit

import coremltools as ct
from python_coreml_stable_diffusion.chunk_mlprogram import (
    _load_prog_from_mlmodel,
    _get_op_idx_split_location,
    _make_second_chunk_prog,
    _make_first_chunk_prog,
)
# link to get model:https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model = ct.models.MLModel('MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage')
prog = _load_prog_from_mlmodel(model)
# Load the MIL Program from MLModel
prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
    prog)
main_block = prog.functions["main"]
incision_op = main_block.operations[op_idx]

print(f"op_idx = {op_idx}")
print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")

Output:

INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
op_idx = 187
First  chunk size = 1.68 MB
Second chunk size = 0.15 MB
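
For reference, the reported sizes can be turned into the share of the total weight size that ended up in the first chunk. This is just arithmetic on the numbers posted in this thread; the helper name `first_chunk_share` is made up for illustration.

```python
def first_chunk_share(first_mb: float, second_mb: float) -> float:
    """Fraction of the total weight size that landed in the first chunk."""
    return first_mb / (first_mb + second_mb)


# Sizes reported for the MobileNetV2 4-bit palettized model (1.68 MB / 0.15 MB)
print(f"MobileNetV2 repro:    {first_chunk_share(1.68, 0.15):.1%}")  # 91.8%

# Sizes reported in the original description (152.67 MB / 0.42 MB)
print(f"Original description: {first_chunk_share(152.67, 0.42):.1%}")  # 99.7%
```

An even bisection would put this figure close to 50%.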
nighting0le01 commented 3 months ago

Hi @jakesabathia2, below is the same repro with coremltools 8.0b2. cc @aseemw: https://github.com/apple/ml-stable-diffusion/issues/353

import coremltools as ct

# link to get model:https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model_path = './MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage'
output_dir = "./output/"

# bisect_model accepts the model path directly, so the explicit
# _load_prog_from_mlmodel(model) calls (not imported in this snippet) are dropped.
# Compute the incision point by bisecting the program based on weights size
ct.models.utils.bisect_model(
    model_path,
    output_dir,
    merge_chunks_to_pipeline=False,
)

# first_chunk_weights_size and total_weights_size are not defined in this script,
# so the size prints carried over from the 7.x repro are commented out:
# print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")
# print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")
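
Since the script above never computes the chunk sizes itself, one way to compare the two chunks is to measure the saved packages on disk. The sketch below uses only the standard library. The two chunk paths are an assumption about how the outputs are named and should be adjusted to whatever actually appears in `output_dir`; `dir_size_mb` is a hypothetical helper, not part of coremltools.

```python
import os


def dir_size_mb(path: str) -> float:
    """Total size in MB of all regular files under `path`
    (an .mlpackage is a directory, so it has to be walked)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / (1024 * 1024)


# Assumed output names; adjust to the files actually written to ./output/.
for chunk in (
    "./output/MobileNetV2Alpha1ScalarPalettization4Bit_chunk1.mlpackage",
    "./output/MobileNetV2Alpha1ScalarPalettization4Bit_chunk2.mlpackage",
):
    if os.path.isdir(chunk):
        print(f"{chunk}: {dir_size_mb(chunk):.2f} MB")
    else:
        print(f"{chunk}: not found")
```

On-disk size includes the model spec and manifest as well as the weights, so it is only a rough proxy for the weight sizes that the bisection reports.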
nighting0le01 commented 3 months ago

@jakesabathia2 @DawerG @aseemw @atiorh @TobyRoseman any help appreciated thank you 🙏