apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon
MIT License
16.66k stars 921 forks

Mixed Palettization of SD 1.5 LCM model #326

Open indoflaven opened 4 months ago

indoflaven commented 4 months ago

I'm trying to create a mixed-bit palettized version of an SD 1.5 LCM model. Specifically, I'm using Lykon/dreamshaper-8-lcm from Hugging Face. I tried using the pregenerated recipe for SD 1.5 but got the following error when running the 4.85-bit recipe:

File "/Users/michaelhein/Documents/GitHub/ml-stable-diffusion/python_coreml_stable_diffusion/mixed_bit_compression_apply.py", line 71, in main
    assert(pdist.min() < 0.01)
AssertionError

So I tried to create my own recipe and I get the following error:

 File "/opt/miniconda3/envs/coreml_stable_diffusion/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 1231, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: Invalid buffer size: 20.25 GB

Perhaps I just need a system with more memory to complete this task (I'm using an M1 MacBook Air with only 8 GB of RAM), but I thought I'd post here to see if there's something I'm doing wrong.
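As a side note on the out-of-memory failure: the buffer PyTorch fails to allocate is plausibly the attention score matrix, whose size grows with the square of the token count. A rough back-of-envelope sketch (the shapes below are illustrative assumptions, not the actual ones from the recipe run):

```python
# The score matrix inside F.scaled_dot_product_attention has shape
# (batch, heads, tokens, tokens), so its memory grows quadratically with
# the sequence length.

def attention_scores_bytes(batch: int, heads: int, tokens: int,
                           bytes_per_el: int = 4) -> int:
    """float32 bytes needed just for the (tokens x tokens) score matrices."""
    return batch * heads * tokens * tokens * bytes_per_el

# One hypothetical shape combination that lands exactly on the reported figure:
gb = attention_scores_bytes(batch=2, heads=8, tokens=18432) / 1024**3
print(f"{gb:.2f} GB")  # 20.25 GB -- a single buffer far beyond 8 GB of RAM
```

Whatever the exact shapes were, the quadratic term means a single attention buffer can dwarf the machine's total memory, so more RAM (or lower-resolution calibration inputs) is a plausible fix.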

atiorh commented 4 months ago

Thanks for the report @indoflaven! Yes, the matching of recipe results with the weights of a non-base model seems to be broken. I (not Apple) am working on a fix since I wrote this part of the code, and we are also actively working on an improved version that will include it. I will ping you once we release it, but Apple might fix it before I do. cc: @aseemw

indoflaven commented 4 months ago

@atiorh Thanks! Another question about mixed-bit palettization. Let's say I'm compressing a model like SD-Turbo, which has a Unet, VAE decoder, and text encoder. Can I only use mixed-bit palettization on the Unet, or can I also run it on the VAE decoder and text encoder? If not, can I match the mixed-bit Unet with a standard 6-bit VAE and text encoder?

Also, mixed-bit palettization produces an mlpackage. How can I compile this the same way --bundle-resources-for-swift-cli compiles everything for standard palettization?

Thanks!

atiorh commented 4 months ago

> Another question about mixed-bit palettization. Let's say I'm compressing a model like SD-Turbo, which has a Unet, VAE decoder, and text encoder. Can I only use mixed-bit palettization on the Unet, or can I also run it on the VAE decoder and text encoder?

The implementation of MBP in this repo is tied to the Unet only. If you want to get your hands dirty with an example, check out how whisperkittools uses a generic implementation of MBP:

> If not, can I match the mixed-bit Unet with a standard 6-bit VAE and text encoder?

Yes, I recommend using a 6-bit VAE + text encoder in conjunction with an MBP Unet. You just need to create them separately for now.
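A sketch of that separate 6-bit conversion step, using the torch2coreml flags documented in this repo; the model id and output directory are placeholders, and the command is echoed rather than executed here since actually running it needs torch and the checkpoint:

```shell
# Convert only the VAE decoder and text encoder at a fixed 6 bits, keeping
# the MBP Unet from the mixed-bit pipeline. Placeholder model id and paths.
MODEL="Lykon/dreamshaper-8-lcm"
OUT="./models_6bit"

CMD="python -m python_coreml_stable_diffusion.torch2coreml \
  --model-version $MODEL \
  --convert-vae-decoder --convert-text-encoder \
  --quantize-nbits 6 \
  -o $OUT"

# Echoed instead of run; paste into a shell with the repo environment active.
echo "$CMD"
```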

> Also, mixed-bit palettization produces an mlpackage. How can I compile this the same way --bundle-resources-for-swift-cli compiles everything for standard palettization?

    coremlcompiler compile <path-to-mlpackage> .

indoflaven commented 4 months ago

Thanks again. For anyone else coming to this post in the future, xcrun coremlcompiler compile <path-to-mlpackage> . worked for me.
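Putting the compile step together, here is one way to assemble a Resources folder by hand, roughly what --bundle-resources-for-swift-cli would otherwise produce. The source/destination paths are placeholders, and the compile command is echoed rather than run since it needs Xcode's coremlcompiler on macOS:

```shell
# Compile each palettized .mlpackage into a .mlmodelc and collect them in one
# directory the Swift CLI can load. Paths below are assumed placeholders.
SRC="./mbp_output"      # where the palettized .mlpackage files live
DST="./Resources"
mkdir -p "$DST"

for pkg in "$SRC"/*.mlpackage; do
  # coremlcompiler writes <name>.mlmodelc into DST; echoed here because it
  # requires Xcode's command-line tools on macOS.
  echo "xcrun coremlcompiler compile $pkg $DST"
done
```

Remember to also copy the tokenizer files (e.g. vocab.json and merges.txt, as in the standard bundled output) next to the compiled models so the Swift CLI can find everything in one place.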