argmaxinc / DiffusionKit

On-device Inference of Diffusion Models for Apple Silicon

Issues on the step "Converting Models from PyTorch to Core ML" #6

Closed · HaskDev0 closed this 3 months ago

HaskDev0 commented 3 months ago

While going through the steps listed in the description, the "Converting Models from PyTorch to Core ML" section fails at Step 2, i.e., when I run:

python -m tests.torch2coreml.test_mmdit --sd3-ckpt-path ../sd3_medium_incl_clips.safetensors --model-version '8b' -o .../generatedResponsesWithStableDiffusion3Medium --latent-size 128

I get the following error:

======================================================================
ERROR: setUpClass (__main__.TestSD3MMDiT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../DiffusionKit-main/tests/torch2coreml/test_mmdit.py", line 69, in setUpClass
    _load_mmdit_weights(cls.test_torch_model, TEST_SD3_CKPT_PATH)
  File ".../DiffusionKit-main/python/src/torch/model_io.py", line 83, in _load_mmdit_weights
    raise ValueError(
ValueError: Total number of parameters in state_dict (2084877376) does not match the number of parameters in the module (8146086208)

----------------------------------------------------------------------
Ran 0 tests in 24.105s

FAILED (errors=1)

I see that there is a discrepancy in the number of module parameters, but the file I downloaded from HuggingFace is "sd3_medium_incl_clips.safetensors".

I'm using macOS Sonoma 14.3.1 and Python 3.9.19.

Are there any workarounds and suggestions for the issue? Thank you in advance.

atiorh commented 3 months ago

Hi @HaskDev0, please use the file without incl_clips: https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium.safetensors.
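
If it helps, something along these lines should fetch just that file (a sketch, assuming you have the huggingface_hub CLI installed and have accepted access to the gated repo):

# Sketch: download only the standalone MMDiT checkpoint
huggingface-cli download stabilityai/stable-diffusion-3-medium sd3_medium.safetensors --local-dir .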

HaskDev0 commented 3 months ago

> Hi @HaskDev0, please use the file without incl_clips: https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium.safetensors.

Hi @atiorh, thanks for the suggestion, but I tried the model without _incl_clips in the meantime and am still getting the same error...

ZachNagengast commented 3 months ago

@HaskDev0 I believe the issue is that the command targets 8b instead of 2b (medium) - here is a command that should work better:

python -m tests.torch2coreml.test_mmdit --sd3-ckpt-path stabilityai/stable-diffusion-3-medium --model-version '2b' -o .../generatedResponsesWithStableDiffusion3Medium --latent-size 128

HaskDev0 commented 3 months ago

@ZachNagengast, thanks for the suggestion, it does get past the error I had before! Although it raises other errors (and one question)...

1) Do I understand right that there is no 8b model version available?

2) Now I get the following error, which suggests my numpy version is incompatible:

.../DiffusionKit-main/python/src/torch/mmdit.py:206: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  -torch.log(torch.tensor(self.config.max_period).to(t.dtype))
.../DiffusionKit-main/python/src/torch/mmdit.py:160: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert h <= self.max_hw and w <= self.max_hw
E
======================================================================
ERROR: setUpClass (__main__.TestSD3MMDiT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../DiffusionKit-main/tests/torch2coreml/test_mmdit.py", line 80, in setUpClass
    super().setUpClass()
  File ".../lib/python3.9/site-packages/argmaxtools/test_utils.py", line 55, in setUpClass
    cls.test_coreml_model = _create_coreml_model(
  File ".../lib/python3.9/site-packages/argmaxtools/test_utils.py", line 411, in _create_coreml_model
    inputs=[
  File ".../lib/python3.9/site-packages/argmaxtools/test_utils.py", line 412, in <listcomp>
    ct.TensorType(k, shape=v.shape, dtype=v.dtype)
  File ".../lib/python3.9/site-packages/coremltools/converters/mil/input_types.py", line 229, in __init__
    self.dtype = numpy_type_to_builtin_type(dtype)
  File ".../lib/python3.9/site-packages/coremltools/converters/mil/mil/types/type_mapping.py", line 363, in numpy_type_to_builtin_type
    if np.issubclass_(type(nptype), np.dtype):
  File ".../lib/python3.9/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(
AttributeError: `np.issubclass_` was removed in the NumPy 2.0 release. Use `issubclass` builtin instead.

Should I downgrade my numpy version, or is there a long-term solution for future versions?

ZachNagengast commented 3 months ago

Yes, this sounds like a dependency issue. Are you using the conda environment set up via https://github.com/argmaxinc/DiffusionKit?tab=readme-ov-file#python-environment-setup?
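
For reference, the setup there is roughly along these lines (the env name and Python version here are assumptions; check the linked section for the exact commands):

# Sketch of the README's conda setup; names/versions are assumptions
conda create -y -n diffusionkit python=3.11
conda activate diffusionkit
pip install -e .  # run from the DiffusionKit repo root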

HaskDev0 commented 3 months ago

@ZachNagengast, I was using venv to create virtual environments, but I have resolved the issue by downgrading the numpy version.
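
For reference, the downgrade is just a pin below 2.0 (the exact bound is an assumption; anything that predates the np.issubclass_ removal should work):

pip install "numpy<2.0"  # np.issubclass_ was removed in NumPy 2.0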

Finally, I was able to convert the PyTorch models to Core ML, but as soon as I use diffusionkit-cli afterwards, it raises an error regarding HF access... Why is it trying to do so? I have the model locally and converted everything to Core ML through the steps provided.

Thank you in advance.

ZachNagengast commented 3 months ago

Ah great! The error is likely due to gated access under the StabilityAI license. All you need to do is go to the model page at https://huggingface.co/stabilityai/stable-diffusion-3-medium and accept the terms.
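
If the CLI still cannot reach the repo after accepting, make sure your environment is authenticated with a token that has access, e.g.:

huggingface-cli login  # paste an HF token that can read the gated repo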

HaskDev0 commented 3 months ago

@ZachNagengast, hmm, but if I have already accepted the license, downloaded the .safetensors model, and used it for converting to Core ML, why is it still trying to access HF? All the models are local now. Or how can I use the Core ML-generated models?

ZachNagengast commented 3 months ago

Yes, just noticed something - if you are using a fine-grained token, you will also need to add the permission to read gated repos on the tokens page: https://huggingface.co/settings/tokens

(screenshot of the fine-grained token permission settings)

Just updated the README as well. You can also use --local-ckpt <path> for the local sd3_medium.safetensors.
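
For example (a sketch; the prompt and paths are placeholders, flags as used elsewhere in this thread):

diffusionkit-cli --prompt "a sunset photo" --output-path out.png --local-ckpt /path/to/sd3_medium.safetensors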

HaskDev0 commented 3 months ago

@ZachNagengast, thank you! I recloned the repository to update the files, and with the --local-ckpt argument it does load now (although it is still in progress), but I am a bit confused since it still downloads all the .safetensors files from HF. Why does it do so? And where does it save all these big files? I cannot find them...

ZachNagengast commented 3 months ago

Tagging @arda-argmax who may be able to answer better

arda-argmax commented 3 months ago

@HaskDev0, the default save directory for Hugging Face is ~/.cache/huggingface/hub/. You can probably find the saved .safetensors files in ~/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3-medium/. Can you share the full CLI command that triggers the redownload?
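
For example, you can list what has been cached with:

ls ~/.cache/huggingface/hub/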

HaskDev0 commented 3 months ago

@arda-argmax, thank you for the folder info! Found it indeed. Maybe I phrased it poorly, but I meant that when I first ran the following command:

diffusionkit-cli --prompt "Sunset" --steps 64 --output-path .../ImageGenPrompt.png --t5 --height 1024 --width 1024 --local-ckpt .../Stable-Diffusion-3-medium/sd3_medium.safetensors --no-low-memory-mode

it triggered the download of other parts from HF to the .cache directory. Subsequent calls haven't triggered a redownload, but I was surprised that even with the .safetensors file present it still downloaded some other large files. How should this be done in order to control the download of potentially "missing" parts?

arda-argmax commented 3 months ago

@HaskDev0, I couldn't reproduce this issue on my end; could you share more details so that I can diagnose better?

Other models like CLIP are required for running the pipeline, so they will be downloaded from HF to your cache. You can only use your local MMDiT checkpoint (e.g. sd3_medium.safetensors) for the --local-ckpt arg.

Best, Arda
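
P.S. If you want those downloads to land somewhere other than the default cache, the Hugging Face libraries respect the HF_HOME environment variable (a sketch; the paths are placeholders):

export HF_HOME=/Volumes/Models/hf-cache   # redirect the HF cache before running
diffusionkit-cli --prompt "Sunset" --local-ckpt /path/to/sd3_medium.safetensors --output-path out.png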

HaskDev0 commented 3 months ago

@arda-argmax, I will try, but I think there is a typo (or just different naming): I didn't have models--stabilityai--stable-diffusion-3-medium, but something similar instead.

After typing the above in Terminal I got the following:

clip_l/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 565/565 [00:00<00:00, 304kB/s]
model.fp16.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 246M/246M [00:22<00:00, 11.0MB/s]
tokenizer_l/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 2.32MB/s]
tokenizer_l/merges.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.67MB/s]
clip_g/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 575/575 [00:00<00:00, 1.26MB/s]
model.fp16.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1.39G/1.39G [02:00<00:00, 11.5MB/s]
.../DiffusionKit.MLX/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
t5xxl.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 9.79G/9.79G [14:46<00:00, 11.0MB/s]
INFO:python.src.mlx.scripts.generate_images:Output image resolution will be 1024x1024
INFO:python.src.mlx:Pre text encoding peak memory: 0.0GB
INFO:python.src.mlx:Pre text encoding active memory: 0.0GB
INFO:python.src.mlx:Post text encoding peak memory: 20.816GB
INFO:python.src.mlx:Post text encoding active memory: 20.792GB
INFO:python.src.mlx:Text encoding time: 4.139s
INFO:python.src.mlx:Pre denoise peak memory: 0.0GB
INFO:python.src.mlx:Pre denoise active memory: 20.792GB
INFO:python.src.mlx:Seed: 1719351394
 17%|█████████████████████▏                                                                                                     | 11/64 [00:28<02:06,  2.38s/it]

As I understand from the above, it is expected to download the additional models anyway. I then wondered whether it is possible to supply them yourself instead (i.e., running with sd3_medium_incl_clips_t5xxlfp16.safetensors), but that threw an error.

Best, Alexey

arda-argmax commented 3 months ago

Yes, it is expected to download the CLIP models and tokenizers (and T5XXL if you want to use T5). Unfortunately, we only support sd3_medium.safetensors for local checkpoints, not the combined checkpoints like sd3_medium_incl_clips_t5xxlfp16.safetensors.