d8ahazard / sd_dreambooth_extension

[Bug]: Some tensors share memory error when creating a model from stable diffusion 2.1 #1348

Closed glozachmeur closed 11 months ago

glozachmeur commented 11 months ago

What happened?

Hi, I get a "Some tensors share memory" error, but only when I try to create a model from Stable Diffusion 2.1 (v2-1_768-ema-pruned.safetensors).

Steps to reproduce the problem

  1. Go to the Dreambooth tab
  2. Try to create a model from Stable Diffusion 2.1 (768)
  3. The model is not created and a "Some tensors share memory" error is raised

Commit and libraries

Initializing Dreambooth
Dreambooth revision: cf086c536b141fc522ff11f6cffc8b7b12da04b9
Successfully installed fastapi-0.94.1
[+] xformers version 0.0.20 installed.
[+] torch version 2.0.1+cu118 installed.
[+] torchvision version 0.15.2+cu118 installed.
[+] accelerate version 0.21.0 installed.
[+] diffusers version 0.19.3 installed.
[+] transformers version 4.30.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Command Line Arguments

none

Console logs

To create a public link, set `share=True` in `launch()`.
Startup time: 15.3s (prepare environment: 7.4s, import torch: 1.1s, import gradio: 0.3s, setup paths: 0.6s, initialize shared: 0.1s, other imports: 0.3s, load scripts: 4.1s, create ui: 0.3s, gradio launch: 1.1s).
Extracting config from /home/guillaume/dev/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/../configs/v1-training-default.yaml
Extracting checkpoint from /home/guillaume/dev/stable-diffusion-webui/models/Stable-diffusion/dreamlike-photoreal-2.0.safetensors
Restored system models.
Duration: 00:00:09
Updating scheduler name to: DDIM
Extracting config from /home/guillaume/dev/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/../configs/v1-training-default.yaml
Extracting checkpoint from /home/guillaume/dev/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.safetensors
Couldn't save the pipe
Traceback (most recent call last):
  File "/home/guillaume/dev/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/sd_to_diff.py", line 187, in extract_checkpoint
    pipe.save_pretrained(dump_path, safe_serialization=True)
  File "/home/guillaume/dev/stable-diffusion-webui/venv/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 640, in save_pretrained
    save_method(os.path.join(save_directory, pipeline_component_name), **save_kwargs)
  File "/home/guillaume/dev/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1847, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/home/guillaume/dev/stable-diffusion-webui/venv/lib/python3.10/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/home/guillaume/dev/stable-diffusion-webui/venv/lib/python3.10/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'text_model.encoder.layers.0.self_attn.k_proj.bias', 'text_model.encoder.layers.0.self_attn.k_proj.weight', 'text_model.embeddings.token_embedding.weight', 'text_model.embeddings.position_embedding.weight', 'text_model.encoder.layers.0.mlp.fc2.weight', 'text_model.encoder.layers.0.mlp.fc1.bias', 'text_model.encoder.layers.0.mlp.fc1.weight'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Unable to extract checkpoint!
Duration: 00:00:03
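
For anyone reading the traceback above: safetensors raises this error when several entries of the state dict point at the same underlying storage. The sketch below is not a confirmed fix for this issue and is not taken from the extension's code. It only illustrates, on a toy dictionary, how shared storage could be detected and how cloning each tensor gives it its own memory. The helper names `find_shared_groups` and `detach_shared` and the toy tensor names are made up for illustration, and it assumes torch 2.0 or later (for `untyped_storage()`).

```python
from collections import defaultdict

import torch


def find_shared_groups(state_dict):
    """Group tensor names by the address of their underlying storage.

    Hypothetical helper: returns only the groups with more than one name.
    """
    groups = defaultdict(list)
    for name, tensor in state_dict.items():
        groups[tensor.untyped_storage().data_ptr()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def detach_shared(state_dict):
    """Return a copy in which every tensor owns its own contiguous storage.

    Hypothetical helper: clone() allocates fresh memory for each entry.
    """
    return {name: t.detach().clone().contiguous() for name, t in state_dict.items()}


# Toy reproduction: two views sliced from one buffer share storage.
base = torch.arange(6, dtype=torch.float32)
toy = {
    "k_proj.weight": base[:3],
    "k_proj.bias": base[3:],
    "fc1.weight": torch.ones(2),
}

shared_before = find_shared_groups(toy)
shared_after = find_shared_groups(detach_shared(toy))

print("shared before cloning:", shared_before)
print("shared after cloning:", shared_after)
```

In a real conversion script, a dictionary cleaned this way would be what gets passed to `safetensors.torch.save_file`; the error message itself suggests `save_model` as another route. Whether either approach resolves the SD 2.1 extraction here is untested.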

Additional information

No response

Ciduss commented 11 months ago

I can confirm I have the exact same issue with basically all 2.1 768 checkpoints. I cannot create a new training checkpoint unless I start from an SD1.5 checkpoint model. (SDXL does not work either, but the error is different.)

I have not tried 2.1 512x or 2.0 checkpoints.

Please look into this error. None of the past issues has a solution that works for this. I have been stuck on it, going by trial and error, for over 2 weeks and I'm losing my sanity.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days