Mikubill / naifu

Train generative models with pytorch lightning
MIT License

KeyError: 'time_embedding.linear_1.weight' #20

Closed biasnhbi closed 1 year ago

biasnhbi commented 1 year ago

I used naifu-diffusion to train a model and converted the trained model with convert_to_sd.py, which produced the error below. Environment: xformers==0.0.20, diffusers==0.17.0, torch==2.0.1 (diffusers==0.10.2 cannot be used).


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

Loading captions: 3it [00:00, 2874.78it/s]
BucketManager initialized with base_res = [512, 512], max_size = [768, 512]
Loading resolutions: 3it [00:00, 72.47it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Using scaled LR: 3e-06

  | Name         | Type                 | Params
------------------------------------------------------
0 | unet         | UNet2DConditionModel | 859 M 
1 | vae          | AutoencoderKL        | 83.7 M
2 | text_encoder | CLIPTextModel        | 123 M 
------------------------------------------------------
859 M     Trainable params
206 M     Non-trainable params
1.1 B     Total params
2,132.471 Total estimated model params size (MB)
/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/lightning_lite/utilities/data.py:63: UserWarning: Your `IterableDataset` has `__len__` defined. In combination with multi-process data loading (when num_workers > 1), `__len__` could be inaccurate if each worker is not configured independently to avoid having duplicate data.
  rank_zero_warn(
Epoch 0:   0%|                                                          | 0/3 [00:00<?, ?it/s]/home/ubuntu/miniconda3/envs/nd/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:339: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
Epoch 0: 100%|██████████████████████████████████████| 3/3 [00:04<00:00,  1.38s/it, loss=0.114]`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████████████████████████████████| 3/3 [00:27<00:00,  9.29s/it, loss=0.114]
Traceback (most recent call last):
  File "/home/ubuntu/naifu-diffusion/scripts/convert_to_sd.py", line 345, in <module>
    unet_state_dict = convert_unet_state_dict(unet_state_dict, is_v2)
  File "/home/ubuntu/naifu-diffusion/scripts/convert_to_sd.py", line 107, in convert_unet_state_dict
    new_state_dict = {v: unet_state_dict[k] for k, v in mapping.items()}
  File "/home/ubuntu/naifu-diffusion/scripts/convert_to_sd.py", line 107, in <dictcomp>
    new_state_dict = {v: unet_state_dict[k] for k, v in mapping.items()}
KeyError: 'time_embedding.linear_1.weight'
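
The KeyError comes from the dict comprehension at convert_to_sd.py line 107: it assumes every source key in the mapping exists in the trained UNet's state dict and fails on the first absent one, so a key-naming mismatch (for example a diffusers version difference, or a leftover module prefix from the Lightning checkpoint) surfaces as this single opaque error. A minimal diagnostic sketch, assuming only that the mapping maps diffusers key names to SD key names (the function name here is hypothetical, not part of the script):

def convert_unet_state_dict_checked(unet_state_dict: dict, mapping: dict) -> dict:
    # Same conversion as convert_to_sd.py line 107, but report all missing
    # source keys at once instead of dying on the first KeyError.
    missing = [k for k in mapping if k not in unet_state_dict]
    if missing:
        raise KeyError(
            f"{len(missing)} expected diffusers UNet keys are absent, "
            f"e.g. {missing[:3]}; the checkpoint's key names likely do not "
            f"match the diffusers version this mapping was written for."
        )
    return {v: unet_state_dict[k] for k, v in mapping.items()}

Running the checkpoint through a check like this shows whether the whole dict is misnamed (a prefix problem) or only a handful of keys moved (a version problem).
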
Mikubill commented 1 year ago

Could you provide the commit hash? Also, these steps may help:

  1. Upgrade diffusers: pip install -U diffusers
  2. Run git pull to update nd to the latest version
  3. Use the checkpoint directly, without conversion (see the sketch below)
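
For the third option, a minimal sketch of loading the trained checkpoint directly with diffusers, assuming the run saved a standard diffusers-format pipeline directory (the path below is a placeholder, not the script's actual output path):

from diffusers import StableDiffusionPipeline

# Load the diffusers-format checkpoint as-is, skipping convert_to_sd.py;
# "path/to/checkpoint" is a placeholder for the real output directory.
pipe = StableDiffusionPipeline.from_pretrained("path/to/checkpoint")
pipe = pipe.to("cuda")

image = pipe("a test prompt", num_inference_steps=25).images[0]
image.save("sample.png")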

biasnhbi commented 1 year ago

Yes, the webui can use the checkpoint directly.