Could you paste the full error stack trace?
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache()
.
0it [00:00, ?it/s]
Initializing the conversion map
home/miniconda3/envs/torch200/lib/python3.10/site-packages/accelerate/accelerator.py:371: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
  warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
01/09/2024 10:59:06 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'variance_type'} was not found in config. Values will be initialized to default values.
motion mask True, motion_strength True
All model checkpoint weights were used when initializing UNet3DConditionModel.
All the weights of UNet3DConditionModel were initialized from the model checkpoint at output/latent/animate_anything_512_v1.02.
If your task is similar to the task the model of the checkpoint was trained on, you can already use UNet3DConditionModel for predictions without further training.
33 Attention layers using Scaled Dot Product Attention.
Loading JSON from home/Video-BLIP2-Preprocessor/train_data/my_videos.json
Non-existant JSON path. Skipping.
Non-existant JSON path. Skipping.
Could not process extra train datasets due to an error : [Errno 2] No such file or directory: '/webvid/webvid/data/40K.json'
01/09/2024 10:59:35 - INFO - main - Running training
01/09/2024 10:59:35 - INFO - main - Num examples = 260
01/09/2024 10:59:35 - INFO - main - Num Epochs = 152
01/09/2024 10:59:35 - INFO - main - Instantaneous batch size per device = 8
01/09/2024 10:59:35 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 8
01/09/2024 10:59:35 - INFO - main - Gradient Accumulation steps = 1
01/09/2024 10:59:35 - INFO - main - Total optimization steps = 5000
Steps: 0%| | 0/5000 [00:00<?, ?it/s]
not trainable []
4100 params have been unfrozen for training.
Steps: 1%|▉ | 32/5000 [15:28<35:07:07, 25.45s/it, lr=5e-6, step_loss=0.197]
Traceback (most recent call last):
  File "home/animate-anything/train.py", line 1188, in
I extracted 10 videos from the WebVid10M dataset to create a demo dataset, and I processed them using the Video-BLIP2-Preprocessor. It's worth mentioning that I did not specify the clip_frame_data parameter during the Video-BLIP2-Preprocessor processing, but I did specify the video_blip parameter in the animate-anything module.
After processing, I've encountered unexpected behavior during training. I'm wondering if the issue might be related to the Video-BLIP2-Preprocessor processing or if there's something else I might be overlooking.
It seems that something is wrong with your video dataset; some of the training videos may be corrupt. I suggest printing the training video paths during training to find the corrupt video.
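For example, a quick standalone check along these lines can flag unreadable files before training (a rough sketch; the JSON layout and key names are assumptions about the Video-BLIP2-Preprocessor output, not code from this repo):

```python
# Rough sketch: try to decode the first frame of every video listed in the
# preprocessor JSON and report files that cannot be read. The "data" and
# "video_path" keys are assumptions about the JSON layout.
import json
import cv2

with open("train_data/my_videos.json") as f:
    entries = json.load(f).get("data", [])

for entry in entries:
    path = entry.get("video_path", "")
    cap = cv2.VideoCapture(path)
    ok, _ = cap.read()          # fails for missing or corrupt files
    cap.release()
    if not ok:
        print("possibly corrupt or missing video:", path)
```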
I have pinpointed the underlying issue. In my case, I built a demo dataset from a restricted subset of WebVid10M containing only 10 videos. At the 32nd iteration the data loader reached the final, partial batch of the dataset, which held only the last 2 samples. That is a problem because the shape of the uncond_input tensor is tied to train_batch_size (set to 8 in the config file), so the tensor shapes no longer match: https://github.com/alibaba/animate-anything/blob/43c7e1bb4ecc79f9477edb834b45d5eb5aedeedb/train.py#L783-L784
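A minimal sketch of the failure mode (illustrative dummy tensors only; the real shapes in train.py differ):

```python
import torch

train_batch_size = 8   # from the config file
last_batch_size = 2    # samples left in the final, partial batch

# uncond_input is sized from train_batch_size, while the batch actually
# delivered by the dataloader is smaller (dummy shapes for illustration).
uncond_input = torch.zeros(train_batch_size, 77, 1024)
cond_input = torch.zeros(last_batch_size, 77, 1024)

try:
    _ = uncond_input + cond_input   # batch dims 8 vs 2 cannot broadcast
except RuntimeError as e:
    print("tensor shape mismatch:", e)
```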
To fix this, I set the drop_last parameter of the train_dataloader to True, which resolved the issue in my case: https://github.com/alibaba/animate-anything/blob/43c7e1bb4ecc79f9477edb834b45d5eb5aedeedb/train.py#L666-L671
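For reference, a self-contained sketch of the change (a dummy dataset stands in for the real train_dataset, which comes from the YAML config):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_batch_size = 8
train_dataset = TensorDataset(torch.arange(260))   # 260 dummy samples, like "Num examples = 260"

train_dataloader = DataLoader(
    train_dataset,
    batch_size=train_batch_size,
    shuffle=True,
    drop_last=True,   # discard the final, partial batch so every batch matches train_batch_size
)

for (batch,) in train_dataloader:
    assert batch.shape[0] == train_batch_size   # a partial batch is never seen
```

With drop_last=True the leftover samples are simply skipped in a given epoch, which is harmless here because the data is reshuffled into new full batches on the next epoch.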
Thank you for your contributions to the AIGC community. I've encountered an issue while training with the train_mask_motion.yaml configuration file. I modified the training and testing datasets in the configuration file and started training with the command:
python train.py --config ./example/train_mask_motion.yaml
However, after training for 32 iterations, I encountered the following error in https://github.com/alibaba/animate-anything/blob/9e6098abcea894155eaab17c1f5573d0d11c3410/models/unet_3d_blocks.py#L41-L52. I find it puzzling that a tensor shape mismatch error occurs midway through training. I would appreciate any insights or guidance you can provide to help me understand and resolve this issue.
Thank you once again for your assistance!