i2v multiresolution finetuning

System Info

diffusers: git+https://github.com/huggingface/diffusers

Information

[ ] The official example scripts
[ ] My own modified scripts and tasks

Reproduction

Hi, I tried i2v multiresolution finetuning with the latest code (height bucket 288, width bucket 640, 49 frames) but got the following error:
Traceback (most recent call last):
  File "/workspace/i2v/cogvideox-factory/training/cogvideox_image_to_video_lora.py", line 1004, in <module>
    main(args)
  File "/workspace/i2v/cogvideox-factory/training/cogvideox_image_to_video_lora.py", line 803, in main
    model_output = transformer(
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 446, in forward
    hidden_states = self.patch_embed(encoder_hidden_states, hidden_states)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 423, in forward
    raise ValueError(
ValueError: It is currently not possible to generate videos at a different resolution that the defaults. This should only be the case with 'THUDM/CogVideoX-5b-I2V'.If you think this is incorrect, please open an issue at https://github.com/huggingface/diffusers/issues.
I1113 09:45:55.953000 140250185783104 torch/_dynamo/utils.py:335] TorchDynamo compilation metrics:
I1113 09:45:55.953000 140250185783104 torch/_dynamo/utils.py:335] Function, Runtimes (s)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats constrain_symbol_range: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats evaluate_expr: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _simplify_floor_div: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _maybe_guard_rel: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _find: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats has_hint: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats size_hint: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats simplify: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _update_divisible: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats replace: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats _maybe_evaluate_static: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.953000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats get_implications: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.954000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats get_axioms: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
V1113 09:45:55.954000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats safe_expand: CacheInfo(hits=0, misses=0, maxsize=256, currsize=0)
V1113 09:45:55.954000 140250185783104 torch/fx/experimental/symbolic_shapes.py:116] lru_cache_stats uninteresting_files: CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
Steps:   0%|                                                                                                                                                    | 0/8000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/i2v/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/i2v/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/i2v/bin/python', 'training/cogvideox_image_to_video_lora.py', '--pretrained_model_name_or_path', 'THUDM/CogVideoX-5b-I2V', '--data_root', '/workspace/i2v/cogvideox-factory/fro_tensor', '--caption_column', 'prompts.txt', '--video_column', 'videos.txt', '--id_token', 'BW_STYLE', '--height_buckets', '288', '--width_buckets', '640', '--frame_buckets', '49', '--dataloader_num_workers', '8', '--pin_memory', '--validation_prompt', "BW_STYLE A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions:::BW_STYLE A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. 
The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance", '--validation_images', '/path/to/image1.png:::/path/to/image2.png', '--validation_prompt_separator', ':::', '--num_validation_videos', '1', '--validation_epochs', '10', '--seed', '42', '--rank', '128', '--lora_alpha', '128', '--mixed_precision', 'bf16', '--output_dir', '/workspace/cogvideox-factory/fro_i2v_adamw__steps_8000__lr-schedule_cosine_with_restarts__learning-rate_1e-4/', '--max_num_frames', '49', '--train_batch_size', '1', '--max_train_steps', '8000', '--checkpointing_steps', '1000', '--gradient_accumulation_steps', '1', '--gradient_checkpointing', '--learning_rate', '1e-4', '--lr_scheduler', 'cosine_with_restarts', '--lr_warmup_steps', '400', '--lr_num_cycles', '1', '--load_tensors', '--enable_slicing', '--enable_tiling', '--noised_image_dropout', '0.05', '--optimizer', 'adamw', '--beta1', '0.9', '--beta2', '0.95', '--weight_decay', '0.001', '--max_grad_norm', '1.0', '--allow_tf32', '--nccl_timeout', '1800']' returned non-zero exit status 1.
-------------------- Finished executing script --------------------
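
One possible reading of the ValueError above is that the requested bucket (288x640) does not map onto the latent size the transformer was configured with. The sketch below only illustrates that arithmetic. The spatial VAE scale factor of 8 and the default latent size of 60x90 (480x720 pixels) are assumptions about 'THUDM/CogVideoX-5b-I2V' and are not taken from the log; the helper names are hypothetical and are not part of diffusers or cogvideox-factory.

```python
# Assumed values, not taken from the log above: the VAE's spatial downscale
# factor and the transformer's default latent size for THUDM/CogVideoX-5b-I2V.
VAE_SCALE_FACTOR_SPATIAL = 8
DEFAULT_SAMPLE_HEIGHT = 60  # latent rows, i.e. 480 pixels under the assumed factor
DEFAULT_SAMPLE_WIDTH = 90   # latent columns, i.e. 720 pixels under the assumed factor


def latent_size(height_px: int, width_px: int) -> tuple[int, int]:
    """Convert a pixel resolution into the latent resolution the transformer sees."""
    return height_px // VAE_SCALE_FACTOR_SPATIAL, width_px // VAE_SCALE_FACTOR_SPATIAL


def matches_default(height_px: int, width_px: int) -> bool:
    """Return True if a bucket maps onto the assumed default latent size."""
    return latent_size(height_px, width_px) == (DEFAULT_SAMPLE_HEIGHT, DEFAULT_SAMPLE_WIDTH)


if __name__ == "__main__":
    # (288, 640) is the bucket from the failing command; (480, 720) is the assumed default.
    for height, width in [(288, 640), (480, 720)]:
        print(height, width, latent_size(height, width), matches_default(height, width))
```

If these assumptions hold, the 288x640 bucket gives a 36x80 latent, which differs from the default, while a 480x720 bucket would match it. Whether other resolutions are meant to be supported for this checkpoint is exactly the question of this issue.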
Expected behavior

The problem is resolved: training runs with the 288x640 bucket without raising this error.
a-r-r-o-w / cogvideox-factory

i2v multiresolution finetuning #88
