genmoai / mochi

The best OSS video generation models
Apache License 2.0

ValueError: num_samples should be a positive integer value, but got num_samples=0 #104

Closed martintomov closed 3 days ago

martintomov commented 3 days ago

I'm fine-tuning on Modal. Everything goes smoothly until I run modal run -d main::finetune, which fails with this error.

Where can I set the value of num_samples?

Logs:

🍡 Finetuning Mochi. This may take 3 hours.
🍡 See your mochi-tune-finetunes volume for intermediate checkpoints and samples.
Starting training with 1 GPU(s), mode: single_gpu
Using config: lora.yaml
model=/weights/dit.safetensors, optimizer=, start_step_num=0
Found 23 training videos in /videos_prepared
Loaded 0/23 valid file pairs.
Traceback (most recent call last):
  File "/mochi/demos/fine_tuner/train.py", line 396, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/demos/fine_tuner/train.py", line 201, in main
    train_dl = torch.utils.data.DataLoader(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 376, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/utils/data/sampler.py", line 164, in __init__
    raise ValueError(
ValueError: num_samples should be a positive integer value, but got num_samples=0
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
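
On the num_samples question: it is not a setting in this traceback. When no explicit count is passed, PyTorch's RandomSampler takes num_samples from len(dataset), so num_samples=0 follows from the "Loaded 0/23 valid file pairs" line above. A minimal sketch of a guard that could run before the DataLoader is built, assuming the dataset supports len(); `require_nonempty` is a hypothetical helper and not part of the Mochi code:

```python
def require_nonempty(dataset, source="/videos_prepared"):
    """Return len(dataset), or raise a descriptive error if it is empty.

    Hypothetical helper: it only turns the opaque sampler error into a
    message that points at the data directory.
    """
    n = len(dataset)
    if n == 0:
        raise ValueError(
            f"No valid training samples found in {source}; "
            "check that preprocessing wrote its outputs for every video."
        )
    return n
```

Calling it right before constructing the DataLoader would surface the empty dataset directly instead of the sampler's num_samples message.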
Curlypla commented 3 days ago

Do you have .txt captions?

martintomov commented 3 days ago

Yep: 23 .mp4 files and 23 .txt files with matching names (screenshot attached).

Curlypla commented 3 days ago

The latents are missing. Are you sure the preprocessing ran correctly in the 3rd step?
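
One way to check this is to list which prepared videos lack their preprocessing outputs. The sketch below assumes that each video N.mp4 should have sibling files N.latent.pt and N.embed.pt. Those suffixes are an assumption about what the preprocessing writes and should be adjusted to match the actual output. `find_incomplete_samples` is a hypothetical name.

```python
from pathlib import Path


def find_incomplete_samples(data_dir, required_suffixes=(".latent.pt", ".embed.pt")):
    """Map each video file name to the list of sibling outputs it is missing.

    The default suffixes are assumptions, not confirmed by the thread.
    """
    missing = {}
    for video in sorted(Path(data_dir).glob("*.mp4")):
        # 1.mp4 -> 1.latent.pt, 1.embed.pt
        absent = [s for s in required_suffixes if not video.with_suffix(s).exists()]
        if absent:
            missing[video.name] = absent
    return missing
```

Running find_incomplete_samples("/videos_prepared") inside the container would return an empty dict only if every video has all of its outputs.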

martintomov commented 3 days ago

Hm. Looking at the preprocessing, is it expected behaviour that it shortened my videos from 0:07 to 0:01 in length? In the original dataset, provided as a Google Drive link, I see they used 0:05 videos.

Curlypla commented 3 days ago

It's better to count in frames than in seconds. And yes, that's normal: if a video is at 30 FPS, the default training length of 37 frames gives clips of roughly one second.
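
The arithmetic can be sketched as follows. The 0.09 s margin and the one-decimal rounding mirror the '(37 / 30) + 0.09' expression and the printf %.1f call in the preprocessing trace posted in this thread. The function name and the default of 30 FPS are illustrative assumptions.

```python
def trim_duration(num_frames: int, fps: float = 30.0, margin: float = 0.09) -> float:
    """Seconds of video needed to cover num_frames at fps, plus a small margin."""
    return round(num_frames / fps + margin, 1)


# 37 frames at 30 FPS: 37 / 30 + 0.09 = 1.3233..., rounded to 1.3 seconds
print(trim_duration(37))  # 1.3
```

So a 7-second source clip is cut to about 1.3 seconds because only the first 37 frames are used for training.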

martintomov commented 3 days ago

Logs from preprocessing. You're right - something goes wrong here.

🍡 Preprocessing videos. This may take 2-3 minutes.
+ [[ 8 -gt 0 ]]
+ case $1 in
+ check_argument -v /videos/
+ [[ -z /videos/ ]]
+ [[ /videos/ == -* ]]
+ VIDEOS_DIR=/videos/
+ shift
+ shift
+ [[ 6 -gt 0 ]]
+ case $1 in
+ check_argument -o /videos_prepared/
+ [[ -z /videos_prepared/ ]]
+ [[ /videos_prepared/ == -* ]]
+ OUTPUT_DIR=/videos_prepared/
+ shift
+ shift
+ [[ 4 -gt 0 ]]
+ case $1 in
+ check_argument -w /weights/
+ [[ -z /weights/ ]]
+ [[ /weights/ == -* ]]
+ WEIGHTS_DIR=/weights/
+ shift
+ shift
+ [[ 2 -gt 0 ]]
+ case $1 in
+ check_argument -n 37
+ [[ -z 37 ]]
+ [[ 37 == -* ]]
+ NUM_FRAMES=37
+ shift
+ shift
+ [[ 0 -gt 0 ]]
+ [[ -z /videos/ ]]
+ [[ -z /videos_prepared/ ]]
+ [[ -z /weights/ ]]
+ [[ -z 37 ]]
+++ dirname demos/fine_tuner/preprocess.bash
++ cd demos/fine_tuner
++ pwd
+ SCRIPT_DIR=/mochi/demos/fine_tuner
+ echo 'Using script directory: /mochi/demos/fine_tuner'
+ echo -e '\n\e[1;35m🎬 **Step 1: Trim and resize videos** \e[0m'
Using script directory: /mochi/demos/fine_tuner

🎬 **Step 1: Trim and resize videos** 
+++ echo '(37 / 30) + 0.09'
+++ bc -l
++ printf %.1f 1.32333333333333333333
+ DURATION=1.3
+ echo 'Trimming videos to duration: 1.3 seconds'
Trimming videos to duration: 1.3 seconds
+ python3 /mochi/demos/fine_tuner/trim_and_crop_videos.py /videos/ /videos_prepared/ -d 1.3

Processing: /videos/1.mp4
                                      ed/1.txt
Moviepy - Building video /videos_prepared/1.mp4.
Moviepy - Writing video /videos_prepared/1.mp4
  0%|          | 0/23 [00:01<?, ?it/s]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/1.mp4
Processing: /videos/10.mp4
  4%|▍         | 1/23 [00:01<00:29,  1.33s/it]                                              
Moviepy - Building video /videos_prepared/10.mp4.
Moviepy - Writing video /videos_prepared/10.mp4
  4%|▍         | 1/23 [00:02<00:29,  1.33s/it]                                              
Moviepy - Done !
Moviepy - video ready /videos_prepared/10.mp4
Processing: /videos/11.mp4
  9%|▊         | 2/23 [00:02<00:27,  1.30s/it]                                              
Moviepy - Building video /videos_prepared/11.mp4.
Moviepy - Writing video /videos_prepared/11.mp4
  9%|▊         | 2/23 [00:03<00:27,  1.30s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/11.mp4
Processing: /videos/12.mp4
 13%|█▎        | 3/23 [00:04<00:26,  1.32s/it]                                              
Moviepy - Building video /videos_prepared/12.mp4.
Moviepy - Writing video /videos_prepared/12.mp4
 13%|█▎        | 3/23 [00:05<00:26,  1.32s/it]                                              
Moviepy - Done !
Moviepy - video ready /videos_prepared/12.mp4
Processing: /videos/13.mp4

Moviepy - Building video /videos_prepared/13.mp4.
Moviepy - Writing video /videos_prepared/13.mp4
 17%|█▋        | 4/23 [00:06<00:24,  1.28s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/13.mp4
Processing: /videos/14.mp4
 22%|██▏       | 5/23 [00:06<00:23,  1.30s/it]                                              
Moviepy - Building video /videos_prepared/14.mp4.
Moviepy - Writing video /videos_prepared/14.mp4
 22%|██▏       | 5/23 [00:07<00:23,  1.30s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/14.mp4
Processing: /videos/15.mp4
 26%|██▌       | 6/23 [00:07<00:21,  1.26s/it]                                              
Moviepy - Building video /videos_prepared/15.mp4.
Moviepy - Writing video /videos_prepared/15.mp4
 26%|██▌       | 6/23 [00:08<00:21,  1.26s/it]
Moviepy - Done !
Moviepy - video ready /videos_prepared/15.mp4
Processing: /videos/16.mp4
 30%|███       | 7/23 [00:09<00:20,  1.27s/it]                                              
Moviepy - Building video /videos_prepared/16.mp4.
Moviepy - Writing video /videos_prepared/16.mp4
 30%|███       | 7/23 [00:10<00:20,  1.27s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/16.mp4
Processing: /videos/17.mp4
 35%|███▍      | 8/23 [00:10<00:18,  1.20s/it]                                              
Moviepy - Building video /videos_prepared/17.mp4.
Moviepy - Writing video /videos_prepared/17.mp4
 35%|███▍      | 8/23 [00:11<00:18,  1.20s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/17.mp4
Processing: /videos/18.mp4
 39%|███▉      | 9/23 [00:11<00:16,  1.20s/it]                                              
Moviepy - Building video /videos_prepared/18.mp4.
Moviepy - Writing video /videos_prepared/18.mp4
 39%|███▉      | 9/23 [00:12<00:16,  1.20s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/18.mp4
Processing: /videos/19.mp4
 43%|████▎     | 10/23 [00:12<00:15,  1.21s/it]                                               
Moviepy - Building video /videos_prepared/19.mp4.
Moviepy - Writing video /videos_prepared/19.mp4
 43%|████▎     | 10/23 [00:13<00:15,  1.21s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/19.mp4
 48%|████▊     | 11/23 [00:13<00:14,  1.24s/it]                                               
 48%|████▊     | 11/23 [00:13<00:14,  1.24s/it]                                               
Moviepy - Building video /videos_prepared/2.mp4.
Moviepy - Writing video /videos_prepared/2.mp4
 48%|████▊     | 11/23 [00:14<00:14,  1.24s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/2.mp4
Processing: /videos/20.mp4
 52%|█████▏    | 12/23 [00:15<00:13,  1.22s/it]                                               
Moviepy - Building video /videos_prepared/20.mp4.
Moviepy - Writing video /videos_prepared/20.mp4
 52%|█████▏    | 12/23 [00:16<00:13,  1.22s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/20.mp4
 57%|█████▋    | 13/23 [00:16<00:12,  1.23s/it]
Copied /videos/21.txt to /videos_prepared/21.txt
 57%|█████▋    | 13/23 [00:16<00:12,  1.23s/it]                                               
Moviepy - Building video /videos_prepared/21.mp4.
Moviepy - Writing video /videos_prepared/21.mp4
 57%|█████▋    | 13/23 [00:17<00:12,  1.23s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/21.mp4
Processing: /videos/22.mp4
 61%|██████    | 14/23 [00:17<00:10,  1.22s/it]                                               
Moviepy - Building video /videos_prepared/22.mp4.
Moviepy - Writing video /videos_prepared/22.mp4
 61%|██████    | 14/23 [00:18<00:10,  1.22s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/22.mp4
Processing: /videos/23.mp4
 65%|██████▌   | 15/23 [00:18<00:09,  1.23s/it]                                               
Moviepy - Building video /videos_prepared/23.mp4.
Moviepy - Writing video /videos_prepared/23.mp4
 65%|██████▌   | 15/23 [00:19<00:09,  1.23s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/23.mp4
Processing: /videos/3.mp4
 70%|██████▉   | 16/23 [00:19<00:08,  1.23s/it]                                                70%|██████▉   | 16/23 [00:20<00:08,  1.23s/it]           
Moviepy - Building video /videos_prepared/3.mp4.
Moviepy - Writing video /videos_prepared/3.mp4
 70%|██████▉   | 16/23 [00:21<00:08,  1.23s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/3.mp4
Processing: /videos/4.mp4
 74%|███████▍  | 17/23 [00:21<00:07,  1.25s/it]                                               
Moviepy - Building video /videos_prepared/4.mp4.
Moviepy - Writing video /videos_prepared/4.mp4
 74%|███████▍  | 17/23 [00:22<00:07,  1.25s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/4.mp4
 78%|███████▊  | 18/23 [00:22<00:06,  1.25s/it]                                               
 78%|███████▊  | 18/23 [00:22<00:06,  1.25s/it]                                               
Moviepy - Building video /videos_prepared/5.mp4.
Moviepy - Writing video /videos_prepared/5.mp4
 78%|███████▊  | 18/23 [00:23<00:06,  1.25s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/5.mp4
Processing: /videos/6.mp4
 83%|████████▎ | 19/23 [00:24<00:05,  1.28s/it]                                               
Moviepy - Building video /videos_prepared/6.mp4.
Moviepy - Writing video /videos_prepared/6.mp4
 83%|████████▎ | 19/23 [00:25<00:05,  1.28s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/6.mp4
Processing: /videos/7.mp4
 87%|████████▋ | 20/23 [00:25<00:03,  1.29s/it]                                               
Moviepy - Building video /videos_prepared/7.mp4.
Moviepy - Writing video /videos_prepared/7.mp4
 87%|████████▋ | 20/23 [00:26<00:03,  1.29s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/7.mp4
 91%|█████████▏| 21/23 [00:26<00:02,  1.28s/it]                                               
 91%|█████████▏| 21/23 [00:26<00:02,  1.28s/it]                                               
Moviepy - Building video /videos_prepared/8.mp4.
Moviepy - Writing video /videos_prepared/8.mp4
 91%|█████████▏| 21/23 [00:27<00:02,  1.28s/it]                                                            
Moviepy - Done !
Moviepy - video ready /videos_prepared/8.mp4
Processing: /videos/9.mp4
 96%|█████████▌| 22/23 [00:27<00:01,  1.25s/it]                                               
Moviepy - Building video /videos_prepared/9.mp4.
Moviepy - Writing video /videos_prepared/9.mp4
100%|██████████| 23/23 [00:28<00:00,  1.27s/it]100%|██████████| 23/23 [00:28<00:00,  1.25s/it]             
Moviepy - Done !
Moviepy - video ready /videos_prepared/9.mp4
+ echo -e '\n\e[1;35m🎥 **Step 2: Run the VAE encoder on each video** \e[0m'

🎥 **Step 2: Run the VAE encoder on each video** 
+ python3 /mochi/demos/fine_tuner/encode_videos.py /videos_prepared/ --model_dir /weights/ --num_gpus 1 --shape 37x480x848 --overwrite
Timing load_encoder
Timing load_decoder
Stage                   Time(s)    Percent
load_encoder               1.61     35.77%
load_decoder               2.90     64.23%
Traceback (most recent call last)::00<00:04, 5.41it/s]                                                        
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::02<00:26, 1.25s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:26, 1.25s/it]                                                      
Traceback (most recent call last)::03<00:23, 1.16s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:23, 1.16s/it]                                                      
Traceback (most recent call last)::04<00:21, 1.11s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::05<00:19, 1.07s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::06<00:17, 1.05s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:17, 1.05s/it]                                                      
Traceback (most recent call last)::07<00:16, 1.04s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::08<00:15, 1.04s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
                                                        File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::09<00:15, 1.04s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:15, 1.04s/it]                                                      
Traceback (most recent call last):                                                                          
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:13, 1.03s/it]                                                      
Traceback (most recent call last)::11<00:12, 1.03s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::12<00:12, 1.03s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
                                                        File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                        File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:12, 1.03s/it]                                                      
Traceback (most recent call last)::13<00:10, 1.02s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
 0:  57%|█████▋    | 13.0/23.0 [00:14<00:10, 1.02s/it]                                                        File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::14<00:10, 1.02s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:10, 1.02s/it]                                                      
Traceback (most recent call last):
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::16<00:08, 1.01s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized00:08, 1.01s/it]                                                      
Traceback (most recent call last)::17<00:06, 1.01s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
                                                        File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                        File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
 0:  78%|███████▊  | 18.0/23.0 [00:18<00:05, 1.01s/it]                                                      Traceback (most recent call last):
 0:  78%|███████▊  | 18.0/23.0 [00:19<00:05, 1.01s/it]                                                        File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::19<00:05, 1.01s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
 0:  78%|███████▊  | 18.0/23.0 [00:20<00:05, 1.01s/it]                                                        File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
 0:  87%|████████▋ | 20.0/23.0 [00:20<00:03, 1.02s/it]                                                      Traceback (most recent call last):
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::21<00:02, 1.02s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
Traceback (most recent call last)::22<00:02, 1.02s/it]                                                      
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process                               
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess                                   
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames                       
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                
 0: 100%|██████████| 23.0/23.0 [00:24<00:00, 1.01s/it]
Traceback (most recent call last):
  File "/mochi/demos/fine_tuner/encode_videos.py", line 132, in batch_process
    preprocess(
  File "/mochi/demos/fine_tuner/encode_videos.py", line 65, in preprocess
    ldist.mean = cp_conv.gather_all_frames(ldist.mean)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/vae/cp_conv.py", line 67, in gather_all_frames
    cp_group = cp.get_cp_group()
               ^^^^^^^^^^^^^^^^^
  File "/mochi/src/genmo/mochi_preview/dit/joint_model/context_parallel.py", line 48, in get_cp_group       
    raise RuntimeError("CP group not initialized")
RuntimeError: CP group not initialized                

Processing /videos_prepared/1.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/1.mp4: CP group not initialized
Processing /videos_prepared/10.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/10.mp4: CP group not initialized
Processing /videos_prepared/11.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/11.mp4: CP group not initialized
Processing /videos_prepared/12.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/12.mp4: CP group not initialized
Processing /videos_prepared/13.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/13.mp4: CP group not initialized
Processing /videos_prepared/14.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/14.mp4: CP group not initialized
Processing /videos_prepared/15.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/15.mp4: CP group not initialized
Processing /videos_prepared/16.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/16.mp4: CP group not initialized
Processing /videos_prepared/17.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/17.mp4: CP group not initialized
Processing /videos_prepared/18.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/18.mp4: CP group not initialized
Processing /videos_prepared/19.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/19.mp4: CP group not initialized
Processing /videos_prepared/2.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/2.mp4: CP group not initialized
Processing /videos_prepared/20.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/20.mp4: CP group not initialized
Processing /videos_prepared/21.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/21.mp4: CP group not initialized
Processing /videos_prepared/22.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/22.mp4: CP group not initialized
Processing /videos_prepared/23.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/23.mp4: CP group not initialized
Processing /videos_prepared/3.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/3.mp4: CP group not initialized
Processing /videos_prepared/4.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/4.mp4: CP group not initialized
Processing /videos_prepared/5.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/5.mp4: CP group not initialized
Processing /videos_prepared/6.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/6.mp4: CP group not initialized
Processing /videos_prepared/7.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/7.mp4: CP group not initialized
Processing /videos_prepared/8.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/8.mp4: CP group not initialized
Processing /videos_prepared/9.mp4
Trimmed video from 39 to first 37 frames
Error processing /videos_prepared/9.mp4: CP group not initialized
 0: 100%|██████████| 23.0/23.0 [00:25<00:00, 1.09s/it]
+ echo -e '\n\e[1;35m🧠 **Step 3: Compute T5 embeddings** \e[0m'

🧠 **Step 3: Compute T5 embeddings** 
+ python3 /mochi/demos/fine_tuner/embed_captions.py --overwrite /videos_prepared/
 96%|█████████▌| 22/23 [00:01<00:00, 26.25it/s]100%|██████████| 23/23 [00:01<00:00, 16.44it/s]
Processing /videos_prepared/1.txt
Processing /videos_prepared/10.txt
Processing /videos_prepared/11.txt
Processing /videos_prepared/12.txt
Processing /videos_prepared/13.txt
Processing /videos_prepared/14.txt
Processing /videos_prepared/15.txt
Processing /videos_prepared/16.txt
Processing /videos_prepared/17.txt
Processing /videos_prepared/18.txt
Processing /videos_prepared/19.txt
Processing /videos_prepared/2.txt
Processing /videos_prepared/20.txt
Processing /videos_prepared/21.txt
Processing /videos_prepared/22.txt
Processing /videos_prepared/23.txt
Processing /videos_prepared/3.txt
Processing /videos_prepared/4.txt
Processing /videos_prepared/5.txt
Processing /videos_prepared/6.txt
Processing /videos_prepared/7.txt
Processing /videos_prepared/8.txt
Processing /videos_prepared/9.txt
+ echo -e '\n\e[1;32m✓ Done!\e[0m'

✓ Done!
Curlypla commented 3 days ago

This is the problem I mentioned in https://github.com/genmoai/mochi/issues/98. It has been fixed, but you need to force a rebuild in main.py.

Add .run_commands(CLONE_CMD, force_build=True) and update your LoRA YAML file to the latest version, or you will get an error.
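
As a quick sanity check after rebuilding, a sketch like the one below could confirm that preprocessing actually produced outputs before you launch the finetune. It counts how many videos have both an encoded latent and a caption embedding next to them, which is presumably what the "Loaded 0/23 valid file pairs" line in the log reflects. The function name and the `.latent.pt` / `.embed.pt` suffixes are assumptions for illustration, not taken from the Mochi code, so adjust them to whatever the preprocessing scripts write.

```python
from pathlib import Path


def count_valid_pairs(prepared_dir, latent_suffix=".latent.pt", embed_suffix=".embed.pt"):
    """Return (valid, total) for the videos found in prepared_dir.

    A video counts as valid when both a latent file and an embedding file
    with the same stem sit next to it. The default suffixes are assumptions;
    pass the ones your preprocessing step really produces.
    """
    videos = sorted(Path(prepared_dir).glob("*.mp4"))
    valid = 0
    for video in videos:
        latent = video.parent / (video.stem + latent_suffix)
        embed = video.parent / (video.stem + embed_suffix)
        if latent.exists() and embed.exists():
            valid += 1
    return valid, len(videos)
```

Calling `count_valid_pairs("/videos_prepared")` inside the container returns a tuple such as `(valid, total)`. If `valid` is 0, the video encoding step most likely failed for every file, and building the DataLoader over the empty dataset would be expected to raise the `num_samples=0` error again.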

martintomov commented 3 days ago

Thank you!