ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
https://arxiv.org/abs/2104.08860
MIT License

subprocess.CalledProcessError #71

Closed alvinlin1271320 closed 2 years ago

alvinlin1271320 commented 2 years ago

Hi, I want to train CLIP4Clip on MSRVTT, but I ran into this error. Can you help me?

subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

I will appreciate your help with this situation. Thank you in advance.

ArrowLuo commented 2 years ago

Hi @ALVIN-SMITH, the log is incomplete, and I cannot judge the problem based on it. Can you print all the information?

alvinlin1271320 commented 2 years ago

Of course!

python -m torch.distributed.launch --nproc_per_node=1 main_task_retrieval.py --do_train --num_thread_reader=0 --epochs=5 --batch_size=128 --n_display=50 --train_csv ${DATA_PATH}/MSRVTT_train.9k.csv --val_csv ${DATA_PATH}/MSRVTT_JSFUSION_test.csv --data_path ${DATA_PATH}/MSRVTT_data.json --features_path ${DATA_PATH}/MSRVTT_Videos --output_dir ckpts/ckpt_msrvtt_retrieval_looseType --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 --datatype msrvtt --expand_msrvtt_sentences --feature_framerate 1 --coef_lr 1e-3 --freeze_layer_num 0 --slice_framepos 2 --loose_type --linear_patch 2d --sim_header meanP --pretrained_clip_name ViT-B/32

05/19/2022 19:32:03 - INFO - Effective parameters:
05/19/2022 19:32:03 - INFO - <<< batch_size: 128
05/19/2022 19:32:03 - INFO - <<< batch_size_val: 16
05/19/2022 19:32:03 - INFO - <<< cache_dir:
05/19/2022 19:32:03 - INFO - <<< coef_lr: 0.001
05/19/2022 19:32:03 - INFO - <<< cross_model: cross-base
05/19/2022 19:32:03 - INFO - <<< cross_num_hidden_layers: 4
05/19/2022 19:32:03 - INFO - <<< data_path: ./MSRVTT/videos/MSRVTT_data.json
05/19/2022 19:32:03 - INFO - <<< datatype: msrvtt
05/19/2022 19:32:03 - INFO - <<< do_eval: False
05/19/2022 19:32:03 - INFO - <<< do_lower_case: False
05/19/2022 19:32:03 - INFO - <<< do_pretrain: False
05/19/2022 19:32:03 - INFO - <<< do_train: True
05/19/2022 19:32:03 - INFO - <<< epochs: 5
05/19/2022 19:32:03 - INFO - <<< eval_frame_order: 0
05/19/2022 19:32:03 - INFO - <<< expand_msrvtt_sentences: True
05/19/2022 19:32:03 - INFO - <<< feature_framerate: 1
05/19/2022 19:32:03 - INFO - <<< features_path: ./MSRVTT/videos/MSRVTT_Videos
05/19/2022 19:32:03 - INFO - <<< fp16: False
05/19/2022 19:32:03 - INFO - <<< fp16_opt_level: O1
05/19/2022 19:32:03 - INFO - <<< freeze_layer_num: 0
05/19/2022 19:32:03 - INFO - <<< gradient_accumulation_steps: 1
05/19/2022 19:32:03 - INFO - <<< hard_negative_rate: 0.5
05/19/2022 19:32:03 - INFO - <<< init_model: None
05/19/2022 19:32:03 - INFO - <<< linear_patch: 2d
05/19/2022 19:32:03 - INFO - <<< local_rank: 0
05/19/2022 19:32:03 - INFO - <<< loose_type: True
05/19/2022 19:32:03 - INFO - <<< lr: 0.0001
05/19/2022 19:32:03 - INFO - <<< lr_decay: 0.9
05/19/2022 19:32:03 - INFO - <<< margin: 0.1
05/19/2022 19:32:03 - INFO - <<< max_frames: 12
05/19/2022 19:32:03 - INFO - <<< max_words: 32
05/19/2022 19:32:03 - INFO - <<< n_display: 50
05/19/2022 19:32:03 - INFO - <<< n_gpu: 1
05/19/2022 19:32:03 - INFO - <<< n_pair: 1
05/19/2022 19:32:03 - INFO - <<< negative_weighting: 1
05/19/2022 19:32:03 - INFO - <<< num_thread_reader: 0
05/19/2022 19:32:03 - INFO - <<< output_dir: ckpts/ckpt_msrvtt_retrieval_looseType
05/19/2022 19:32:03 - INFO - <<< pretrained_clip_name: ViT-B/32
05/19/2022 19:32:03 - INFO - <<< rank: 0
05/19/2022 19:32:03 - INFO - <<< resume_model: None
05/19/2022 19:32:03 - INFO - <<< sampled_use_mil: False
05/19/2022 19:32:03 - INFO - <<< seed: 42
05/19/2022 19:32:03 - INFO - <<< sim_header: meanP
05/19/2022 19:32:03 - INFO - <<< slice_framepos: 2
05/19/2022 19:32:03 - INFO - <<< task_type: retrieval
05/19/2022 19:32:03 - INFO - <<< text_num_hidden_layers: 12
05/19/2022 19:32:03 - INFO - <<< train_csv: ./MSRVTT/videos/MSRVTT_train.9k.csv
05/19/2022 19:32:03 - INFO - <<< train_frame_order: 0
05/19/2022 19:32:03 - INFO - <<< use_mil: False
05/19/2022 19:32:03 - INFO - <<< val_csv: ./MSRVTT/videos/MSRVTT_JSFUSION_test.csv
05/19/2022 19:32:03 - INFO - <<< video_dim: 1024
05/19/2022 19:32:03 - INFO - <<< visual_num_hidden_layers: 12
05/19/2022 19:32:03 - INFO - <<< warmup_proportion: 0.1
05/19/2022 19:32:03 - INFO - <<< world_size: 1
05/19/2022 19:32:03 - INFO - device: cuda:0 n_gpu: 2
05/19/2022 19:32:03 - INFO - loading archive file /home/nccu/Alvin Lin/CLIP4Clip/modules/cross-base
05/19/2022 19:32:03 - INFO - Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "max_position_embeddings": 128,
  "num_attention_heads": 8,
  "num_hidden_layers": 4,
  "type_vocab_size": 2,
  "vocab_size": 512
}

05/19/2022 19:32:04 - WARNING - Stage-One:True, Stage-Two:False
05/19/2022 19:32:04 - WARNING - Test retrieval by loose type.
05/19/2022 19:32:04 - WARNING - embed_dim: 512
05/19/2022 19:32:04 - WARNING - image_resolution: 224
05/19/2022 19:32:04 - WARNING - vision_layers: 12
05/19/2022 19:32:04 - WARNING - vision_width: 768
05/19/2022 19:32:04 - WARNING - vision_patch_size: 32
05/19/2022 19:32:04 - WARNING - context_length: 77
05/19/2022 19:32:04 - WARNING - vocab_size: 49408
05/19/2022 19:32:04 - WARNING - transformer_width: 512
05/19/2022 19:32:04 - WARNING - transformer_heads: 8
05/19/2022 19:32:04 - WARNING - transformer_layers: 12
05/19/2022 19:32:04 - WARNING - linear_patch: 2d
05/19/2022 19:32:04 - WARNING - cut_top_layer: 0
05/19/2022 19:32:05 - WARNING - sim_header: meanP
05/19/2022 19:32:10 - INFO - --------------------
05/19/2022 19:32:10 - INFO - Weights from pretrained model not used in CLIP4Clip: clip.input_resolution clip.context_length clip.vocab_size
05/19/2022 19:32:10 - INFO - Running test
05/19/2022 19:32:10 - INFO - Num examples = 1000
05/19/2022 19:32:10 - INFO - Batch size = 16
05/19/2022 19:32:10 - INFO - Num steps = 63
05/19/2022 19:32:10 - INFO - Running val
05/19/2022 19:32:10 - INFO - Num examples = 1000
05/19/2022 19:32:20 - INFO - Running training
05/19/2022 19:32:20 - INFO - Num examples = 180000
05/19/2022 19:32:20 - INFO - Batch size = 128
05/19/2022 19:32:20 - INFO - Num steps = 14060
Traceback (most recent call last):
  File "main_task_retrieval.py", line 582, in <module>
    main()
  File "main_task_retrieval.py", line 556, in main
    scheduler, global_step, local_rank=args.local_rank)
  File "main_task_retrieval.py", line 260, in train_epoch
    for step, batch in enumerate(train_dataloader):
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nccu/Alvin Lin/CLIP4Clip/dataloaders/dataloader_msrvtt_retrieval.py", line 299, in __getitem__
    video, video_mask = self._get_rawvideo(choice_video_ids)
  File "/home/nccu/Alvin Lin/CLIP4Clip/dataloaders/dataloader_msrvtt_retrieval.py", line 260, in _get_rawvideo
    raw_video_data = self.rawVideoExtractor.get_video_data(video_path)
  File "/home/nccu/Alvin Lin/CLIP4Clip/dataloaders/rawvideo_util.py", line 79, in get_video_data
    image_input = self.video_to_tensor(video_path, self.transform, sample_fp=self.framerate, start_time=start_time, end_time=end_time)
  File "/home/nccu/Alvin Lin/CLIP4Clip/dataloaders/rawvideo_util.py", line 36, in video_to_tensor
    total_duration = (frameCount + fps - 1) // fps
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/nccu/anaconda3/envs/Clip4Video-alvin/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=0', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

By the way, did I set up the MSRVTT dataset directory the right way? Below is my directory tree:

    CLIP4Clip
    |-- ......
    |--MSRVTT
            |--annotation
            |--hight-quality
            |--structured-symlinks
            |--videos
                    |--MSRVTT_data.json
                    |--MSRVTT_JSFUSION_test.csv
                    |--MSRVTT_train.7k.csv
                    |--MSRVTT_train.9k.csv
                    |--test_list_new.txt
                    |--train_list_new.txt
                    |--all
                        |--video0
                        |-- ......
                        |--video9999

Thanks for your reply. I appreciate your help very much.

ArrowLuo commented 2 years ago

Hi @ALVIN-SMITH, your video path is not right, which causes the ZeroDivisionError: integer division or modulo by zero. The features_path should be the directory that directly contains all the video files (not subfolders). The video path is built as video_path = os.path.join(self.features_path, "{}.mp4".format(video_id)), and the video_id from MSRVTT_data.json should match the video file name. So you'd better reorganize the folder all and put all videos in one folder, e.g., all_videos. The command may look like mkdir all_videos && cp all/*/*.mp4 all_videos/. Best~
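As a quick sanity check before launching training, the path rule described above can be tested directly. The following is only a minimal sketch: find_missing_videos is a hypothetical helper, not part of CLIP4Clip, and it assumes you already have a list of video ids (for example, collected from MSRVTT_data.json or the train CSV).

```python
import os


def find_missing_videos(features_path, video_ids):
    """Return the ids that have no '<video_id>.mp4' file directly in features_path.

    Hypothetical helper for illustration; not part of the CLIP4Clip code base.
    """
    missing = []
    for video_id in video_ids:
        # Same join as the one quoted in the reply above.
        video_path = os.path.join(features_path, "{}.mp4".format(video_id))
        # A folder named 'video0' or a file without the .mp4 suffix both count as missing.
        if not os.path.isfile(video_path):
            missing.append(video_id)
    return missing
```

If the returned list is empty, every id resolves to a real file. If it contains nearly all ids, features_path most likely points at the wrong directory or the files are nested or misnamed.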

alvinlin1271320 commented 2 years ago

Thank you so much! I can train now. I owe you big time, have a nice weekend. :D

wangyu0303 commented 2 years ago

Hello, I want to train CLIP4Clip on MSRVTT, but I ran into this error. Can you help me?

subprocess.CalledProcessError: Command '['/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=0', '--do_train', '--num_thread_reader=0', '--epochs=1', '--batch_size=2', '--n_display=50', '--train_csv', 'MSRVTT/MSRVTT_train.9k.csv', '--val_csv', 'MSRVTT/MSRVTT_JSFUSION_test.csv', '--data_path', 'MSRVTT/MSRVTT_data.json', '--features_path', 'all_videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

Below is my directory tree.

    CLIP4Clip
    |--MSRVTT
            |--annotation
            |--hight-quality
            |--structured-symlinks
            |--MSRVTT_data.json
            |--MSRVTT_JSFUSION_test.csv
            |--MSRVTT_train.7k.csv
            |--MSRVTT_train.9k.csv
            |--test_list_new.txt
            |--train_list_new.txt
    |--all_videos
            |--video0
            |-- ......
            |--video9999

Below is my log.

    python -m torch.distributed.launch --nproc_per_node=1 \
    main_task_retrieval.py --do_train --num_thread_reader=0 \
    --epochs=1 --batch_size=2 --n_display=50 \
    --train_csv MSRVTT/MSRVTT_train.9k.csv \
    --val_csv MSRVTT/MSRVTT_JSFUSION_test.csv \
    --data_path MSRVTT/MSRVTT_data.json \
    --features_path all_videos \
    --output_dir ckpts/ckpt_msrvtt_retrieval_looseType \
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
    --datatype msrvtt --expand_msrvtt_sentences \
    --feature_framerate 1 --coef_lr 1e-3 \
    --freeze_layer_num 0 --slice_framepos 2 \
    --loose_type --linear_patch 2d --sim_header meanP \
    --pretrained_clip_name ViT-B/32

06/04/2022 17:09:16 - INFO - Effective parameters:
06/04/2022 17:09:16 - INFO - <<< batch_size: 2
06/04/2022 17:09:16 - INFO - <<< batch_size_val: 16
06/04/2022 17:09:16 - INFO - <<< cache_dir:
06/04/2022 17:09:16 - INFO - <<< coef_lr: 0.001
06/04/2022 17:09:16 - INFO - <<< cross_model: cross-base
06/04/2022 17:09:16 - INFO - <<< cross_num_hidden_layers: 4
06/04/2022 17:09:16 - INFO - <<< data_path: MSRVTT/MSRVTT_data.json
06/04/2022 17:09:16 - INFO - <<< datatype: msrvtt
06/04/2022 17:09:16 - INFO - <<< do_eval: False
06/04/2022 17:09:16 - INFO - <<< do_lower_case: False
06/04/2022 17:09:16 - INFO - <<< do_pretrain: False
06/04/2022 17:09:16 - INFO - <<< do_train: True
06/04/2022 17:09:16 - INFO - <<< epochs: 1
06/04/2022 17:09:16 - INFO - <<< eval_frame_order: 0
06/04/2022 17:09:16 - INFO - <<< expand_msrvtt_sentences: True
06/04/2022 17:09:16 - INFO - <<< feature_framerate: 1
06/04/2022 17:09:16 - INFO - <<< features_path: all_videos
06/04/2022 17:09:16 - INFO - <<< fp16: False
06/04/2022 17:09:16 - INFO - <<< fp16_opt_level: O1
06/04/2022 17:09:16 - INFO - <<< freeze_layer_num: 0
06/04/2022 17:09:16 - INFO - <<< gradient_accumulation_steps: 1
06/04/2022 17:09:16 - INFO - <<< hard_negative_rate: 0.5
06/04/2022 17:09:16 - INFO - <<< init_model: None
06/04/2022 17:09:16 - INFO - <<< linear_patch: 2d
06/04/2022 17:09:16 - INFO - <<< local_rank: 0
06/04/2022 17:09:16 - INFO - <<< loose_type: True
06/04/2022 17:09:16 - INFO - <<< lr: 0.0001
06/04/2022 17:09:16 - INFO - <<< lr_decay: 0.9
06/04/2022 17:09:16 - INFO - <<< margin: 0.1
06/04/2022 17:09:16 - INFO - <<< max_frames: 12
06/04/2022 17:09:16 - INFO - <<< max_words: 32
06/04/2022 17:09:16 - INFO - <<< n_display: 50
06/04/2022 17:09:16 - INFO - <<< n_gpu: 1
06/04/2022 17:09:16 - INFO - <<< n_pair: 1
06/04/2022 17:09:16 - INFO - <<< negative_weighting: 1
06/04/2022 17:09:16 - INFO - <<< num_thread_reader: 0
06/04/2022 17:09:16 - INFO - <<< output_dir: ckpts/ckpt_msrvtt_retrieval_looseType
06/04/2022 17:09:16 - INFO - <<< pretrained_clip_name: ViT-B/32
06/04/2022 17:09:16 - INFO - <<< rank: 0
06/04/2022 17:09:16 - INFO - <<< resume_model: None
06/04/2022 17:09:16 - INFO - <<< sampled_use_mil: False
06/04/2022 17:09:16 - INFO - <<< seed: 42
06/04/2022 17:09:16 - INFO - <<< sim_header: meanP
06/04/2022 17:09:16 - INFO - <<< slice_framepos: 2
06/04/2022 17:09:16 - INFO - <<< task_type: retrieval
06/04/2022 17:09:16 - INFO - <<< text_num_hidden_layers: 12
06/04/2022 17:09:16 - INFO - <<< train_csv: MSRVTT/MSRVTT_train.9k.csv
06/04/2022 17:09:16 - INFO - <<< train_frame_order: 0
06/04/2022 17:09:16 - INFO - <<< use_mil: False
06/04/2022 17:09:16 - INFO - <<< val_csv: MSRVTT/MSRVTT_JSFUSION_test.csv
06/04/2022 17:09:16 - INFO - <<< video_dim: 1024
06/04/2022 17:09:16 - INFO - <<< visual_num_hidden_layers: 12
06/04/2022 17:09:16 - INFO - <<< warmup_proportion: 0.1
06/04/2022 17:09:16 - INFO - <<< world_size: 1
06/04/2022 17:09:16 - INFO - device: cuda:0 n_gpu: 1
06/04/2022 17:09:17 - INFO - loading archive file /home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/modules/cross-base
06/04/2022 17:09:17 - INFO - Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "max_position_embeddings": 128,
  "num_attention_heads": 8,
  "num_hidden_layers": 4,
  "type_vocab_size": 2,
  "vocab_size": 512
}

06/04/2022 17:09:17 - INFO - Weight doesn't exsits. /home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/modules/cross-base/cross_pytorch_model.bin
06/04/2022 17:09:17 - WARNING - Stage-One:True, Stage-Two:False
06/04/2022 17:09:17 - WARNING - Test retrieval by loose type.
06/04/2022 17:09:17 - WARNING - embed_dim: 512
06/04/2022 17:09:17 - WARNING - image_resolution: 224
06/04/2022 17:09:17 - WARNING - vision_layers: 12
06/04/2022 17:09:17 - WARNING - vision_width: 768
06/04/2022 17:09:17 - WARNING - vision_patch_size: 32
06/04/2022 17:09:17 - WARNING - context_length: 77
06/04/2022 17:09:17 - WARNING - vocab_size: 49408
06/04/2022 17:09:17 - WARNING - transformer_width: 512
06/04/2022 17:09:17 - WARNING - transformer_heads: 8
06/04/2022 17:09:17 - WARNING - transformer_layers: 12
06/04/2022 17:09:17 - WARNING - linear_patch: 2d
06/04/2022 17:09:17 - WARNING - cut_top_layer: 0
06/04/2022 17:09:21 - WARNING - sim_header: meanP
06/04/2022 17:09:32 - INFO - --------------------
06/04/2022 17:09:32 - INFO - Weights from pretrained model not used in CLIP4Clip: clip.input_resolution clip.context_length clip.vocab_size
06/04/2022 17:09:32 - INFO - Running test
06/04/2022 17:09:32 - INFO - Num examples = 1000
06/04/2022 17:09:32 - INFO - Batch size = 16
06/04/2022 17:09:32 - INFO - Num steps = 63
06/04/2022 17:09:32 - INFO - Running val
06/04/2022 17:09:32 - INFO - Num examples = 1000
06/04/2022 17:10:12 - INFO - Running training
06/04/2022 17:10:12 - INFO - Num examples = 180000
06/04/2022 17:10:12 - INFO - Batch size = 2
06/04/2022 17:10:12 - INFO - Num steps = 90000
VIDIOC_REQBUFS: Inappropriate ioctl for device
Traceback (most recent call last):
  File "main_task_retrieval.py", line 583, in <module>
    main()
  File "main_task_retrieval.py", line 556, in main
    scheduler, global_step, local_rank=args.local_rank)
  File "main_task_retrieval.py", line 260, in train_epoch
    for step, batch in enumerate(train_dataloader):
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/dataloaders/dataloader_msrvtt_retrieval.py", line 299, in __getitem__
    video, video_mask = self._get_rawvideo(choice_video_ids)
  File "/home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/dataloaders/dataloader_msrvtt_retrieval.py", line 260, in _get_rawvideo
    raw_video_data = self.rawVideoExtractor.get_video_data(video_path)
  File "/home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/dataloaders/rawvideo_util.py", line 76, in get_video_data
    image_input = self.video_to_tensor(video_path, self.transform, sample_fp=self.framerate, start_time=start_time, end_time=end_time)
  File "/home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/dataloaders/rawvideo_util.py", line 36, in video_to_tensor
    total_duration = (frameCount + fps - 1) // fps
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=0', '--do_train', '--num_thread_reader=0', '--epochs=1', '--batch_size=2', '--n_display=50', '--train_csv', 'MSRVTT/MSRVTT_train.9k.csv', '--val_csv', 'MSRVTT/MSRVTT_JSFUSION_test.csv', '--data_path', 'MSRVTT/MSRVTT_data.json', '--features_path', 'all_videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

I will appreciate your help with this situation. Thank you in advance.

ArrowLuo commented 2 years ago

Hi @wangyu0303, the folder all_videos should not contain any subfolders. In other words, you should re-organize the folder all_videos, and put all video files under it.
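For anyone whose videos are nested one level deep (one subfolder per video), the reorganization can also be done from Python. This is a rough sketch under that assumption; flatten_videos and its parameters are hypothetical names, not part of the repository.

```python
import glob
import os
import shutil


def flatten_videos(src_root, dst_dir, pattern="*.mp4"):
    """Copy every file matching src_root/*/pattern into dst_dir.

    Hypothetical helper; assumes exactly one level of subfolders under src_root.
    Returns the sorted list of copied file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_root, "*", pattern))):
        if os.path.isfile(src):
            name = os.path.basename(src)
            # Files with the same name in different subfolders overwrite each other.
            shutil.copy2(src, os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

Copying instead of moving keeps the original layout intact, at the cost of extra disk space. Afterwards, pass dst_dir as --features_path.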


wangyu0303 commented 2 years ago

Thank you for your reply. But the folder all_videos does not have any subfolders.

Below is my directory tree.

CLIP4Clip
|-- ......
|--MSRVTT
        |--annotation
        |--hight-quality
        |--structured-symlinks
        |--MSRVTT_data.json
        |--MSRVTT_JSFUSION_test.csv
        |--MSRVTT_train.7k.csv
        |--MSRVTT_train.9k.csv
        |--test_list_new.txt
        |--train_list_new.txt
|--all_videos
        |--video0
        |-- ......
        |--video9999

Thanks for your reply. I appreciate your help very much. Could you show me your directory tree?

ArrowLuo commented 2 years ago

Hi @wangyu0303, where are your .mp4 files? Why are the files not named video0.mp4, ..., video9999.mp4?
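Both tracebacks in this thread end at total_duration = (frameCount + fps - 1) // fps. A plausible reading, stated here as an assumption rather than something confirmed in the thread, is that the video reader reports an fps of 0 when the joined path does not point to a readable file, so a missing or misnamed video only shows up as a ZeroDivisionError. The sketch below uses a hypothetical function name to show the same ceiling division with an explicit guard that names the offending path.

```python
def total_duration_seconds(frame_count, fps, video_path):
    """Ceiling-divide frame_count by fps, failing with a readable message if fps is 0.

    Hypothetical guard for illustration; not the code in rawvideo_util.py.
    """
    if fps <= 0:
        raise ValueError(
            "fps is {} for {!r}; the file is probably missing, misnamed, "
            "or unreadable".format(fps, video_path)
        )
    # Same expression as in the traceback: rounds up to whole seconds.
    return (frame_count + fps - 1) // fps
```

With such a guard, the error message would point at the bad path (for example all_videos/video0.mp4) instead of an arithmetic failure deep inside the data loader.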

wangyu0303 commented 2 years ago

Thank you so much! I can train now. I owe you big time, :)