Closed alvinlin1271320 closed 2 years ago
Hi @ALVIN-SMITH, the log is incomplete, so I cannot diagnose the problem from it. Can you post the full output?
Of course!
python -m torch.distributed.launch --nproc_per_node=1 main_task_retrieval.py --do_train --num_thread_reader=0 --epochs=5 --batch_size=128 --n_display=50 --train_csv ${DATA_PATH}/MSRVTT_train.9k.csv --val_csv ${DATA_PATH}/MSRVTT_JSFUSION_test.csv --data_path ${DATA_PATH}/MSRVTT_data.json --features_path ${DATA_PATH}/MSRVTT_Videos --output_dir ckpts/ckpt_msrvtt_retrieval_looseType --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 --datatype msrvtt --expand_msrvtt_sentences --feature_framerate 1 --coef_lr 1e-3 --freeze_layer_num 0 --slice_framepos 2 --loose_type --linear_patch 2d --sim_header meanP --pretrained_clip_name ViT-B/32
05/19/2022 19:32:03 - INFO - Effective parameters: 05/19/2022 19:32:03 - INFO - <<< batch_size: 128 05/19/2022 19:32:03 - INFO - <<< batch_size_val: 16 05/19/2022 19:32:03 - INFO - <<< cache_dir: 05/19/2022 19:32:03 - INFO - <<< coef_lr: 0.001 05/19/2022 19:32:03 - INFO - <<< cross_model: cross-base 05/19/2022 19:32:03 - INFO - <<< cross_num_hidden_layers: 4 05/19/2022 19:32:03 - INFO - <<< data_path: ./MSRVTT/videos/MSRVTT_data.json 05/19/2022 19:32:03 - INFO - <<< datatype: msrvtt 05/19/2022 19:32:03 - INFO - <<< do_eval: False 05/19/2022 19:32:03 - INFO - <<< do_lower_case: False 05/19/2022 19:32:03 - INFO - <<< do_pretrain: False 05/19/2022 19:32:03 - INFO - <<< do_train: True 05/19/2022 19:32:03 - INFO - <<< epochs: 5 05/19/2022 19:32:03 - INFO - <<< eval_frame_order: 0 05/19/2022 19:32:03 - INFO - <<< expand_msrvtt_sentences: True 05/19/2022 19:32:03 - INFO - <<< feature_framerate: 1 05/19/2022 19:32:03 - INFO - <<< features_path: ./MSRVTT/videos/MSRVTT_Videos 05/19/2022 19:32:03 - INFO - <<< fp16: False 05/19/2022 19:32:03 - INFO - <<< fp16_opt_level: O1 05/19/2022 19:32:03 - INFO - <<< freeze_layer_num: 0 05/19/2022 19:32:03 - INFO - <<< gradient_accumulation_steps: 1 05/19/2022 19:32:03 - INFO - <<< hard_negative_rate: 0.5 05/19/2022 19:32:03 - INFO - <<< init_model: None 05/19/2022 19:32:03 - INFO - <<< linear_patch: 2d 05/19/2022 19:32:03 - INFO - <<< local_rank: 0 05/19/2022 19:32:03 - INFO - <<< loose_type: True 05/19/2022 19:32:03 - INFO - <<< lr: 0.0001 05/19/2022 19:32:03 - INFO - <<< lr_decay: 0.9 05/19/2022 19:32:03 - INFO - <<< margin: 0.1 05/19/2022 19:32:03 - INFO - <<< max_frames: 12 05/19/2022 19:32:03 - INFO - <<< max_words: 32 05/19/2022 19:32:03 - INFO - <<< n_display: 50 05/19/2022 19:32:03 - INFO - <<< n_gpu: 1 05/19/2022 19:32:03 - INFO - <<< n_pair: 1 05/19/2022 19:32:03 - INFO - <<< negative_weighting: 1 05/19/2022 19:32:03 - INFO - <<< num_thread_reader: 0 05/19/2022 19:32:03 - INFO - <<< output_dir: 
ckpts/ckpt_msrvtt_retrieval_looseType 05/19/2022 19:32:03 - INFO - <<< pretrained_clip_name: ViT-B/32 05/19/2022 19:32:03 - INFO - <<< rank: 0 05/19/2022 19:32:03 - INFO - <<< resume_model: None 05/19/2022 19:32:03 - INFO - <<< sampled_use_mil: False 05/19/2022 19:32:03 - INFO - <<< seed: 42 05/19/2022 19:32:03 - INFO - <<< sim_header: meanP 05/19/2022 19:32:03 - INFO - <<< slice_framepos: 2 05/19/2022 19:32:03 - INFO - <<< task_type: retrieval 05/19/2022 19:32:03 - INFO - <<< text_num_hidden_layers: 12 05/19/2022 19:32:03 - INFO - <<< train_csv: ./MSRVTT/videos/MSRVTT_train.9k.csv 05/19/2022 19:32:03 - INFO - <<< train_frame_order: 0 05/19/2022 19:32:03 - INFO - <<< use_mil: False 05/19/2022 19:32:03 - INFO - <<< val_csv: ./MSRVTT/videos/MSRVTT_JSFUSION_test.csv 05/19/2022 19:32:03 - INFO - <<< video_dim: 1024 05/19/2022 19:32:03 - INFO - <<< visual_num_hidden_layers: 12 05/19/2022 19:32:03 - INFO - <<< warmup_proportion: 0.1 05/19/2022 19:32:03 - INFO - <<< world_size: 1 05/19/2022 19:32:03 - INFO - device: cuda:0 n_gpu: 2 05/19/2022 19:32:03 - INFO - loading archive file /home/nccu/Alvin Lin/CLIP4Clip/modules/cross-base 05/19/2022 19:32:03 - INFO - Model config { "attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 512, "initializer_range": 0.02, "intermediate_size": 2048, "max_position_embeddings": 128, "num_attention_heads": 8, "num_hidden_layers": 4, "type_vocab_size": 2, "vocab_size": 512 }
05/19/2022 19:32:04 - WARNING - Stage-One:True, Stage-Two:False
05/19/2022 19:32:04 - WARNING - Test retrieval by loose type.
05/19/2022 19:32:04 - WARNING - embed_dim: 512
05/19/2022 19:32:04 - WARNING - image_resolution: 224
05/19/2022 19:32:04 - WARNING - vision_layers: 12
05/19/2022 19:32:04 - WARNING - vision_width: 768
05/19/2022 19:32:04 - WARNING - vision_patch_size: 32
05/19/2022 19:32:04 - WARNING - context_length: 77
05/19/2022 19:32:04 - WARNING - vocab_size: 49408
05/19/2022 19:32:04 - WARNING - transformer_width: 512
05/19/2022 19:32:04 - WARNING - transformer_heads: 8
05/19/2022 19:32:04 - WARNING - transformer_layers: 12
05/19/2022 19:32:04 - WARNING - linear_patch: 2d
05/19/2022 19:32:04 - WARNING - cut_top_layer: 0
05/19/2022 19:32:05 - WARNING - sim_header: meanP
05/19/2022 19:32:10 - INFO - --------------------
05/19/2022 19:32:10 - INFO - Weights from pretrained model not used in CLIP4Clip:
clip.input_resolution
clip.context_length
clip.vocab_size
05/19/2022 19:32:10 - INFO - Running test
05/19/2022 19:32:10 - INFO - Num examples = 1000
05/19/2022 19:32:10 - INFO - Batch size = 16
05/19/2022 19:32:10 - INFO - Num steps = 63
05/19/2022 19:32:10 - INFO - Running val
05/19/2022 19:32:10 - INFO - Num examples = 1000
05/19/2022 19:32:20 - INFO - Running training
05/19/2022 19:32:20 - INFO - Num examples = 180000
05/19/2022 19:32:20 - INFO - Batch size = 128
05/19/2022 19:32:20 - INFO - Num steps = 14060
Traceback (most recent call last):
File "main_task_retrieval.py", line 582, in
By the way, is my MSRVTT dataset directory set up correctly? Below is my directory tree:
CLIP4Clip
|-- ......
|--MSRVTT
|--annotation
|--hight-quality
|--structured-symlinks
|--videos
|--MSRVTT_data.json
|--MSRVTT_JSFUSION_test.csv
|--MSRVTT_train.7k.csv
|--MSRVTT_train.9k.csv
|--test_list_new.txt
|--train_list_new.txt
|--all
|--video0
|-- ......
|--video9999
Thanks for your reply. I appreciate your help very much.
Hi @ALVIN-SMITH, your video path is not right, which causes the error ZeroDivisionError: integer division or modulo by zero. The features_path should be the folder that directly contains all the video files (not subfolders), because the video path is built as video_path = os.path.join(self.features_path, "{}.mp4".format(video_id)). The video_id from MSRVTT_data.json should match the video file name. So you'd better re-organize the folder all and put all videos in one folder, e.g., all_videos. The command may look like: mkdir all_videos && cp all/*/*.mp4 all_videos/
Best~
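A quick way to confirm that features_path is laid out as described is to rebuild each expected path the same way and list the IDs whose file is missing. The sketch below is a standalone check, not part of CLIP4Clip; the helper name missing_videos and the example IDs are hypothetical.

```python
import os


def missing_videos(features_path, video_ids):
    """Return the IDs that have no <video_id>.mp4 directly under features_path."""
    missing = []
    for video_id in video_ids:
        # Same path construction as quoted in the reply above.
        video_path = os.path.join(features_path, "{}.mp4".format(video_id))
        if not os.path.isfile(video_path):
            missing.append(video_id)
    return missing


if __name__ == "__main__":
    # Hypothetical IDs; in practice they would come from the annotation files.
    print(missing_videos("all_videos", ["video0", "video9999"]))
```

If the returned list contains every ID, the folder most likely holds subfolders or files without the .mp4 extension rather than the videos themselves.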
Thank you so much! I can train now. I owe you big time, have a nice weekend. :D
Hello, I want to train CLIP4Clip on MSRVTT, but I got this issue. Can you help me?
subprocess.CalledProcessError: Command '['/home/ubuntu/studentAssign/wangyu/anaconda3/envs/ct/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=0', '--do_train', '--num_thread_reader=0', '--epochs=1', '--batch_size=2', '--n_display=50', '--train_csv', 'MSRVTT/MSRVTT_train.9k.csv', '--val_csv', 'MSRVTT/MSRVTT_JSFUSION_test.csv', '--data_path', 'MSRVTT/MSRVTT_data.json', '--features_path', 'all_videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.
Below is my directory tree.
|--video9999
Below is my log.
python -m torch.distributed.launch --nproc_per_node=1 \
main_task_retrieval.py --do_train --num_thread_reader=0 \
--epochs=1 --batch_size=2 --n_display=50 \
--train_csv MSRVTT/MSRVTT_train.9k.csv \
--val_csv MSRVTT/MSRVTT_JSFUSION_test.csv \
--data_path MSRVTT/MSRVTT_data.json \
--features_path all_videos \
--output_dir ckpts/ckpt_msrvtt_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msrvtt --expand_msrvtt_sentences \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0 --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP \
--pretrained_clip_name ViT-B/32
06/04/2022 17:09:16 - INFO - Effective parameters: 06/04/2022 17:09:16 - INFO - <<< batch_size: 2 06/04/2022 17:09:16 - INFO - <<< batch_size_val: 16 06/04/2022 17:09:16 - INFO - <<< cache_dir: 06/04/2022 17:09:16 - INFO - <<< coef_lr: 0.001 06/04/2022 17:09:16 - INFO - <<< cross_model: cross-base 06/04/2022 17:09:16 - INFO - <<< cross_num_hidden_layers: 4 06/04/2022 17:09:16 - INFO - <<< data_path: MSRVTT/MSRVTT_data.json 06/04/2022 17:09:16 - INFO - <<< datatype: msrvtt 06/04/2022 17:09:16 - INFO - <<< do_eval: False 06/04/2022 17:09:16 - INFO - <<< do_lower_case: False 06/04/2022 17:09:16 - INFO - <<< do_pretrain: False 06/04/2022 17:09:16 - INFO - <<< do_train: True 06/04/2022 17:09:16 - INFO - <<< epochs: 1 06/04/2022 17:09:16 - INFO - <<< eval_frame_order: 0 06/04/2022 17:09:16 - INFO - <<< expand_msrvtt_sentences: True 06/04/2022 17:09:16 - INFO - <<< feature_framerate: 1 06/04/2022 17:09:16 - INFO - <<< features_path: all_videos 06/04/2022 17:09:16 - INFO - <<< fp16: False 06/04/2022 17:09:16 - INFO - <<< fp16_opt_level: O1 06/04/2022 17:09:16 - INFO - <<< freeze_layer_num: 0 06/04/2022 17:09:16 - INFO - <<< gradient_accumulation_steps: 1 06/04/2022 17:09:16 - INFO - <<< hard_negative_rate: 0.5 06/04/2022 17:09:16 - INFO - <<< init_model: None 06/04/2022 17:09:16 - INFO - <<< linear_patch: 2d 06/04/2022 17:09:16 - INFO - <<< local_rank: 0 06/04/2022 17:09:16 - INFO - <<< loose_type: True 06/04/2022 17:09:16 - INFO - <<< lr: 0.0001 06/04/2022 17:09:16 - INFO - <<< lr_decay: 0.9 06/04/2022 17:09:16 - INFO - <<< margin: 0.1 06/04/2022 17:09:16 - INFO - <<< max_frames: 12 06/04/2022 17:09:16 - INFO - <<< max_words: 32 06/04/2022 17:09:16 - INFO - <<< n_display: 50 06/04/2022 17:09:16 - INFO - <<< n_gpu: 1 06/04/2022 17:09:16 - INFO - <<< n_pair: 1 06/04/2022 17:09:16 - INFO - <<< negative_weighting: 1 06/04/2022 17:09:16 - INFO - <<< num_thread_reader: 0 06/04/2022 17:09:16 - INFO - <<< output_dir: ckpts/ckpt_msrvtt_retrieval_looseType 06/04/2022 17:09:16 - 
INFO - <<< pretrained_clip_name: ViT-B/32 06/04/2022 17:09:16 - INFO - <<< rank: 0 06/04/2022 17:09:16 - INFO - <<< resume_model: None 06/04/2022 17:09:16 - INFO - <<< sampled_use_mil: False 06/04/2022 17:09:16 - INFO - <<< seed: 42 06/04/2022 17:09:16 - INFO - <<< sim_header: meanP 06/04/2022 17:09:16 - INFO - <<< slice_framepos: 2 06/04/2022 17:09:16 - INFO - <<< task_type: retrieval 06/04/2022 17:09:16 - INFO - <<< text_num_hidden_layers: 12 06/04/2022 17:09:16 - INFO - <<< train_csv: MSRVTT/MSRVTT_train.9k.csv 06/04/2022 17:09:16 - INFO - <<< train_frame_order: 0 06/04/2022 17:09:16 - INFO - <<< use_mil: False 06/04/2022 17:09:16 - INFO - <<< val_csv: MSRVTT/MSRVTT_JSFUSION_test.csv 06/04/2022 17:09:16 - INFO - <<< video_dim: 1024 06/04/2022 17:09:16 - INFO - <<< visual_num_hidden_layers: 12 06/04/2022 17:09:16 - INFO - <<< warmup_proportion: 0.1 06/04/2022 17:09:16 - INFO - <<< world_size: 1 06/04/2022 17:09:16 - INFO - device: cuda:0 n_gpu: 1 06/04/2022 17:09:17 - INFO - loading archive file /home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/modules/cross-base 06/04/2022 17:09:17 - INFO - Model config { "attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 512, "initializer_range": 0.02, "intermediate_size": 2048, "max_position_embeddings": 128, "num_attention_heads": 8, "num_hidden_layers": 4, "type_vocab_size": 2, "vocab_size": 512 }
06/04/2022 17:09:17 - INFO - Weight doesn't exsits. /home/ubuntu/studentAssign/wangyu/CLIP4Clip-master/modules/cross-base/cross_pytorch_model.bin
06/04/2022 17:09:17 - WARNING - Stage-One:True, Stage-Two:False
06/04/2022 17:09:17 - WARNING - Test retrieval by loose type.
06/04/2022 17:09:17 - WARNING - embed_dim: 512
06/04/2022 17:09:17 - WARNING - image_resolution: 224
06/04/2022 17:09:17 - WARNING - vision_layers: 12
06/04/2022 17:09:17 - WARNING - vision_width: 768
06/04/2022 17:09:17 - WARNING - vision_patch_size: 32
06/04/2022 17:09:17 - WARNING - context_length: 77
06/04/2022 17:09:17 - WARNING - vocab_size: 49408
06/04/2022 17:09:17 - WARNING - transformer_width: 512
06/04/2022 17:09:17 - WARNING - transformer_heads: 8
06/04/2022 17:09:17 - WARNING - transformer_layers: 12
06/04/2022 17:09:17 - WARNING - linear_patch: 2d
06/04/2022 17:09:17 - WARNING - cut_top_layer: 0
06/04/2022 17:09:21 - WARNING - sim_header: meanP
06/04/2022 17:09:32 - INFO - --------------------
06/04/2022 17:09:32 - INFO - Weights from pretrained model not used in CLIP4Clip:
clip.input_resolution
clip.context_length
clip.vocab_size
06/04/2022 17:09:32 - INFO - Running test
06/04/2022 17:09:32 - INFO - Num examples = 1000
06/04/2022 17:09:32 - INFO - Batch size = 16
06/04/2022 17:09:32 - INFO - Num steps = 63
06/04/2022 17:09:32 - INFO - Running val
06/04/2022 17:09:32 - INFO - Num examples = 1000
06/04/2022 17:10:12 - INFO - Running training
06/04/2022 17:10:12 - INFO - Num examples = 180000
06/04/2022 17:10:12 - INFO - Batch size = 2
06/04/2022 17:10:12 - INFO - Num steps = 90000
VIDIOC_REQBUFS: Inappropriate ioctl for device
Traceback (most recent call last):
File "main_task_retrieval.py", line 583, in
I will appreciate your help with this situation. Thank you in advance.
Hi @wangyu0303, the folder all_videos should not contain any subfolders. In other words, you should re-organize the folder all_videos and put all video files directly under it.
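For anyone who prefers to do the re-organization from Python, here is a minimal sketch that copies every .mp4 found in nested subfolders into one flat folder. The function name flatten_videos is hypothetical and the folder names are only examples; it assumes file names are unique across subfolders and raises an error otherwise.

```python
import shutil
from pathlib import Path


def flatten_videos(src_root, dst_dir):
    """Copy every .mp4 found anywhere under src_root directly into dst_dir."""
    src = Path(src_root).resolve()
    dst = Path(dst_dir).resolve()
    dst.mkdir(parents=True, exist_ok=True)
    # Collect the sources first so files copied into dst are never picked up again.
    sources = sorted(p for p in src.rglob("*.mp4") if p.is_file() and p.parent != dst)
    copied = []
    for path in sources:
        target = dst / path.name
        if target.exists():
            # Assumption: names are unique; stop rather than overwrite silently.
            raise FileExistsError("duplicate file name: {}".format(path.name))
        shutil.copy2(path, target)
        copied.append(target.name)
    return copied


if __name__ == "__main__":
    # Example folder names only; adjust to the local layout.
    print(len(flatten_videos("all", "all_videos")), "videos copied")
```

Because it refuses to overwrite, running it a second time on the same destination raises FileExistsError; start from an empty destination folder.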
Thank you for your reply. But the folder all_videos does not have any subfolders.
Below is my directory tree.
CLIP4Clip
|-- ......
|--MSRVTT
|--annotation
|--hight-quality
|--structured-symlinks
|--MSRVTT_data.json
|--MSRVTT_JSFUSION_test.csv
|--MSRVTT_train.7k.csv
|--MSRVTT_train.9k.csv
|--test_list_new.txt
|--train_list_new.txt
|--all_videos
|--video0
|-- ......
|--video9999
Thanks for your reply. I appreciate your help very much. Could you show me your directory tree?
Hi @wangyu0303, where are your .mp4 files? Why are the files not named video0.mp4, ..., video9999.mp4?
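The directory tree above lists entries such as video0 without an extension, and the thread does not say whether they are folders or files. A small sketch like the following can show what the folder actually contains; the function name classify_entries is hypothetical and the check is based only on the .mp4 naming discussed here.

```python
import os


def classify_entries(features_path):
    """Group the entries of features_path into .mp4 files, directories, and the rest."""
    report = {"mp4_files": [], "directories": [], "other": []}
    for name in sorted(os.listdir(features_path)):
        full_path = os.path.join(features_path, name)
        if os.path.isdir(full_path):
            report["directories"].append(name)
        elif name.endswith(".mp4"):
            report["mp4_files"].append(name)
        else:
            report["other"].append(name)
    return report


if __name__ == "__main__":
    # Example folder name only; print how many entries fall into each group.
    for kind, names in classify_entries("all_videos").items():
        print(kind, len(names))
```

Anything reported under directories or other would not match the video0.mp4, ..., video9999.mp4 naming asked about above.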
Thank you so much! I can train now. I owe you big time, :)
Hi, I want to train CLIP4Clip on MSRVTT, but I got this issue. Can you help me?
subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.
I will appreciate your help with this situation. Thank you in advance.