Closed sqiangcao99 closed 3 years ago
Hi @sqiangcao99, I cannot find any difference between your settings and ours except for the number of GPUs. I am not sure whether that affects performance, or whether the discrepancy is reasonable. How are your results with the other sim_header options, and can you print the log of this run up to the first epoch?
07/27/2021 09:21:49 - INFO - Effective parameters:
07/27/2021 09:21:49 - INFO - <<< batch_size: 128
07/27/2021 09:21:49 - INFO - <<< batch_size_val: 12
07/27/2021 09:21:49 - INFO - <<< cache_dir:
07/27/2021 09:21:49 - INFO - <<< coef_lr: 0.001
07/27/2021 09:21:49 - INFO - <<< cross_model: cross-base
07/27/2021 09:21:49 - INFO - <<< cross_num_hidden_layers: 4
07/27/2021 09:21:49 - INFO - <<< datatype: msrvtt
07/27/2021 09:21:49 - INFO - <<< do_eval: False
07/27/2021 09:21:49 - INFO - device: cuda:1 n_gpu: 2
07/27/2021 09:21:49 - INFO - <<< do_lower_case: False
07/27/2021 09:21:49 - INFO - <<< do_pretrain: False
07/27/2021 09:21:49 - INFO - <<< do_train: True
07/27/2021 09:21:49 - INFO - <<< epochs: 5
07/27/2021 09:21:49 - INFO - <<< eval_frame_order: 0
07/27/2021 09:21:49 - INFO - <<< expand_msrvtt_sentences: True
07/27/2021 09:21:49 - INFO - <<< feature_framerate: 1
07/27/2021 09:21:49 - INFO - <<< fp16: False
07/27/2021 09:21:49 - INFO - <<< fp16_opt_level: O1
07/27/2021 09:21:49 - INFO - <<< freeze_layer_num: 0
07/27/2021 09:21:49 - INFO - <<< gradient_accumulation_steps: 1
07/27/2021 09:21:49 - INFO - <<< hard_negative_rate: 0.5
07/27/2021 09:21:49 - INFO - <<< init_model: None
07/27/2021 09:21:49 - INFO - <<< linear_patch: 2d
07/27/2021 09:21:49 - INFO - <<< local_rank: 0
07/27/2021 09:21:49 - INFO - <<< loose_type: True
07/27/2021 09:21:49 - INFO - <<< lr: 0.0001
07/27/2021 09:21:49 - INFO - <<< lr_decay: 0.9
07/27/2021 09:21:49 - INFO - <<< margin: 0.1
07/27/2021 09:21:49 - INFO - <<< max_frames: 12
07/27/2021 09:21:49 - INFO - <<< max_words: 32
07/27/2021 09:21:49 - INFO - <<< n_display: 20
07/27/2021 09:21:49 - INFO - <<< n_gpu: 1
07/27/2021 09:21:49 - INFO - <<< n_pair: 1
07/27/2021 09:21:49 - INFO - <<< negative_weighting: 1
07/27/2021 09:21:49 - INFO - <<< num_thread_reader: 4
07/27/2021 09:21:49 - INFO - <<< rank: 0
07/27/2021 09:21:49 - INFO - <<< sampled_use_mil: False
07/27/2021 09:21:49 - INFO - <<< seed: 42
07/27/2021 09:21:49 - INFO - <<< sim_header: seqTransf
07/27/2021 09:21:49 - INFO - <<< slice_framepos: 2
07/27/2021 09:21:49 - INFO - <<< task_type: retrieval
07/27/2021 09:21:50 - INFO - <<< text_num_hidden_layers: 12
07/27/2021 09:21:50 - INFO - <<< train_csv: MSRVTT_train.9k.csv
07/27/2021 09:21:50 - INFO - <<< train_frame_order: 0
07/27/2021 09:21:50 - INFO - <<< use_mil: False
07/27/2021 09:21:50 - INFO - <<< val_csv: MSRVTT_JSFUSION_test.csv
07/27/2021 09:21:50 - INFO - <<< video_dim: 1024
07/27/2021 09:21:50 - INFO - <<< visual_num_hidden_layers: 12
07/27/2021 09:21:50 - INFO - <<< warmup_proportion: 0.1
07/27/2021 09:21:50 - INFO - <<< world_size: 2
07/27/2021 09:21:50 - INFO - device: cuda:0 n_gpu: 2
07/27/2021 09:21:51 - INFO - loading archive file clip_raw/modules/cross-base
07/27/2021 09:21:51 - INFO - Model config { "attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 512, "initializer_range": 0.02, "intermediate_size": 2048, "max_position_embeddings": 77, "num_attention_heads": 8, "num_hidden_layers": 4, "type_vocab_size": 2,
07/27/2021 09:21:51 - INFO - cross-base
07/27/2021 09:21:51 - INFO - Model config { "attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 512, "initializer_range": 0.02, "intermediate_size": 2048, "max_position_embeddings": 77, "num_attention_heads": 8, "num_hidden_layers": 4, "type_vocab_size": 2, "vocab_size": 512 }
07/27/2021 09:21:51 - INFO - Weight doesn't exsits. modules/cross-base/cross_pytorch_model.bin
07/27/2021 09:21:51 - WARNING - Stage-One:True, Stage-Two:False
07/27/2021 09:21:51 - WARNING - Test retrieval by loose type.
07/27/2021 09:21:51 - WARNING - embed_dim: 512
07/27/2021 09:21:51 - WARNING - image_resolution: 224
07/27/2021 09:21:51 - WARNING - vision_layers: 12
07/27/2021 09:21:51 - WARNING - vision_width: 768
07/27/2021 09:21:51 - WARNING - vision_patch_size: 32
07/27/2021 09:21:51 - WARNING - context_length: 77
07/27/2021 09:21:51 - WARNING - vocab_size: 49408
07/27/2021 09:21:51 - WARNING - transformer_width: 512
07/27/2021 09:21:51 - WARNING - transformer_heads: 8
07/27/2021 09:21:51 - WARNING - transformer_layers: 12
07/27/2021 09:21:51 - WARNING - linear_patch: 2d
07/27/2021 09:21:51 - WARNING - cut_top_layer: 0
07/27/2021 09:21:54 - WARNING - sim_header: seqTransf
07/27/2021 09:22:05 - INFO - --------------------
07/27/2021 09:22:05 - INFO - Weights from pretrained model not used in CLIP4Clip: clip.input_resolution clip.context_length clip.vocab_size
07/27/2021 09:22:06 - INFO - Running test
07/27/2021 09:22:06 - INFO - Num examples = 1000
07/27/2021 09:22:06 - INFO - Batch size = 12
07/27/2021 09:22:06 - INFO - Num steps = 84
07/27/2021 09:22:06 - INFO - Running val
07/27/2021 09:22:06 - INFO - Num examples = 1000
07/27/2021 09:22:25 - INFO - Running training
07/27/2021 09:22:25 - INFO - Num examples = 180000
07/27/2021 09:22:25 - INFO - Batch size = 128
07/27/2021 09:22:25 - INFO - Num steps = 7030
07/27/2021 09:23:12 - INFO - Epoch: 1/5, Step: 20/1406, Lr: 0.000000003-0.000002845, Loss: 1.888217, Time/step: 2.334767
07/27/2021 09:23:47 - INFO - Epoch: 1/5, Step: 40/1406, Lr: 0.000000006-0.000005690, Loss: 1.840597, Time/step: 1.778786
07/27/2021 09:24:23 - INFO - Epoch: 1/5, Step: 60/1406, Lr: 0.000000009-0.000008535, Loss: 1.790529, Time/step: 1.789728
······
07/27/2021 10:05:17 - INFO - Epoch 1/5 Finished, Train Loss: 1.002715
07/27/2021 10:05:22 - INFO - Model saved to pytorch_model.bin.0
07/27/2021 10:07:57 - INFO - sim matrix size: 1000, 1000
07/27/2021 10:07:57 - INFO - Length-T: 1000, Length-V:1000
07/27/2021 10:07:57 - INFO - Text-to-Video:
07/27/2021 10:07:57 - INFO - >>> R@1: 41.7 - R@5: 69.9 - R@10: 80.4 - Median R: 2.0 - Mean R: 15.3
07/27/2021 10:07:57 - INFO - Video-to-Text:
07/27/2021 10:07:57 - INFO - >>> V2T$R@1: 41.4 - V2T$R@5: 68.5 - V2T$R@10: 79.8 - V2T$Median R: 2.0 - V2T$Mean R: 13.0
·······
Hi @sqiangcao99, I do not see any essential difference. Maybe you can test with the same number of GPUs (I am not sure it matters). Below is our log through the first epoch for your information (the Video-to-Text: log line was added before releasing the code). If you make any progress on this problem, you are welcome to share it with me. Thanks.
......
2021-04-12 22:01:18,769:INFO: <<< n_display: 50
......
2021-04-12 22:01:18,772:INFO: <<< world_size: 4
......
2021-04-12 22:01:53,082:INFO: ***** Running test *****
2021-04-12 22:01:53,082:INFO: Num examples = 1000
2021-04-12 22:01:53,082:INFO: Batch size = 32
2021-04-12 22:01:53,082:INFO: Num steps = 32
2021-04-12 22:02:08,860:INFO: ***** Running training *****
2021-04-12 22:02:08,861:INFO: Num examples = 180000
2021-04-12 22:02:08,861:INFO: Batch size = 128
2021-04-12 22:02:08,861:INFO: Num steps = 7030
2021-04-12 22:05:06,570:INFO: Epoch: 1/5, Step: 50/1406, Lr: 0.000000007-0.000007112, Loss: 1.656809, Time/step: 3.554149
2021-04-12 22:07:54,349:INFO: Epoch: 1/5, Step: 100/1406, Lr: 0.000000014-0.000014225, Loss: 1.740155, Time/step: 3.355569
2021-04-12 22:10:39,252:INFO: Epoch: 1/5, Step: 150/1406, Lr: 0.000000021-0.000021337, Loss: 1.050747, Time/step: 3.298057
2021-04-12 22:13:23,374:INFO: Epoch: 1/5, Step: 200/1406, Lr: 0.000000028-0.000028450, Loss: 1.297645, Time/step: 3.282416
2021-04-12 22:16:01,482:INFO: Epoch: 1/5, Step: 250/1406, Lr: 0.000000036-0.000035562, Loss: 1.209267, Time/step: 3.162147
2021-04-12 22:18:38,089:INFO: Epoch: 1/5, Step: 300/1406, Lr: 0.000000043-0.000042674, Loss: 1.197618, Time/step: 3.132139
......
2021-04-12 23:12:04,353:INFO: Epoch: 1/5, Step: 1300/1406, Lr: 0.000000092-0.000091797, Loss: 0.667563, Time/step: 3.341208
2021-04-12 23:14:54,517:INFO: Epoch: 1/5, Step: 1350/1406, Lr: 0.000000091-0.000091174, Loss: 0.668952, Time/step: 3.403264
2021-04-12 23:17:39,297:INFO: Epoch: 1/5, Step: 1400/1406, Lr: 0.000000091-0.000090530, Loss: 0.393518, Time/step: 3.295593
2021-04-12 23:18:00,145:INFO: Epoch 1/5 Finished, Train Loss: 1.004058
2021-04-12 23:20:18,768:INFO: sim matrix size: 1000, 1000
2021-04-12 23:20:18,870:INFO: Length-T: 1000, Length-V:1000
2021-04-12 23:20:18,870:INFO: Text-to-Video:
2021-04-12 23:20:18,871:INFO: >>> R@1: 42.3 - R@5: 70.5 - R@10: 79.8 - Median R: 2.0 - Mean R: 16.2
2021-04-12 23:20:18,871:INFO: >>> V2T$R@1: 42.3 - V2T$R@5: 70.1 - V2T$R@10: 80.1 - V2T$Median R: 2.0 - V2T$Mean R: 12.7
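The Lr column in these logs can be checked against the logged parameters. The numbers are consistent with a linear warmup over the first 10% of the 7030 steps (warmup_proportion: 0.1) followed by a cosine decay of lr: 0.0001, with the first value of each pair scaled by coef_lr: 0.001. The sketch below is an assumption inferred from the printed values, not code taken from the repository, and the name warmup_cosine_lr is hypothetical.

```python
import math

# Values copied from the logged parameters above.
NUM_EXAMPLES = 180000
BATCH_SIZE = 128
EPOCHS = 5
BASE_LR = 1e-4            # lr
COEF_LR = 1e-3            # coef_lr
WARMUP_PROPORTION = 0.1   # warmup_proportion

STEPS_PER_EPOCH = NUM_EXAMPLES // BATCH_SIZE   # 1406, as in "Step: 50/1406"
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS         # 7030, as in "Num steps = 7030"


def warmup_cosine_lr(step, base_lr=BASE_LR):
    """Assumed schedule: linear warmup, then 0.5 * (1 + cos(pi * progress))."""
    progress = step / TOTAL_STEPS
    if progress < WARMUP_PROPORTION:
        return base_lr * progress / WARMUP_PROPORTION
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (50, 300, 1300, 1400):
    lr = warmup_cosine_lr(step)
    # Same "small-large" pair layout as the Lr field in the log.
    print(f"Step {step}: Lr: {lr * COEF_LR:.9f}-{lr:.9f}")
```

If two runs print the same Lr pairs at the same steps, as the runs in this thread do, the learning-rate schedule is unlikely to be the source of a gap between them.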
Thank you for your continued attention to this issue. I have tried with 4 GPUs, and the result is still not the same. Could it be because of the CUDA version or the datasets?
Driver Version: 450.51.06 CUDA Version: 11.0
···
07/28/2021 16:15:25 - INFO - <<< rank: 0
07/28/2021 16:15:25 - INFO - <<< sampled_use_mil: False
07/28/2021 16:15:25 - INFO - <<< seed: 42
07/28/2021 16:15:25 - INFO - <<< sim_header: seqTransf
07/28/2021 16:15:25 - INFO - <<< slice_framepos: 2
07/28/2021 16:15:25 - INFO - <<< task_type: retrieval
07/28/2021 16:15:25 - INFO - <<< text_num_hidden_layers: 12
07/28/2021 16:15:25 - INFO - <<< train_csv: csv/msrvtt_data/MSRVTT_train.9k.csv
07/28/2021 16:15:25 - INFO - <<< train_frame_order: 0
07/28/2021 16:15:25 - INFO - <<< use_mil: False
07/28/2021 16:15:25 - INFO - <<< val_csv: csv/msrvtt_data/MSRVTT_JSFUSION_test.csv
07/28/2021 16:15:25 - INFO - <<< video_dim: 1024
07/28/2021 16:15:25 - INFO - <<< visual_num_hidden_layers: 12
07/28/2021 16:15:25 - INFO - <<< warmup_proportion: 0.1
07/28/2021 16:15:25 - INFO - <<< world_size: 4
···
07/28/2021 16:16:03 - INFO - Num steps = 7030
07/28/2021 16:17:17 - INFO - Epoch: 1/5, Step: 50/1406, Lr: 0.000000007-0.000007112, Loss: 1.702823, Time/step: 1.466081
07/28/2021 16:18:25 - INFO - Epoch: 1/5, Step: 100/1406, Lr: 0.000000014-0.000014225, Loss: 1.731421, Time/step: 1.370271
07/28/2021 16:19:33 - INFO - Epoch: 1/5, Step: 150/1406, Lr: 0.000000021-0.000021337, Loss: 1.066895, Time/step: 1.357439
07/28/2021 16:20:42 - INFO - Epoch: 1/5, Step: 200/1406, Lr: 0.000000028-0.000028450, Loss: 1.292294, Time/step: 1.369586
07/28/2021 16:21:50 - INFO - Epoch: 1/5, Step: 250/1406, Lr: 0.000000036-0.000035562, Loss: 1.193302, Time/step: 1.368164
····
07/28/2021 16:39:22 - INFO - Epoch: 1/5, Step: 950/1406, Lr: 0.000000096-0.000095561, Loss: 0.918803, Time/step: 1.856508
07/28/2021 16:40:57 - INFO - Epoch: 1/5, Step: 1000/1406, Lr: 0.000000095-0.000095090, Loss: 1.007762, Time/step: 1.902733
07/28/2021 16:42:31 - INFO - Epoch: 1/5, Step: 1050/1406, Lr: 0.000000095-0.000094596, Loss: 0.818108, Time/step: 1.882588
07/28/2021 16:44:06 - INFO - Epoch: 1/5, Step: 1100/1406, Lr: 0.000000094-0.000094080, Loss: 0.636207, Time/step: 1.898808
07/28/2021 16:45:39 - INFO - Epoch: 1/5, Step: 1150/1406, Lr: 0.000000094-0.000093541, Loss: 0.688115, Time/step: 1.855371
07/28/2021 16:47:12 - INFO - Epoch: 1/5, Step: 1200/1406, Lr: 0.000000093-0.000092981, Loss: 0.807981, Time/step: 1.857128
07/28/2021 16:48:36 - INFO - Epoch: 1/5, Step: 1250/1406, Lr: 0.000000092-0.000092400, Loss: 0.832460, Time/step: 1.679541
07/28/2021 16:49:44 - INFO - Epoch: 1/5, Step: 1300/1406, Lr: 0.000000092-0.000091797, Loss: 0.682100, Time/step: 1.369774
07/28/2021 16:50:53 - INFO - Epoch: 1/5, Step: 1350/1406, Lr: 0.000000091-0.000091174, Loss: 0.665236, Time/step: 1.374193
07/28/2021 16:52:01 - INFO - Epoch: 1/5, Step: 1400/1406, Lr: 0.000000091-0.000090530, Loss: 0.414655, Time/step: 1.362176
07/28/2021 16:52:09 - INFO - Epoch 1/5 Finished, Train Loss: 1.002616
···
07/28/2021 16:54:20 - INFO - sim matrix size: 1000, 1000
07/28/2021 16:54:20 - INFO - Length-T: 1000, Length-V:1000
07/28/2021 16:54:20 - INFO - Text-to-Video:
07/28/2021 16:54:20 - INFO - >>> R@1: 41.1 - R@5: 69.8 - R@10: 80.3 - Median R: 2.0 - Mean R: 15.3
07/28/2021 16:54:20 - INFO - Video-to-Text:
07/28/2021 16:54:20 - INFO - >>> V2T$R@1: 41.7 - V2T$R@5: 68.5 - V2T$R@10: 80.2 - V2T$Median R: 2.0 - V2T$Mean R: 13.1
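To put a number on the gap between two runs, the metric lines can be parsed and subtracted. This is a minimal sketch, assuming the ">>> R@1: ... - Mean R: ..." layout shown in the logs above; parse_recall and metric_gap are hypothetical helper names, and the two input lines are the first-epoch Text-to-Video results from the maintainers' 4-GPU log and from the 4-GPU rerun.

```python
import re

# Matches "R@1: 42.3", "R@10: 79.8", "Median R: 2.0", "Mean R: 16.2".
METRIC_RE = re.compile(r"(R@\d+|Median R|Mean R): (\d+(?:\.\d+)?)")


def parse_recall(line):
    """Turn one '>>> R@1: ...' log line into a {metric: value} dict."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def metric_gap(reference_line, rerun_line):
    """Return rerun minus reference for every metric present in both lines."""
    reference = parse_recall(reference_line)
    rerun = parse_recall(rerun_line)
    return {k: round(rerun[k] - reference[k], 1) for k in reference if k in rerun}


reference = ">>> R@1: 42.3 - R@5: 70.5 - R@10: 79.8 - Median R: 2.0 - Mean R: 16.2"
rerun = ">>> R@1: 41.1 - R@5: 69.8 - R@10: 80.3 - Median R: 2.0 - Mean R: 15.3"

for name, delta in metric_gap(reference, rerun).items():
    print(f"{name}: {delta:+.1f}")
```

Negative values for R@k mean the rerun retrieves worse than the reference; for Median R and Mean R, lower is better, so the sign reads the other way.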
By the way, I have tried to speed up the training process by first saving the values returned by the dataset class.
video, video_mask = self._get_rawvideo(choice_video_ids)
# the code for saving the files
if video_id not in self.saved_video:
    self.saved_video.append(video_id)
    video_info = {}
    video_info['video'] = video
    video_info['video_mask'] = video_mask
    save_path = os.path.join(self.save_path, video_id)
    # np.save pickles the dict and appends ".npy" to save_path;
    # read it back with np.load(path, allow_pickle=True).item()
    np.save(save_path, video_info)
Hi @sqiangcao99, we have almost the same CUDA driver: Driver Version: 450.80.02, CUDA Version: 11.0.
The datasets are the same, too. Curiously, I think we now have the same settings, but there is still a small performance gap. Your Train Loss is lower than ours, too. How are your results with the other sim_header options? Are they acceptable?
You can also use LMDB to speed up data loading. It is memory-friendly. Thanks for your suggestion, too.
I tried meanP. With the number of epochs set to 5, the results are also worse. But with the number of epochs set to 3, the results get better.
When epoch num is 3:
2021-06-13 01:58:59,920:INFO: Text-to-Video:
2021-06-13 01:58:59,921:INFO: >>> R@1: 43.0 - R@5: 70.3 - R@10: 80.4 - Median R: 2.0 - Mean R: 15.8
2021-06-13 01:58:59,921:INFO: Video-to-Text:
2021-06-13 01:58:59,921:INFO: >>> V2T$R@1: 42.6 - V2T$R@5: 70.8 - V2T$R@10: 81.4 - V2T$Median R: 2.0 - V2T$Mean R: 11.9
When epoch num is 5:
2021-06-04 22:36:14,397:INFO: >>> R@1: 42.2 - R@5: 71.2 - R@10: 80.6 - Median R: 2.0 - Mean R: 15.8
2021-06-04 22:36:14,397:INFO: Video-to-Text:
2021-06-04 22:36:14,398:INFO: >>> V2T$R@1: 41.8 - V2T$R@5: 70.6 - V2T$R@10: 81.1 - V2T$Median R: 2.0 - V2T$Mean R: 11.8
2021-06-04 22:36:14,399:INFO: The best model is: None, the R1 is: 42.2000
Oh, it is not exactly the same as ours. I do not know whether this gap is normal for a reproduction. It is strange if you did not change any of our code, and I have no further ideas about this problem for now.
If you want to compare your results with ours in your research, one option is to report the numbers from your own run of our implementation, since they were obtained in the same environment and on the same dataset. Thanks for sharing and for the discussion.
Thank you so much for helping me. I have learned a lot.
Strangely, I can't reproduce the result either. Maybe we can get in touch and discuss where the problem is. My QQ number is 1471659527.
When I use the following configuration to train the model on MSRVTT Training-9K, the best result I got is:
07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000
07/27/2021 13:11:01 - INFO - Length-T: 1000, Length-V:1000
07/27/2021 13:11:01 - INFO - Text-to-Video:
07/27/2021 13:11:01 - INFO - >>> R@1: 43.2 - R@5: 71.0 - R@10: 79.4 - Median R: 2.0 - Mean R: 15.4
07/27/2021 13:11:01 - INFO - Video-to-Text:
07/27/2021 13:11:01 - INFO - >>> V2T$R@1: 43.1 - V2T$R@5: 71.2 - V2T$R@10: 80.7 - V2T$Median R: 2.0 - V2T$Mean R: 11.9
It's worse than the result (R@1: 44.5) listed in the paper. Did I miss some details? Here is the configuration:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 \
--master_addr=127.0.0.2 --master_port 29552 \
main_task_retrieval.py --num_thread_reader=4 \
--epochs=5 --batch_size=128 --n_display=20 \
--train_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_train.9k.csv \
--val_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_JSFUSION_test.csv \
--data_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_data.json \
--features_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/MSRVTT_Videos \
--output_dir /home/hadoop-vacv/cephfs/data/caoshuqiang/code/vicab/newexp/hope/clip_raw \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 12 \
--datatype msrvtt --expand_msrvtt_sentences \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0 --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header seqTransf \
--do_train