TengdaHan / CoCLR

[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.
Apache License 2.0

about Initialization & Alternation #21

Closed. junmin98 closed this issue 3 years ago.

junmin98 commented 3 years ago
  1. Initialization: using the pretrained InfoNCE checkpoint (.pth.tar) files

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 --epochs 100 --schedule 80 --nameprefix Cycle1-FlowMining -j 4 --pretrain /mypath/CoCLR/pretrained_by_TH/InfoNCE-ucf101-rgb-128-s3d-ep399.pth.tar /mypath/CoCLR/pretrained_by_TH/InfoNCE-ucf101-f-128-s3d-ep396.pth.tar

If I run the command like this, does the initialization happen correctly? When I do, the following is printed:

=======Check Weights Loading======
Weights not used from pretrained file:
Weights not loaded into new model: queue queue_ptr queue_second queue_vname queue_label

Why are the weights of the pretrained model not used?

  2. Alternation: your paper says, "where each cycle refers to a complete optimization of L1 and L2; meaning, the alternation only happens after the RGB or Flow network has converged."

So I entered the command above into the terminal and am now training (i.e. Cycle 1, FlowMining). But acc@1 and acc@5 never go over 1. Is this the expected value, or is something wrong?

++ Additional: if something is wrong, there is one thing I am concerned about. In lmdb_dataset.py, I got an error for i.decode():

AttributeError: 'str' object has no attribute 'decode'

To fix this, I changed the calls in lmdb_dataset.py to:

self.db_keys_flow = msgpack.loads(txn.get(b'keys'), raw=True)
self.db_order_flow = msgpack.loads(txn.get(b'order'), raw=True)
...
self.db_order_rgb = msgpack.unpackb(txn.get(b'order'), raw=True)
...
raw_rgb = msgpack.loads(txn.get(self.get_video_id_rgb[vname].encode('ascii')), raw=True)
raw_flow = msgpack.loads(txn.get(self.get_video_id_flow[vname].encode('ascii')), raw=True)

I added raw=True; could this be causing the problem?
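
For reference, a minimal sketch of the msgpack behaviour involved (this assumes the msgpack-python package; it is not the repository's code). With raw=False, which newer msgpack versions use by default, strings are unpacked as str, so calling .decode() on them raises the AttributeError above; with raw=True they come back as bytes and .decode() works:

```python
import msgpack

# Pack a couple of example keys (the video names are illustrative only).
packed = msgpack.packb(['v_ApplyEyeMakeup_g01_c01', 'v_Archery_g01_c01'])

keys_str = msgpack.unpackb(packed, raw=False)   # elements are str  -> .decode() fails
keys_bytes = msgpack.unpackb(packed, raw=True)  # elements are bytes -> .decode() works

print(type(keys_str[0]))        # <class 'str'>
print(keys_bytes[0].decode())   # v_ApplyEyeMakeup_g01_c01
```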

TengdaHan commented 3 years ago

Hi! Sorry for the late reply.

  1. Yes, the output you provided shows the model is initialized. You can check here: https://github.com/TengdaHan/CoCLR/blob/main/utils/utils.py#L88 The "weights not used from pretrained file" list is actually empty; the "weights not loaded into new model" entries are all related to the momentum queue, and I choose to re-accumulate these variables for a better-quality queue (a sketch of this partial-loading idea follows after this list).
  2. Alternation stage: my acc is always between 0 and 1 (i.e. before converting to a percentage). Do you mean your acc is less than 0.01? That would be strange; the acc of the alternation stage should be similar to or better than the InfoNCE stage.
  3. I didn't get this "AttributeError: 'str' object has no attribute 'decode'" with the same code.
  4. BTW, I just slightly updated the code, since I am also running more experiments with the same version of the code. You can have a look.
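
A minimal sketch of the partial-loading idea in point 1 (an assumed helper, not the repo's actual neq_load_customized; the 'state_dict' checkpoint key is also an assumption): weights that match by name and shape are copied, everything else is reported, and the momentum-queue buffers stay freshly initialized so they can be re-accumulated during training.

```python
import torch

def load_matching_weights(model, ckpt_path):
    """Copy weights that match by name and shape; report the rest."""
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    pretrained = checkpoint.get('state_dict', checkpoint)
    model_dict = model.state_dict()

    matched = {k: v for k, v in pretrained.items()
               if k in model_dict and v.shape == model_dict[k].shape}
    print('Weights not used from pretrained file:',
          [k for k in pretrained if k not in matched])
    print('Weights not loaded into new model:',      # e.g. queue, queue_ptr, ...
          [k for k in model_dict if k not in matched])

    model_dict.update(matched)
    model.load_state_dict(model_dict)
    return model
```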
junmin98 commented 3 years ago

Thank you for your answer! I understand most of it (including the accuracy question: I get the same accuracy as you, between 0 and 1). But I have one more question.

  1. I ran it again with the code you posted a few hours ago, but I still get "AttributeError: 'str' object has no attribute 'decode'". Do you use Python 2?
KT27-A commented 3 years ago

It seems OK to delete 'decode' directly. However, the model now seems hard to converge: 20 epochs for 0.09 top-1 acc. Have you met such a situation? Thanks.

junmin98 commented 3 years ago

Thank you for your reply. In your case, do you mean you got 0.09 top-1 acc after running for 20 epochs?

KT27-A commented 3 years ago

Yes. It has now run for 200 epochs and reached 0.69 top-1 acc during training. Does that make sense? I am a little confused about why it converges so slowly with Adam and lr 0.001.

TengdaHan commented 3 years ago

> Do you use Python 2?

I use Python 3.

TengdaHan commented 3 years ago

This is the train/val curve from one of my experiments on fine-tuning the InfoNCE-UCF101-RGB pre-trained model, with the lr reduced by 0.1x at epoch 300. At 20 epochs I get 40+% accuracy, but it is true that at 200 epochs I get ~70% accuracy. [figure: infonce-ft-ucf-128-example, train/val accuracy curves] I think the reason for the slow convergence is that I use 0.9 dropout (to prevent fast overfitting).
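
As an aside, a runnable sketch of that fine-tuning schedule (an assumed setup with a dummy model, not the repo's exact code): Adam at lr 1e-3 with weight decay 1e-3, reduced by 10x at epoch 300.

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 101)   # placeholder for the S3D backbone + classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[300], gamma=0.1)

for epoch in range(400):
    if epoch in (0, 299, 300):
        print(epoch, scheduler.get_last_lr())   # 1e-3 up to epoch 299, 1e-4 from epoch 300
    # ... one training epoch would run here ...
    scheduler.step()
```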

KT27-A commented 3 years ago

Got it. Thank you!

KT27-A commented 3 years ago

Hi Tengda, I finished the pre-training and fixed several minor bugs in eval/main_classifier.py. Now I get 0.452 linear evaluation performance with 100 epochs, which is lower than the 0.523 reported in the paper. Am I misunderstanding some methods or configs? Besides, I found that you use a non-consistent RandomResizeCrop during pre-training but a consistent RandomResizeCrop during fine-tuning; could you please tell me what your hypothesis is for this setting? Thanks. Looking forward to your reply. The command I used for evaluation is:

CUDA_VISIBLE_DEVICES=0 python main_classifier.py --net s3d \
  --dataset ucf101 --ds 1 --batch_size 32 -j 0 --center_crop \
  --test log-eval-linclr/ucf101-128_sp1_lincls_s3d_Adam_bs32_lr0.001_dp0.9_wd0.001_seq1_len32_ds1_train-last_pt\=..-log-pretrain-infonce_k2048_ucf101-2clip-128_s3d_bs32_lr0.001_seq2_len32_ds1-model-model_best_epoch292.pth.tar/model/model_best_epoch95.pth.tar

TengdaHan commented 3 years ago
  1. In our paper, the 52.3% linear probe result for InfoNCE-rgb came from 800 epochs of training; when InfoNCE-rgb is trained for 500 epochs I get 46.8% linear probe accuracy. I have updated the NeurIPS final version and (soon) the arXiv version to correct this. You can also check this helpful issue: https://github.com/TengdaHan/CoCLR/issues/3#issuecomment-734723016
  2. RandomResizeCrop was also discussed in the issue above. The "consistent" version is still clip-wise: in the pre-training stage I concatenate two clips together and apply "RandAug1" to the first half and "RandAug2" to the second half (sketched below). In the fine-tuning stage, this consistent augmentation has no effect.
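
To make the "clip-wise consistent" idea concrete, here is a small sketch (an assumed implementation using torchvision, not the repo's own transform classes): within one clip every frame receives the same crop parameters, while the two concatenated clips each draw their own parameters, playing the roles of "RandAug1" and "RandAug2".

```python
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def consistent_random_resized_crop(frames, size=128, scale=(0.2, 1.0)):
    # frames: list of PIL images belonging to one clip; one set of crop
    # parameters is drawn and applied to every frame of the clip.
    i, j, h, w = T.RandomResizedCrop.get_params(frames[0], scale=scale, ratio=(3/4, 4/3))
    return [TF.resized_crop(f, i, j, h, w, (size, size)) for f in frames]

def augment_two_clips(clip1, clip2):
    # The two clips draw independent parameters ("RandAug1" vs "RandAug2"),
    # then the results are concatenated, mirroring the description above.
    return consistent_random_resized_crop(clip1) + consistent_random_resized_crop(clip2)
```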
KT27-A commented 3 years ago

Got it. Thanks for your prompt and clear answers.

KT27-A commented 3 years ago

Did you mean that you train for 300 epochs in the pre-training stage and 800 epochs in the fine-tuning stage?

KT27-A commented 3 years ago

Hi Tengda, I found another thing that confuses me: the training set of UCF101 split-1 has 9537 videos, but when I set bs=32 the total number of batches per epoch was 149, which is half of 9537//32=298. I haven't figured out why this happens.

TengdaHan commented 3 years ago
  1. The roadmap of the Table 1 experiments in our paper is (pre-training epochs in parentheses):

     InfoNCE-rgb(300) --------> CoCLR-Cycle x2 (100x2) --------> our CoCLR-rgb: 500 epochs in total, 70.2% linear probe.
     InfoNCE-rgb(300) ---> continue training InfoNCE(200) ---> InfoNCE-rgb baseline for a fair comparison: 500 epochs in total, 46.8% linear probe.

     The 800 epochs I mentioned above are also pre-training epochs. InfoNCE-rgb(300+200) is the fair comparison and should be in Table 1, but I unnecessarily put the InfoNCE-rgb(800) result there, which I have corrected in the NeurIPS final version. I hope this is clear now. BTW, thanks for the feedback.

  2. My batch_size is the batch size per GPU: https://github.com/TengdaHan/CoCLR/blob/c95eba977b68f713dca51c8c43c4fb64d69bcf59/main_coclr.py#L494 Are you using 2 GPUs? If yes, then 9537//(32*2)=149 is correct.
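
A quick check of that arithmetic (plain Python, just to illustrate the per-GPU batch size point):

```python
num_videos = 9537          # UCF101 split-1 training videos
gpus = 2
batch_size_per_gpu = 32    # the --batch_size flag is per GPU

print(num_videos // batch_size_per_gpu)           # 298 batches if everything ran on one GPU
print(num_videos // (gpus * batch_size_per_gpu))  # 149 batches per GPU with 2 GPUs
```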

KT27-A commented 3 years ago

Got it, thanks. I found that there is no dropout in main_nce.py or main_coclr.py, but there is in eval/main_classifier.py (when fine-tuning the entire network). Does this mean you only use dropout when fine-tuning the entire network?

TengdaHan commented 3 years ago

I do not use dropout in the pre-training stage. The self-supervised pre-training is expected to overfit the huge dataset (though it is usually limited by the model capacity), so it is not necessary to constrain the network capacity by dropping out nodes. I use dropout in the downstream classification tasks to avoid quickly overfitting the UCF101 and HMDB51 training sets (which are much smaller).
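
For illustration, a minimal sketch of a classification head with the heavy dropout described above (the names and feature dimension are assumptions, not the repo's actual classifier):

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Illustrative linear classifier with heavy dropout for downstream fine-tuning."""
    def __init__(self, feature_dim=1024, num_classes=101, dropout=0.9):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        return self.fc(self.dropout(x))

head = ClassifierHead()
logits = head(torch.randn(4, 1024))   # -> shape [4, 101]
```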

KT27-A commented 3 years ago

Got it. Thanks.

KT27-A commented 3 years ago

Hi Tengda, I found that A.RandomSizedCrop is used for validation and test, while people usually use an isotropic resize + center crop for inference. Could you please tell me why you chose this setting? I tested isotropic resize + center crop and there is not much performance difference. Thanks. https://github.com/TengdaHan/CoCLR/blob/110c83dcbd03b13b1c10a9c158d9f005899595af/eval/main_classifier.py#L738-L744

TengdaHan commented 3 years ago

Val is just for monitoring performance, so it doesn't really matter. For the test, I actually use an "isotropic" 10-crop ((4 corners + center) * 2 flips): https://github.com/TengdaHan/CoCLR/blob/110c83dcbd03b13b1c10a9c158d9f005899595af/eval/main_classifier.py#L457 The line you pointed out is re-written for the final inference.
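
For context, a sketch of how the 10-crop predictions could be averaged at test time (placeholders only, not the repository's actual test pipeline):

```python
import torch

def ten_crop_predict(model, crops):
    # crops: list of 10 video tensors [C, T, H, W], one per crop
    # ((4 corners + center) x 2 horizontal flips).
    batch = torch.stack(crops, dim=0)    # [10, C, T, H, W]
    with torch.no_grad():
        logits = model(batch)            # [10, num_classes]
    return logits.mean(dim=0)            # average over the 10 views
```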

KT27-A commented 3 years ago

Got it. Thanks.