TengdaHan / CoCLR

[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.
Apache License 2.0

Questions about training details of CoCLR #14

Closed: YuqiHUO closed this issue 3 years ago

YuqiHUO commented 3 years ago

Hi, I'm trying to replicate your result at the alternation stage. I'm now using the two init models you provided (both trained for ~400 epochs). I have two questions.

1) According to your paper, "At the alternation stage, on UCF101 the model is trained for two cycles, where each cycle includes 200 epochs, i.e. RGB and Flow networks are each trained for 100 epochs". Does that mean I need to run main_coclr.py four times, each time for 100 epochs and starting from the two newest pretrained models produced by the previous run?

2) If so, what lr do you use in each of the four 100-epoch runs of the alternation stage? I also checked the CoCLR pretrained models you provided; at epoch 182 and epoch 109 the lr is 1e-4. Does that mean I need to train the second cycle with a larger lr, e.g. 1e-2, and decay down to 1e-4?
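
Aside for anyone replicating this: the epoch and lr stored in a released checkpoint can be inspected with a shell one-liner like the one below. That the checkpoint is a dict with 'epoch' and 'optimizer' keys follows the usual PyTorch convention and is an assumption about CoCLR's save format.

# Print the stored epoch and the optimizer's lr (path is a placeholder).
python -c "import torch, sys; ckpt = torch.load(sys.argv[1], map_location='cpu'); print('epoch:', ckpt.get('epoch'), 'lr:', [g['lr'] for g in ckpt['optimizer']['param_groups']])" path/to/coclr-checkpoint.pth.tar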

Best Regards, Yuqi

TengdaHan commented 3 years ago

Hi,

  1. Yes, exactly. I've updated some commands in the readme file.
  2. Each time I run main_coclr.py, I use Adam with the lr starting from 1e-3, then decayed once.
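
Putting these two answers together, the whole alternation stage is four 100-epoch runs of main_coclr.py, each starting from the two newest checkpoints. Below is a minimal shell sketch, reusing the flags from the FlowMining command shared later in this thread. The prefix naming, the checkpoint paths, and the bookkeeping for which --pretrain slot to update after each run are placeholders, and reading --schedule 80 as the single lr decay point (Adam, 1e-3 to 1e-4 at epoch 80) is an assumption; the repo README has the exact commands.

# Two init models from the first-stage pretraining (placeholder paths).
CKPT_A=log-pretrain/ep399.pth.tar
CKPT_B=log-pretrain/ep396.pth.tar

# Four runs = 2 cycles, each cycle = Flow-mining 100 ep + RGB-mining 100 ep.
for RUN in 1 2 3 4; do
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py \
--net s3d --topk 5 --moco-k 2048 --seq_len 32 --ds 1 \
--dataset ucf101-2stream-2clip \
--batch_size 16 -j 8 \
--epochs 100 --schedule 80 \
--prefix "AlternationRun${RUN}_" \
--pretrain "$CKPT_A" "$CKPT_B"
# Before the next run, repoint whichever of CKPT_A / CKPT_B corresponds
# to the network just trained at the newest checkpoint it produced.
done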
junmin98 commented 3 years ago

[quoting @YuqiHUO's question above]

Hi! I'm also at the alternation stage, but I have a question. Do you see messages like "Weights not used from pretrained file:" and "Weights not loaded into new model:"? If not, could you share the command you entered in the terminal (the one starting with "CUDA_VISIBLE_DEVICES...")?

Also, what accuracy does FlowMining in Cycle1 reach? When I run it, the accuracy never exceeds 1. Is that normal for Cycle1?

I would be grateful if you could share your experience!

YuqiHUO commented 3 years ago

[quoting @junmin98's questions above]

Hi,

1) I saw your other issue. During the alternation stage, the encoders/samplers load their params from the pretrained files, while the queues and pointers are initialized randomly. That's why you see "Weights not loaded into new model: queue... / queue...". I've changed the code a lot; for reference, this is my FlowMining phase:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py \
--net s3d --topk 5 --moco-k 2048 --seq_len 32 --ds 1 \
--dataset ucf101-2stream-2clip \
--batch_size 16 -j 8  \
--epochs 100 --schedule 80 \
--prefix Cycle1-FlowMining_ \
--pretrain log-pretrain/ep399.pth.tar  log-pretrain/ep396.pth.tar

2) The acc in the CoCLR alternation stage is the top-1 accuracy on the pretext task, "multi-positive instance discrimination". So it can never exceed 1, which would mean 100% accuracy. From my experience, you should get a number between 82% and 92%.

junmin98 commented 3 years ago

[quoting @YuqiHUO's answer above]

Thank you so much! Your answer helped me a lot. I'll keep going!

junmin98 commented 3 years ago

[quoting @YuqiHUO's answer above]

Thanks to you, I'm doing well. But may I ask one more question?

After completing Cycle1, I moved on to Cycle2, and that is where the problem is.

First, in the Cycle1-FlowMining run, my best result is at epoch 4 or 5. Was your best epoch also around 4 or 5 in the Cycle1-FlowMining run?

I used that checkpoint to run Cycle2-RGBMining, and then ran the downstream evaluation. The action recognition result is only acc@1 of about 43% and acc@5 of about 73%, which is very low.

Second, when you run Cycle2, do you have to set start_epoch=101 and epochs=200? When I ran Cycle2 that way, the accuracy came out the same.

YuqiHUO commented 3 years ago

[quoting @junmin98's questions above]

1) The best result appearing at epoch 4 or 5 is because the pretext task's acc first goes down and then comes back up, but the real 'best' epoch in the first cycle should be Ep99 (I think). So you should use Ep99 to proceed to Cycle2-RGBMining.

2) I didn't use start_epoch=101 and epoch=200; I just set a total of 100 epochs (but you can regard this as epochs 101-200). I don't think the epoch setting affects the final result.
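
In command form, a minimal sketch of that Cycle2-RGBMining run, reusing the flags from the FlowMining command earlier in this thread. Both checkpoint paths and the --pretrain argument order (i.e. which slot takes the freshly trained Ep99 file) are placeholders and assumptions; check the repo README for the exact convention.

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py \
--net s3d --topk 5 --moco-k 2048 --seq_len 32 --ds 1 \
--dataset ucf101-2stream-2clip \
--batch_size 16 -j 8 \
--epochs 100 --schedule 80 \
--prefix Cycle2-RGBMining_ \
--pretrain path/to/Cycle1-FlowMining_ep99.pth.tar path/to/other-stream-init.pth.tar

Note --epochs 100 rather than 200: each run is just another 100 epochs, which you can mentally relabel as 101-200.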