dyabel / detpro


running time for prepare.sh #9

Closed · cailk closed this issue 2 years ago

cailk commented 2 years ago

Hi, thanks for your great work!

I'm trying to run the first command in prepare.sh, CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 2 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False totolepochs=1, which is used to generate the CLIP embeddings for the precomputed proposals. However, this process would take about 30 days with 8 16GB V100s, while in issue #4 you say it only takes one day. Am I missing any details?

dyabel commented 2 years ago

> Hi, thanks for your great work!
>
> I'm trying to run the first command in prepare.sh, CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 2 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False totolepochs=1, which is used to generate the CLIP embeddings for the precomputed proposals. However, this process would take about 30 days with 8 16GB V100s, while in issue #4 you say it only takes one day. Am I missing any details?

Hi, did you modify the command to use 8 GPUs? Also make sure the total schedule is 1 epoch, which should take about one day.

cailk commented 2 years ago

Yes, I have already changed the GPU count to 8, and total_epochs is also reset to 1 in the command. But the estimated running time is still 30+ days after it prints some float tensors:

0.20857863751051303
0.19100091827364554
0.1633187772925764
0.12694300518134716
0.2342857142857143
0.23985572587917042
2022-04-20 09:35:50,993 - mmdet - INFO - Epoch [1][50/7665]     lr: 1.978e-03, eta: 36 days, 13:22:29, time: 20.610, data_time: 2.224, memory: 8942, loss_rpn_cls: 0.6659, loss_rpn_bbox: 0.1318, loss_bbox: 0.0274, text_cls_loss: 2.6543, kd_loss: 7.1762, loss_mask: 1.5504, loss: 12.2060
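
(Editor's note: the eta in this log is consistent with the 1-epoch override not taking effect. At 7665 iters/epoch and roughly 20.6 s/iter, one epoch takes 7665 × 20.6 s ≈ 1.8 days, and the default 20-epoch schedule implied by detpro_ens_20e.py gives 20 × 1.8 ≈ 36 days, matching the reported eta. The typo pointed out below explains why the override was ignored.)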
XiongweiWu commented 2 years ago

@cailk totol -> total

cailk commented 2 years ago

> @cailk totol -> total

Alright, I'm an idiot. Thank you for the reminder~
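
(For anyone hitting the same wall, a sketch of the corrected override, assuming the standard mmdet 2.x config key total_epochs: CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 2 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False total_epochs=1. With 8 GPUs, i.e. eight devices listed and 8 passed to dist_train.sh, one epoch should finish in about a day, per the comment above.)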

cailk commented 2 years ago

> @cailk totol -> total

BTW, I'm still wondering why this embedding-generation process produces training losses. Shouldn't only a forward pass be required for this?

XiongweiWu commented 2 years ago

@cailk I guess it's an implementation issue. Personally speaking, I would prefer to generate the features in test mode.
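
A minimal sketch of what test-mode extraction could look like (hypothetical code, not DetPro's actual implementation; it assumes OpenAI's clip package and proposal crops as PIL images):

```python
# Hypothetical sketch, not DetPro's code: extract CLIP embeddings for
# proposal crops in eval mode, so no losses or gradients are involved.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()  # test mode: no dropout, frozen normalization statistics

@torch.no_grad()  # forward pass only, no autograd bookkeeping
def encode_proposals(crops):
    """crops: list of PIL.Image proposal crops -> (N, 512) CLIP image embeddings."""
    batch = torch.stack([preprocess(c) for c in crops]).to(device)
    feats = model.encode_image(batch)
    return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize (CLIP convention)
```

Running the encoder under torch.no_grad() in eval mode skips loss computation entirely, which would avoid the training-style losses reported in the log above.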