The model behaves strangely during training

Luo-Z13 / pointobb

[CVPR2024] PointOBB: Learning Oriented Object Detection via Single Point Supervision

MIT License


The model behaves strangely during training #1

Closed: cxq1 closed this issue 11 months ago

cxq1 commented 11 months ago

(Screenshot attached.) I don't know why the training time suddenly becomes so much longer. It's very strange.

Luo-Z13 commented 11 months ago

> I don't know why the training time suddenly becomes so much longer. It's very strange.

Hello, this occurs because the angle branch is introduced after the "burn-in step 1" phase (as mentioned in the paper). You can adjust "burn_in_steps1" and "burn_in_steps2" in the config file.
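To illustrate why the per-iteration time can jump partway through training, here is a minimal sketch of a two-threshold burn-in schedule. Only the names `burn_in_steps1` and `burn_in_steps2` and the fact that the angle branch starts after the first phase come from the reply above. The function, the phase labels, the behaviour after the second threshold, and the numeric values are hypothetical and are not taken from the PointOBB code.

```python
def active_phase(iter_count, burn_in_steps1, burn_in_steps2):
    """Return a label for the training phase that an iteration falls into.

    Hypothetical helper: it only illustrates gating by iteration count.
    """
    if iter_count < burn_in_steps1:
        # Angle branch not yet introduced, so each iteration is cheaper.
        return "burn_in_1"
    if iter_count < burn_in_steps2:
        # Angle branch is now active, so each iteration does more work.
        return "burn_in_2"
    # Assumption: a further stage begins once burn_in_steps2 is reached.
    return "full"


# Placeholder thresholds, NOT the repository defaults.
BURN_IN_STEPS1 = 1000
BURN_IN_STEPS2 = 2000

for it in (0, 999, 1000, 1999, 2000):
    print(it, active_phase(it, BURN_IN_STEPS1, BURN_IN_STEPS2))
```

Under this reading, the estimated remaining time printed in the log would rise at the iteration where the first threshold is crossed, which matches the behaviour described in the question.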

cxq1 commented 11 months ago

Thanks for your reply. It's great work! How long does it take you to train the model on two GPUs?

cxq1 commented 11 months ago

If training is interrupted and then restarted, it re-enters the "burn-in step 1" phase. Will this make the trained model less accurate?

Luo-Z13 commented 11 months ago

> Thanks for your reply. It's great work! How long does it take you to train the model on two GPUs?

If you use the default configs, it takes about 15 hours on DOTA-v1.0 and about 16 hours on DIOR-R.

Luo-Z13 commented 11 months ago

> If training is interrupted and then restarted, it re-enters the "burn-in step 1" phase. Will this make the trained model less accurate?

If training is interrupted, it is recommended to manually set "iter_count" in the config to the iteration at which training was interrupted. Then follow the resume procedure provided by MMDetection.
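If the exact iteration is not at hand, the value for "iter_count" can be estimated from the number of completed epochs. The sketch below assumes that training resumes from a checkpoint saved at the end of an epoch and that "iter_count" counts training iterations from the start of training. The function names and the dataset and batch sizes are hypothetical. Samplers that group images can change the count slightly, so the iteration number printed in the training log at the point of interruption is the more reliable source.

```python
import math


def iters_per_epoch(num_samples, samples_per_gpu, num_gpus):
    """Iterations in one epoch, assuming the last partial batch is kept."""
    return math.ceil(num_samples / (samples_per_gpu * num_gpus))


def iter_count_at_resume(completed_epochs, num_samples, samples_per_gpu, num_gpus):
    """Estimate the iteration count reached after `completed_epochs` full epochs."""
    return completed_epochs * iters_per_epoch(num_samples, samples_per_gpu, num_gpus)


# Placeholder numbers, NOT the real dataset size or the default batch size.
print(iter_count_at_resume(completed_epochs=3, num_samples=10000,
                           samples_per_gpu=2, num_gpus=2))  # 7500
```

In MMDetection 2.x, resuming is typically done by passing `--resume-from` with the checkpoint path to `tools/train.py`, assuming the repository's training script keeps that option.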

cxq1 commented 11 months ago

Which version of MMCV are you using? Is PyTorch the default 1.9? I'm curious because my training run on an A100 GPU took more than 5 days.

Luo-Z13 commented 11 months ago

> Which version of MMCV are you using? Is PyTorch the default 1.9? I'm curious because my training run on an A100 GPU took more than 5 days.

My environment:

PyTorch: 1.9.0
CUDA Runtime: 11.1
CuDNN: 8.0.5
TorchVision: 0.10.0
OpenCV: 4.8.1
MMCV: 1.4.5
MMDetection: 2.13.0+c820f32
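To compare a local setup against the versions listed above, a small check along these lines may help. It is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the packages are installed under the distribution names `torch`, `torchvision`, `mmcv-full`, and `mmdet`. The helper functions are hypothetical and not part of any of these libraries.

```python
from importlib import metadata

# Versions from the reply above; the distribution names are assumptions.
EXPECTED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "mmcv-full": "1.4.5",
    "mmdet": "2.13.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, installed):
    """Return {name: (expected, installed)} for packages that differ.

    A prefix match is used so that local build tags such as "+cu111" pass.
    """
    out = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or not have.startswith(want):
            out[name] = (want, have)
    return out


print(mismatches(EXPECTED, installed_versions(EXPECTED)))
```

An empty dictionary means every listed package matches. A version mismatch alone would not necessarily explain a multi-day training run, so this only rules out one possible difference between the two setups.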