Closed JingweiZhang12 closed 1 year ago
Could you provide the training log with --logger_iter_interval 500? The training loss is generally stable, so we can compare the logs to see whether there is a problem.
At the moment the loss looks high and the recall looks low. Some fluctuation is expected with a 12-epoch (12e) schedule, but the gap in your results is too large, so something is probably wrong.
This issue will stay open. If anyone else has the same problem, please feel free to ask here so we can see whether it is a common one.
Also, could you share your environment configuration?
Hi, thanks for your source code. I set up the environment according to the guide and tried this training command with this codebase:
bash scripts/dist_train.sh 8 --cfg_file ./cfgs/dsvt_models/dsvt_plain_D512e.yaml --sync_bn --logger_iter_interval 500
The evaluation results are below; they do not reach the official reference precision:
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.7226
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.7177
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APL: 0.7226
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.6382
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.6337
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APL: 0.6382
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/AP: 0.7706
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APH: 0.6910
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APL: 0.7706
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/AP: 0.6895
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APH: 0.6164
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APL: 0.6895
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/AP: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APH: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APL: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/AP: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APH: 0.0000
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APL: 0.0000
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/AP: 0.7039
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APH: 0.6915
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APL: 0.7039
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/AP: 0.6777
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APH: 0.6658
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APL: 0.6777
My training log is here. Could you give me some suggestions?
I found it. You forgot to turn on SyncBN. Maybe you should stick to the script. When batch_size_per_gpu = 1, regular BN is useless, so SyncBN must be turned on.
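To illustrate the point about batch size, here is a minimal pure-Python sketch of the most extreme case, not the codebase's actual layers. It assumes each device normalizes a single value per feature. The helper `batch_norm` and the sample values are made up for illustration. In a real detector each sample contributes many elements, so the practical effect is more likely noisy per-device statistics than exact zeros.

```python
import math

def batch_norm(values, eps=1e-5):
    """Normalize a list of scalars with its own mean and (biased) variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Hypothetical activations: four devices, one sample each (batch size 1 per device).
per_device = [[0.5], [2.0], [-1.0], [3.5]]

# Regular BN: each device only sees its own sample, so mean == value
# and every output collapses to zero.
local_out = [batch_norm(batch)[0] for batch in per_device]

# Synchronized BN (conceptually): statistics are computed over all devices.
all_values = [v for batch in per_device for v in batch]
synced_out = batch_norm(all_values)

print("per-device BN:", local_out)
print("synchronized BN:", synced_out)
```

With per-device statistics the outputs carry no information about the inputs, while the synchronized version keeps their relative differences.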
Thanks for your timely reply. My bad. I used --sync-bn rather than --sync_bn in the command.
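For anyone hitting the same typo, the sketch below shows how a hyphenated flag can slip through unnoticed. It is a hypothetical `argparse` setup, not the codebase's real parser: the flag names are copied from the command in this thread, while `build_parser`, `parse_lenient`, the default values, and the use of `parse_known_args` are assumptions made for illustration.

```python
import argparse

def build_parser():
    # Hypothetical parser; only the flag names come from the command in this thread.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--cfg_file", type=str, default=None)
    parser.add_argument("--sync_bn", action="store_true", default=False)
    parser.add_argument("--logger_iter_interval", type=int, default=50)
    return parser

def parse_lenient(argv):
    # parse_known_args returns unrecognized tokens instead of raising an error,
    # so a misspelled flag is dropped silently unless the caller checks `unknown`.
    args, unknown = build_parser().parse_known_args(argv)
    return args, unknown

good_args, good_unknown = parse_lenient(["--cfg_file", "x.yaml", "--sync_bn"])
bad_args, bad_unknown = parse_lenient(["--cfg_file", "x.yaml", "--sync-bn"])

print(good_args.sync_bn, good_unknown)
print(bad_args.sync_bn, bad_unknown)
```

Whatever the real parser does, a quick safeguard is to confirm in the training log that the sync_bn setting is actually enabled before starting a long run.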
Great! Wish you all the best. :)