Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

Could not reproduce the precision on 20% Waymo #29

Closed by JingweiZhang12 1 year ago

JingweiZhang12 commented 1 year ago

Hi, thanks for your source code. I set up the environment according to the guide and tried this training command with this codebase:

bash scripts/dist_train.sh 8 --cfg_file ./cfgs/dsvt_models/dsvt_plain_D512e.yaml --sync_bn --logger_iter_interval 500

The evaluation results are below; they do not reach the official reference precision:

OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.7226 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.7177 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APL: 0.7226 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.6382 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.6337 
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APL: 0.6382 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/AP: 0.7706 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APH: 0.6910 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_1/APL: 0.7706 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/AP: 0.6895 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APH: 0.6164 
OBJECT_TYPE_TYPE_PEDESTRIAN_LEVEL_2/APL: 0.6895 
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/AP: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APH: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_1/APL: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/AP: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APH: 0.0000 
OBJECT_TYPE_TYPE_SIGN_LEVEL_2/APL: 0.0000 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/AP: 0.7039 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APH: 0.6915 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_1/APL: 0.7039 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/AP: 0.6777 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APH: 0.6658 
OBJECT_TYPE_TYPE_CYCLIST_LEVEL_2/APL: 0.6777 

My training log is here. Could you give me some suggestions?

Haiyang-W commented 1 year ago

Could you provide the training log produced with --logger_iter_interval 500? The training loss is generally stable, so we can compare the logs to see whether anything is wrong.

At the moment it looks like the loss is high and the recall is low. Some fluctuation is expected with the 12e schedule, but the gap between your results and the reference is too large, so something is probably wrong.

This issue will stay open. If anyone else has the same problem, please feel free to ask here so we can see whether it is a common problem.

Haiyang-W commented 1 year ago

Also, could you share your environment configuration?

Haiyang-W commented 1 year ago

I found it. You forgot to turn on SyncBN. Please stick to the script. When batch_size_per_gpu = 1, regular BN is useless, so SyncBN must be turned on.
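
The point about batch_size_per_gpu = 1 can be illustrated with a small sketch in plain Python. This is not the DSVT or PyTorch implementation; the eight workers, the activation values, and the helper names `local_stats` and `synced_mean_var` are assumptions made up for illustration. Each worker normalizes with statistics from its own single sample unless the partial sums are combined across workers, which is conceptually what SyncBN does.

```python
import random
import statistics


def local_stats(values):
    """Per-worker partial sums: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def synced_mean_var(per_worker_values):
    """Combine the partial sums of all workers (the all-reduce step)
    and return the global mean and biased variance."""
    parts = [local_stats(values) for values in per_worker_values]
    count = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var


random.seed(0)
# Hypothetical setup: 8 workers, one sample each, 256 activations of one channel.
# Every sample gets its own offset to mimic scene-to-scene variation.
offsets = [random.uniform(-2.0, 2.0) for _ in range(8)]
workers = [[random.gauss(mu, 1.0) for _ in range(256)] for mu in offsets]

# Without synchronization, each worker sees only its own mean.
per_worker_means = [statistics.fmean(values) for values in workers]

# With synchronization, all workers share one set of statistics.
global_mean, global_var = synced_mean_var(workers)
flat = [v for values in workers for v in values]

print("per-worker means:", [round(m, 3) for m in per_worker_means])
print("synced mean/var:", round(global_mean, 3), round(global_var, 3))
```

The per-worker means differ from one another, while the synchronized statistics match those of the combined batch, so every worker normalizes its activations in the same way.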

JingweiZhang12 commented 1 year ago

Thanks for your timely reply. My bad. I used --sync-bn rather than --sync_bn in the command.
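
A misspelled flag like this is easy to miss. The sketch below uses Python's standard argparse module to show one way such a typo can go unnoticed. How the actual training script parses its arguments is not shown in this thread, so the parser here is a hypothetical stand-in; only the flag name --sync_bn comes from the discussion.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for a training script's parser.
    parser = argparse.ArgumentParser()
    parser.add_argument("--sync_bn", action="store_true", default=False)
    return parser


def parse_strict(argv):
    """Return the parsed args, or None if argparse rejects the command line."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # parse_args() reports unrecognized arguments and exits.
        return None


def parse_lenient(argv):
    """Return (args, unknown); unrecognized flags are collected, not rejected."""
    return build_parser().parse_known_args(argv)


# Correct spelling: the flag is recognized and enabled.
print(parse_strict(["--sync_bn"]))

# Misspelled flag with strict parsing: rejected outright.
print(parse_strict(["--sync-bn"]))

# Misspelled flag with lenient parsing: sync_bn silently stays False.
args, unknown = parse_lenient(["--sync-bn"])
print(args.sync_bn, unknown)
```

Whatever the parsing mode, checking the startup log to confirm that the parsed value of sync_bn is true is a cheap safeguard before a long training run.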

Haiyang-W commented 1 year ago

Great! Wish you all the best. :)