aharley / simple_bev

A Simple Baseline for BEV Perception
MIT License

Questions about the results of bevformer in table1. #25

Open LHY-HongyangLi opened 1 year ago

LHY-HongyangLi commented 1 year ago

Hi Dr. Harley, simple-bev is really nice work. However, when running the code to reproduce the results in Table 1, I ran into some problems:

  1. Does "simple_bev/nets/bevformernet.py" correspond to "Deformable attention" in Table 1, and "simple_bev/nets/bevformernet2.py" to "Multi-scale deform. attn."?
  2. The performance of "bevformernet.py" seems to be similar to that of "segnet" (as shown below). I don't know if I have done anything wrong.
  3. When training "bevformernet2", it tends to overfit: iou_v is low.

Thank you for your attention. I would be really grateful if you could help me out.

log of Q2:

segnet:
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000356/376; rtime 0.06; itime 0.69 (783.48 ms); loss 2.87869; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000357/376; rtime 0.07; itime 0.80 (783.53 ms); loss 0.26470; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000358/376; rtime 0.07; itime 0.69 (783.27 ms); loss -0.71782; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000359/376; rtime 0.06; itime 0.66 (782.94 ms); loss 0.30850; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000360/376; rtime 0.08; itime 0.66 (782.60 ms); loss -0.15269; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000361/376; rtime 0.04; itime 0.63 (782.17 ms); loss -0.56200; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000362/376; rtime 0.04; itime 0.62 (781.73 ms); loss 0.11772; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000363/376; rtime 0.04; itime 0.63 (781.31 ms); loss 0.36632; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000364/376; rtime 0.04; itime 0.62 (780.87 ms); loss 1.63898; iou_ev 47.6
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000365/376; rtime 0.05; itime 0.64 (780.47 ms); loss 5.04701; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000366/376; rtime 0.04; itime 0.64 (780.08 ms); loss 1.01278; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000367/376; rtime 0.05; itime 0.65 (779.72 ms); loss 0.95337; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000368/376; rtime 0.05; itime 0.66 (779.39 ms); loss 0.78735; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000369/376; rtime 0.05; itime 0.65 (779.04 ms); loss 2.96054; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000370/376; rtime 0.04; itime 0.62 (778.62 ms); loss 2.15018; iou_ev 47.5
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000371/376; rtime 0.04; itime 0.72 (778.45 ms); loss 4.13992; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000372/376; rtime 0.05; itime 0.64 (778.07 ms); loss 1.97022; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000373/376; rtime 0.05; itime 0.63 (777.67 ms); loss 0.56684; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000374/376; rtime 0.04; itime 0.63 (777.27 ms); loss 1.27457; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000375/376; rtime 0.05; itime 0.64 (776.91 ms); loss 3.13456; iou_ev 47.4
8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000376/376; rtime 0.05; itime 0.64 (776.55 ms); loss 1.49759; iou_ev 47.4
final trainval mean iou 47.43055910993624

bevformer
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000351/376; rtime 0.04; itime 1.57 (1583.65 ms); loss 1.46815; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000352/376; rtime 0.05; itime 1.54 (1583.52 ms); loss 1.36396; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000353/376; rtime 0.04; itime 1.52 (1583.35 ms); loss 1.52722; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000354/376; rtime 0.04; itime 1.52 (1583.18 ms); loss 1.45763; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000355/376; rtime 0.05; itime 1.55 (1583.09 ms); loss 2.03853; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000356/376; rtime 0.07; itime 1.54 (1582.97 ms); loss 1.00910; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000357/376; rtime 0.04; itime 1.51 (1582.76 ms); loss 0.24750; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000358/376; rtime 0.04; itime 1.52 (1582.60 ms); loss 0.00532; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000359/376; rtime 0.04; itime 1.52 (1582.43 ms); loss 0.19061; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000360/376; rtime 0.04; itime 1.52 (1582.24 ms); loss 0.04461; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000361/376; rtime 0.04; itime 1.51 (1582.06 ms); loss -0.15120; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000362/376; rtime 0.04; itime 1.51 (1581.86 ms); loss 0.19511; iou_ev 47.5
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000363/376; rtime 0.04; itime 1.51 (1581.67 ms); loss 0.41124; iou_ev 47.5
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000364/376; rtime 0.05; itime 1.54 (1581.56 ms); loss 0.64006; iou_ev 47.5
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000365/376; rtime 0.05; itime 1.53 (1581.42 ms); loss 1.81830; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000366/376; rtime 0.04; itime 1.53 (1581.29 ms); loss 0.45757; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000367/376; rtime 0.04; itime 5.94 (1593.17 ms); loss 0.35694; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000368/376; rtime 0.07; itime 1.55 (1593.07 ms); loss 0.77848; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000369/376; rtime 0.05; itime 1.52 (1592.87 ms); loss 1.30311; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000370/376; rtime 0.04; itime 1.50 (1592.63 ms); loss 1.00361; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000371/376; rtime 0.04; itime 1.51 (1592.39 ms); loss 2.16301; iou_ev 47.3
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000372/376; rtime 0.04; itime 1.52 (1592.19 ms); loss 0.88017; iou_ev 47.3
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000373/376; rtime 0.04; itime 1.52 (1592.00 ms); loss 0.48593; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000374/376; rtime 0.04; itime 1.52 (1591.81 ms); loss 0.77463; iou_ev 47.4
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000375/376; rtime 0.04; itime 1.51 (1591.59 ms); loss 1.96368; iou_ev 47.3
8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000376/376; rtime 0.04; itime 1.51 (1591.38 ms); loss 0.78340; iou_ev 47.4
final trainval mean iou 47.36635237197667
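To compare the two evaluation runs side by side, the log lines above can be parsed with a short script. This is only a minimal sketch, assuming every line carries the fields shown in the pasted logs (`step`, `loss`, `iou_ev`); `parse_eval_line` and `final_iou` are hypothetical helper names and are not part of simple_bev.

```python
import re

# Matches the fields seen in the eval log lines pasted in this issue.
# Assumption: the log format is exactly as shown above.
EVAL_LINE = re.compile(
    r"^(?P<run>\S+); step (?P<step>\d+)/(?P<total>\d+);"
    r".*?loss (?P<loss>-?\d+\.\d+); iou_ev (?P<iou>\d+\.\d+)"
)

def parse_eval_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    m = EVAL_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "run": m.group("run"),
        "step": int(m.group("step")),
        "total": int(m.group("total")),
        "loss": float(m.group("loss")),
        "iou_ev": float(m.group("iou")),
    }

def final_iou(lines):
    """Return iou_ev from the highest-step line, or None if nothing parses."""
    records = [r for r in map(parse_eval_line, lines) if r is not None]
    if not records:
        return None
    return max(records, key=lambda r: r["step"])["iou_ev"]

# Last line of each eval log, copied from the issue.
segnet_tail = [
    "8x5_3e-4s_segnet_reproduce_19:22:27_16_eval_20:55:09; step 000376/376; "
    "rtime 0.05; itime 0.64 (776.55 ms); loss 1.49759; iou_ev 47.4",
]
bevformer_tail = [
    "8x5_3e-4s_bevformer_21:07:58_16_eval_01:13:41; step 000376/376; "
    "rtime 0.04; itime 1.51 (1591.38 ms); loss 0.78340; iou_ev 47.4",
]

diff = final_iou(bevformer_tail) - final_iou(segnet_tail)
print(f"bevformer - segnet final iou_ev: {diff:+.1f}")
```

Feeding in the full log files instead of the single lines used here would work the same way, since lines that do not match (such as the `final trainval mean iou` summary) are skipped.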

log of Q3:

bevformer
8x5_3e-4s_bevformer_21:07:58; step 005610/25000; rtime 0.23; itime 7.22; loss 1.94302; iou_t 29.3; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005611/25000; rtime 0.29; itime 7.15; loss 1.96159; iou_t 28.7; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005612/25000; rtime 0.26; itime 7.15; loss 1.57377; iou_t 26.5; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005613/25000; rtime 0.21; itime 7.30; loss 2.18127; iou_t 25.8; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005614/25000; rtime 0.17; itime 7.03; loss 2.06807; iou_t 25.9; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005615/25000; rtime 0.24; itime 7.30; loss 1.96291; iou_t 25.6; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005616/25000; rtime 0.20; itime 7.23; loss 1.79805; iou_t 26.3; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005617/25000; rtime 0.19; itime 7.26; loss 1.75140; iou_t 26.8; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005618/25000; rtime 0.22; itime 7.31; loss 1.84597; iou_t 27.1; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005619/25000; rtime 0.18; itime 7.34; loss 1.59728; iou_t 27.1; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005620/25000; rtime 0.30; itime 7.09; loss 2.27120; iou_t 26.8; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005621/25000; rtime 0.23; itime 7.14; loss 1.91050; iou_t 26.6; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005622/25000; rtime 0.15; itime 6.57; loss 1.78379; iou_t 27.5; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005623/25000; rtime 0.16; itime 6.59; loss 1.61208; iou_t 28.0; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005624/25000; rtime 0.15; itime 6.66; loss 1.95314; iou_t 27.8; iou_v 29.8
8x5_3e-4s_bevformer_21:07:58; step 005625/25000; rtime 0.19; itime 7.05; loss 1.64065; iou_t 28.0; iou_v 29.8
bevformer2
8x5_3e-4s_bevformer_MS_00:06:32; step 005610/25000; rtime 0.35; itime 5.87; loss 1.30963; iou_t 39.2; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005611/25000; rtime 0.21; itime 5.83; loss 0.97845; iou_t 39.6; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005612/25000; rtime 0.20; itime 6.03; loss 1.67607; iou_t 39.3; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005613/25000; rtime 0.23; itime 6.03; loss 1.30028; iou_t 39.3; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005614/25000; rtime 0.19; itime 6.19; loss 1.44862; iou_t 39.6; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005615/25000; rtime 0.20; itime 5.56; loss 1.33884; iou_t 39.1; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005616/25000; rtime 0.25; itime 6.23; loss 1.39724; iou_t 38.7; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005617/25000; rtime 0.24; itime 5.57; loss 1.90144; iou_t 38.5; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005618/25000; rtime 0.22; itime 6.27; loss 1.33009; iou_t 38.3; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005619/25000; rtime 0.30; itime 5.66; loss 1.22954; iou_t 37.7; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005620/25000; rtime 0.33; itime 5.83; loss 1.67304; iou_t 38.3; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005621/25000; rtime 0.23; itime 6.05; loss 1.51381; iou_t 37.9; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005622/25000; rtime 0.20; itime 4.56; loss 1.54351; iou_t 38.8; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005623/25000; rtime 0.15; itime 4.39; loss 1.39122; iou_t 38.1; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005624/25000; rtime 0.16; itime 5.98; loss 1.22388; iou_t 38.6; iou_v 5.5
8x5_3e-4s_bevformer_MS_00:06:32; step 005625/25000; rtime 0.17; itime 7.95; loss 2.08057; iou_t 38.4; iou_v 5.5
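The gap between `iou_t` and `iou_v` in the training logs above can be pulled out with a few lines of Python. This is a rough sketch, assuming the log format is exactly as pasted; `train_val_gap` is a hypothetical helper, not a function from the repo. A large positive gap is consistent with overfitting, but it could equally come from a bug in the model or validation path, so it is only a first diagnostic.

```python
import re

# Assumption: training log lines look exactly like the ones pasted above.
TRAIN_LINE = re.compile(
    r"step (?P<step>\d+)/\d+;.*?"
    r"iou_t (?P<iou_t>\d+\.\d+); iou_v (?P<iou_v>\d+\.\d+)"
)

def train_val_gap(line):
    """Return (step, iou_t - iou_v) for one log line, or None if no match."""
    m = TRAIN_LINE.search(line)
    if m is None:
        return None
    gap = float(m.group("iou_t")) - float(m.group("iou_v"))
    return int(m.group("step")), round(gap, 1)

# Last training line of each run, copied from the issue.
bevformer_line = (
    "8x5_3e-4s_bevformer_21:07:58; step 005625/25000; rtime 0.19; "
    "itime 7.05; loss 1.64065; iou_t 28.0; iou_v 29.8"
)
bevformer2_line = (
    "8x5_3e-4s_bevformer_MS_00:06:32; step 005625/25000; rtime 0.17; "
    "itime 7.95; loss 2.08057; iou_t 38.4; iou_v 5.5"
)

for name, line in [("bevformer", bevformer_line), ("bevformer2", bevformer2_line)]:
    step, gap = train_val_gap(line)
    print(f"{name}: step {step}, iou_t - iou_v = {gap:+.1f}")
```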
aharley commented 1 year ago

I think the bevformers in this repo still need to be cleaned up; bevformer2.py is definitely broken right now. I think the first bevformer should outperform segnet on average, so it's possible something is wrong. Did you already figure out that you need to compile nets/ops/?

ptr-br commented 1 year ago

Hey,

how exactly does the compilation of the nets/ops work?

Thanks!

aharley commented 1 year ago

Go to ./nets/ops and run sh make.sh
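After running the build step, a quick way to check whether the compiled extension is visible to Python is to look it up without importing it. The sketch below is only an illustration: the module name `MultiScaleDeformableAttention` is an assumption based on the Deformable DETR ops that this kind of `make.sh` usually builds, so check `nets/ops/setup.py` for the actual name. `extension_is_importable` is a hypothetical helper.

```python
import importlib.util

def extension_is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

# Assumption: the compiled op installs under this name; verify in setup.py.
ASSUMED_MODULE = "MultiScaleDeformableAttention"

if extension_is_importable(ASSUMED_MODULE):
    print(f"{ASSUMED_MODULE} found; the ops appear to be compiled and installed.")
else:
    print(f"{ASSUMED_MODULE} not found; try running `sh make.sh` in ./nets/ops.")
```

Run it in the same Python environment that will run `train_nuscenes.py`, since the extension is installed per environment.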

pianogGG commented 1 year ago

@aharley How can I reproduce the IoU reported for "Multi-scale deform. attn." in Table 1?

pianogGG commented 1 year ago

@LHY-HongyangLi Have you reproduced the 48.9 reported for "Multi-scale deform. attn."?

pianogGG commented 1 year ago

@aharley @LHY-HongyangLi When I run bevformer:

python train_nuscenes.py \
   --exp_name="bevformer" \
   --max_iters=25000 \
   --log_freq=1000 \
   --dset='trainval' \
   --batch_size=8 \
   --grad_acc=5 \
   --use_scheduler=True \
   --data_dir='/data/nuscenes' \
   --log_dir='logs_nuscenes' \
   --ckpt_dir='checkpoints' \
   --res_scale=2 \
   --ncams=6 \
   --encoder_type='res101' \
   --do_rgbcompress=True \
   --device_ids=[0,1,2,3,4,5,6,7]

I get this log:

[Screenshot: Screen Shot 2023-07-27 at 17 17 45]

This is very strange. Any advice? Thanks a lot.

chg0901 commented 1 year ago

@pianogGG Can you run the code directly? train_nuscenes.py uses Segnet, so I changed the import to "from nets.bevformernet2 import Bevformernet" and also changed the model in train_nuscenes.py.

However, it shows me an error when it reaches bevformernet2.py#L497.

Could you please share how you ran or modified the code?

Best regards and thank you very much!