HuangJunJie2017 / BEVDet

Code base of the BEVDet series.
Apache License 2.0
1.45k stars 265 forks

Optimize bev_pool_grad_kernel #302

Open GHGmc2 opened 1 year ago

GHGmc2 commented 1 year ago

With this change we get a ~4x speedup on an A100 80GB for the following shapes:

out_grad: torch.Size([10, 1, 192, 256, 128]), torch.float32
depth_grad: torch.Size([10, 7, 120, 64, 120]), torch.float32
feat_grad: torch.Size([10, 7, 64, 120, 128]), torch.float32
depth: torch.Size([10, 7, 120, 64, 120]), torch.float32
feat: torch.Size([10, 7, 64, 120, 128]), torch.float32
ranks_depth: torch.Size([28994652]), torch.int32
ranks_feat: torch.Size([28994652]), torch.int32
ranks_bev: torch.Size([28994652]), torch.int32
interval_lengths_bp: torch.Size([537600]), torch.int32
interval_starts_bp: torch.Size([537600]), torch.int32
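
To give a feel for the size of this workload, the sketch below derives element counts, byte sizes, and the average number of points per interval from the shapes listed above. The shapes and dtypes are copied from the list; the helper names (`numel`, `workload_summary`) are made up for illustration and are not part of BEVDet. The remark that the interval count equals the product of the first four `feat` dimensions is only an arithmetic observation, which would be consistent with one interval per feature location, but the thread does not confirm that reading.

```python
from math import prod

# Shapes copied from the list above. Every tensor is float32 or int32,
# so each element occupies 4 bytes.
SHAPES = {
    "out_grad": (10, 1, 192, 256, 128),
    "depth_grad": (10, 7, 120, 64, 120),
    "feat_grad": (10, 7, 64, 120, 128),
    "depth": (10, 7, 120, 64, 120),
    "feat": (10, 7, 64, 120, 128),
    "ranks_depth": (28994652,),
    "ranks_feat": (28994652,),
    "ranks_bev": (28994652,),
    "interval_lengths_bp": (537600,),
    "interval_starts_bp": (537600,),
}
BYTES_PER_ELEMENT = 4


def numel(shape):
    """Number of elements in a tensor of the given shape (hypothetical helper)."""
    return prod(shape)


def workload_summary(shapes):
    """Per-tensor byte sizes plus a few derived figures (hypothetical helper)."""
    sizes = {name: numel(shape) * BYTES_PER_ELEMENT for name, shape in shapes.items()}
    n_points = numel(shapes["ranks_bev"])
    n_intervals = numel(shapes["interval_starts_bp"])
    return {
        "bytes": sizes,
        "total_mib": sum(sizes.values()) / 2**20,
        "points_per_interval": n_points / n_intervals,
        # Arithmetic check only: does the interval count match the product
        # of the first four feat dimensions?
        "intervals_match_feat_locations": n_intervals == prod(shapes["feat"][:4]),
    }


if __name__ == "__main__":
    summary = workload_summary(SHAPES)
    for name, size in summary["bytes"].items():
        print(f"{name:>20}: {size / 2**20:10.1f} MiB")
    print(f"total: {summary['total_mib']:.1f} MiB")
    print(f"average points per interval: {summary['points_per_interval']:.2f}")
```

The three `ranks_*` arrays hold about 29 million entries each, so the kernel's work is driven by those index arrays as much as by the dense tensors.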

rubbish001 commented 4 months ago

Is there an improvement for the forward pass as well? It is too slow at test time. Waiting for a fix.