Open GHGmc2 opened 1 year ago
We can get ~4x speedup on A00 80GB for the shapes:
out_grad: torch.Size([10, 1, 192, 256, 128]), torch.float32
depth_grad: torch.Size([10, 7, 120, 64, 120]), torch.float32
feat_grad: torch.Size([10, 7, 64, 120, 128]), torch.float32
depth: torch.Size([10, 7, 120, 64, 120]), torch.float32
feat: torch.Size([10, 7, 64, 120, 128]), torch.float32
ranks_depth: torch.Size([28994652]), torch.int32
ranks_feat: torch.Size([28994652]), torch.int32
ranks_bev: torch.Size([28994652]), torch.int32
interval_lengths_bp: torch.Size([537600]), torch.int32
interval_starts_bp: torch.Size([537600]), torch.int32
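For a rough sense of scale, the element counts and memory footprint of the tensors listed above can be tallied with plain Python. This is only a sketch: the `shapes` dict and the `numel` helper are names made up here for illustration, and the 4 bytes per element follows from every tensor being float32 or int32. One arithmetic observation, not a statement about the kernel's design: the length of the two `interval_*_bp` tensors equals the product of the first four dimensions of `feat` (10 * 7 * 64 * 120 = 537600).

```python
from math import prod

# Shapes copied from the list above. Every tensor is float32 or int32,
# so each element occupies 4 bytes.
BYTES_PER_ELEMENT = 4

shapes = {
    "out_grad": (10, 1, 192, 256, 128),
    "depth_grad": (10, 7, 120, 64, 120),
    "feat_grad": (10, 7, 64, 120, 128),
    "depth": (10, 7, 120, 64, 120),
    "feat": (10, 7, 64, 120, 128),
    "ranks_depth": (28994652,),
    "ranks_feat": (28994652,),
    "ranks_bev": (28994652,),
    "interval_lengths_bp": (537600,),
    "interval_starts_bp": (537600,),
}


def numel(shape):
    """Number of elements in a tensor of the given shape (hypothetical helper)."""
    return prod(shape)


total_bytes = 0
for name, shape in shapes.items():
    n = numel(shape)
    size = n * BYTES_PER_ELEMENT
    total_bytes += size
    print(f"{name:20s} {n:>12,d} elements {size / 2**20:9.1f} MiB")

print(f"{'total':20s} {total_bytes / 2**30:.2f} GiB")
```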
Is there an improvement for the forward pass as well? It is too slow at test time. Waiting for a fix.