LittlePey / SFD

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion (CVPR 2022, Oral)
Apache License 2.0
263 stars 35 forks

Could you please provide a spconv2-compatible version? I tried to fix it myself but still got some errors. #2

Closed kx-Z closed 2 years ago

LittlePey commented 2 years ago

Hi, you may need to refer to the OpenPCDet commit that adds spconv2 support to OpenPCDet.

Alternatively, you can directly add our method to a version of OpenPCDet that supports spconv2. You can refer to our modifications commit1 and commit2 on Voxel-R-CNN; it won't take much time.
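
For anyone doing the port by hand, the changes in those commits largely follow the standard spconv 1.x → 2.x migration. Below is a minimal sketch of the two most common changes (the shapes and names are toy values, not taken from the SFD code):

import torch
import spconv.pytorch as spconv  # spconv 2.x; spconv 1.x used plain `import spconv`

# Toy sparse tensor: 2 active voxels with 4-channel features (batch index comes first in indices)
features = torch.randn(2, 4)
indices = torch.tensor([[0, 0, 0, 0], [0, 1, 1, 1]], dtype=torch.int32)
x = spconv.SparseConvTensor(features, indices, spatial_shape=[8, 8, 8], batch_size=1)

# spconv 1.x allowed in-place edits like `x.features = torch.relu(x.features)`;
# spconv 2.x treats the tensor as immutable, so use replace_feature() instead.
x = x.replace_feature(torch.relu(x.features))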

Orbis36 commented 2 years ago

Hey man, @LittlePey congrats on being selected as an Oral first of all, I really love your work. Thanks for your suggestion, it solved the problem that had confused me for the past 2 days. I'm putting some notes here to help anyone who follows. A Chinese version can be found here

Basically, if you are using a 30-series card like me, the following will help you run the model. First, make sure you are using spconv 2.x. I tried all the earlier versions, including the one used by Voxel-RCNN (1.2.1), 1.0.0, and 1.1.0, and all of them throw errors when you compile spconv, like the ones mentioned here and here. After you solve those, you can run the code for a few iterations, but you will quickly hit a CUDA error like this:

RuntimeError: /home/tianranliu/PycharmProjects/spconv/src/spconv/indice.cu 282 cuda execution failed with error 700 an illegal memory access was encountered prepareSubMGridKernel failed

I got stuck here, and there seems to be no clean way around it: that spconv version needs torch < 1.6 with CUDA 10.1, but that configuration in turn fails on 30-series (Ampere) cards, which require CUDA 11+.

So after several hours of work, I found you have to use spconv 2.x here. Just as mentioned above, follow the modification in OpenPCDet and you can run it successfully. The only extra thing to mention is: don't forget to add

.contiguous()

after the sparse_idx.int() call around line 601 of sfd_head.py (a minimal sketch of this change is at the end of this comment). Another important thing: when I ran the code on a single card with

python train.py --cfg_file cfgs/kitti_models/sfd.yaml --batch_size 2

there were some problems during batch collation, because a key in data_dict that is useless for training never gets popped. That leads to an error like

ValueError: all input arrays must have the same shape, TypeError

So basically you need to modify the code around line 504 of kitti_dataset_sfd.py: add the following line before the loop:

data_dict.pop('valid_noise', None)  # drop the unused key; don't reassign data_dict to the popped value

Then you can run it smoothly.
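
For reference, here is a minimal sketch of the .contiguous() change described above. The variable name sparse_idx is taken from the comment; the surrounding code in sfd_head.py will of course differ:

import torch

# Stand-in for the index tensor built around line 601 of sfd_head.py; in the real
# code it comes from an indexing / nonzero() step and may not be contiguous in memory.
sparse_idx = torch.nonzero(torch.rand(2, 16, 16, 16) > 0.9)

# The fix: cast to int32 AND force a contiguous copy before the indices are handed
# to spconv, otherwise spconv 2.x can hit the illegal-memory-access error above.
sparse_idx = sparse_idx.int().contiguous()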

hailanyi commented 2 years ago

Hi everyone: I also ran into a bug similar to @Orbis36's. With CUDA 11.1 + a 3090 card + spconv 1.2, I can only obtain 66% AP from the released pre-trained model. I also trained the network on my device and could only reach 85% AP. This bug is mostly caused by the spconv.SubMConv3d used at line 76 of sfd_head.py. After replacing spconv 1.2 with spconv 2.1, the detection performance becomes 88% AP.

The pre-trained model performance:
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.2731, 97.5200, 95.3477
bev  AP:98.9634, 94.3190, 91.9491
3d   AP:95.9244, 88.9693, 86.2965
aos  AP:99.24, 97.31, 95.06
The best trained model from my device:
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.0432, 97.5449, 95.1939
bev  AP:98.8534, 94.1777, 91.8114
3d   AP:95.4631, 88.3342, 85.8028
aos  AP:98.92, 97.28, 94.87
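
For anyone making the same switch, the change above boils down to building that layer against spconv 2.x instead of 1.2. A rough sketch, assuming illustrative channel sizes and indice_key (not copied from sfd_head.py) and a CUDA device:

import torch
import spconv.pytorch as spconv  # spconv 2.x

# Illustrative submanifold convolution, similar in spirit to the layer at line 76
# of sfd_head.py; the real in/out channels, kernel size and indice_key differ.
conv = spconv.SubMConv3d(16, 32, kernel_size=3, padding=1, bias=False,
                         indice_key='subm_demo').cuda()

features = torch.randn(4, 16).cuda()
indices = torch.tensor([[0, 0, 0, 0], [0, 1, 1, 1],
                        [0, 2, 2, 2], [0, 3, 3, 3]], dtype=torch.int32).cuda()
x = spconv.SparseConvTensor(features, indices, spatial_shape=[8, 8, 8], batch_size=1)
out = conv(x)  # SparseConvTensor whose features now have 32 channels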
LittlePey commented 2 years ago

Hi @Orbis36 @hailanyi, thanks for sharing. We will update our code to support spconv2 after our devices are upgraded~

kx-Z commented 2 years ago

Thank you for your help!

GP-Bone commented 1 year ago

Hi everyone: I also ran into a bug similar to @Orbis36's. With CUDA 11.1 + a 3090 card + spconv 1.2, I can only obtain 66% AP from the released pre-trained model. I also trained the network on my device and could only reach 85% AP. This bug is mostly caused by the spconv.SubMConv3d used at line 76 of sfd_head.py. After replacing spconv 1.2 with spconv 2.1, the detection performance becomes 88% AP.

The pre-trained model performance:
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.2731, 97.5200, 95.3477
bev  AP:98.9634, 94.3190, 91.9491
3d   AP:95.9244, 88.9693, 86.2965
aos  AP:99.24, 97.31, 95.06
The best trained model from my device:
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.0432, 97.5449, 95.1939
bev  AP:98.8534, 94.1777, 91.8114
3d   AP:95.4631, 88.3342, 85.8028
aos  AP:98.92, 97.28, 94.87

@hailanyi Hi, hailanyi. I was surprised by the 3D mAP of the pre-trained model; it is a very high score. I got decent scores for the model trained on my device, but not for the pre-trained model “checkpoint_eopch_34”. Can you share the command you used to evaluate the pre-trained model? Looking forward to your reply.

The best trained model from my device:
2022-11-24 02:44:47,752 INFO
Car AP@0.70, 0.70, 0.70:
bbox AP:97.3075, 90.3150, 89.9768
bev  AP:90.4947, 89.3999, 88.8649
3d   AP:90.1558, 87.3190, 85.7557
aos  AP:97.28, 90.22, 89.82
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:99.1403, 95.8640, 95.4959
bev  AP:96.4393, 92.5552, 92.0417
3d   AP:95.8179, 88.8519, 86.2185
aos  AP:99.11, 95.73, 95.28

The pre-trained model performance:
2022-11-30 11:40:29,646 INFO
Car AP@0.70, 0.70, 0.70:
bbox AP:95.8195, 88.8054, 87.4287
bev  AP:90.3034, 85.8910, 80.1571
3d   AP:89.2347, 78.5606, 76.1646
aos  AP:95.74, 88.23, 86.70
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:98.5366, 90.3064, 87.9375
bev  AP:95.6103, 86.5961, 84.0745
3d   AP:91.9405, 78.7086, 76.0522
aos  AP:98.46, 89.74, 87.24
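
For context, on my side I ran the evaluation with the standard OpenPCDet test script, roughly like this (the checkpoint path is only a placeholder for wherever the weights are stored):

python test.py --cfg_file cfgs/kitti_models/sfd.yaml --batch_size 2 --ckpt /path/to/pretrained_checkpoint.pth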

Choongmyeon-Lee commented 1 month ago

Hey man, @LittlePey congrats on being selected as an Oral first of all, I really love your work. Thanks for your suggestion, it solved the problem that had confused me for the past 2 days. I'm putting some notes here to help anyone who follows. A Chinese version can be found here

Basically, if you are using a 30-series card like me, the following will help you run the model. First, make sure you are using spconv 2.x. I tried all the earlier versions, including the one used by Voxel-RCNN (1.2.1), 1.0.0, and 1.1.0, and all of them throw errors when you compile spconv, like the ones mentioned here and here. After you solve those, you can run the code for a few iterations, but you will quickly hit a CUDA error like this:

RuntimeError: /home/tianranliu/PycharmProjects/spconv/src/spconv/indice.cu 282 cuda execution failed with error 700 an illegal memory access was encountered prepareSubMGridKernel failed

I got stuck here, and there seems to be no clean way around it: that spconv version needs torch < 1.6 with CUDA 10.1, but that configuration in turn fails on 30-series (Ampere) cards, which require CUDA 11+.

So after several hours of work, I found you have to use spconv 2.x here. Just as mentioned above, follow the modification in OpenPCDet and you can run it successfully. The only extra thing to mention is: don't forget to add

.contiguous()

after the sparse_idx.int() call around line 601 of sfd_head.py. Another important thing: when I ran the code on a single card with

python train.py --cfg_file cfgs/kitti_models/sfd.yaml --batch_size 2

there were some problems during batch collation, because a key in data_dict that is useless for training never gets popped. That leads to an error like

ValueError: all input arrays must have the same shape, TypeError

So basically you need to modify the code around line 504 of kitti_dataset_sfd.py: add the following line before the loop:

data_dict.pop('valid_noise', None)  # drop the unused key; don't reassign data_dict to the popped value

Then you can run it smoothly.

In my case, I wasn't sure where to appropriately place the data_dict.pop('valid_noise') call, so I resolved the issue another way. I added the following check at line 492:

if key in ['valid_noise']:
    continue

Here's a rough outline of the relevant code:

for key, val in data_dict.items():
    # LCM EDIT: skip the unused 'valid_noise' key so it never reaches the collation below
    if key in ['valid_noise']:
        continue
    try:
        if key in ['voxels', 'voxel_num_points', 'voxels_pseudo', 'voxel_num_points_pseudo']:
            ret[key] = np.concatenate(val, axis=0)
        elif key in ['points', 'voxel_coords', 'points_pseudo', 'voxel_coords_pseudo']:
            coors = []
            ...

This approach solved the issue, allowing me to adjust the batch size, but I still don't know where or how valid_noise was introduced.