Pointcept / Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)
MIT License

Questions about PTv3 and the efficacy of depthwise SubMConv3d #206

Open ysj9909 opened 2 months ago

ysj9909 commented 2 months ago

Hello, this is truly remarkable research! In the paper, when you describe xCPE as similar to the octree-based depthwise convolution (CPE), I assumed that depthwise submconv would be used. However, upon examining the code, I noticed that dense submconv is employed instead. Is there a specific reason for not using depthwise convolution? Additionally, depthwise submanifold convolution is not typically used in point cloud backbones. Is this due to a significant performance drop, or because dense convolution does not add significant computational overhead?

Gofinge commented 2 months ago

Good question! Actually, SpConv doesn't support depthwise submconv as of now. It is also a significant challenge to accelerate a depthwise submconv (so there is no speed advantage). Therefore, we directly use classical sparse convolution for simplicity. I previously tried octree-based depthwise convolution, but the implementation is more complex and leads to more restrictions on parameter settings.

ysj9909 commented 2 months ago

Thank you for the kind response! If I modify spconv.py to enable depthwise submconv, would there be any issues when using it? I'm considering using depthwise in my experiments for the following reasons, and I'd greatly appreciate your opinion on this as well:

- 3D data tends to be on a smaller scale compared to 2D datasets, making it more susceptible to overfitting issues, and I believe depthwise could help mitigate this.

- Most 3D models use 3x3x3 kernels, but I think increasing them to 5x5x5 or 7x7x7 wouldn't incur significant overhead with depthwise.

- Simply replacing regular conv with depthwise might reduce channel interaction capacity and degrade performance, but I think using it alongside MLPs, similar to PT models, might prevent performance loss.

Additionally, the depthwise submconv I'm referring to is available at https://github.com/LHDuan/ConDaFormer/blob/main/libs/spconv/conv.py (ConDaFormer, NeurIPS 2023).
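The depthwise-plus-MLP pattern described in the last bullet can be sketched as a ConvNeXt-style block. Note this is a minimal dense-tensor sketch (the class name, sizes, and `nn.Conv3d` usage are illustrative stand-ins for a sparse submanifold implementation, which SpConv does not currently provide in depthwise form):

```python
import torch
import torch.nn as nn


class DepthwiseMLPBlock(nn.Module):
    """Illustrative block: depthwise 3D conv for spatial mixing, then a
    pointwise MLP for channel mixing. A dense stand-in for the sparse case."""

    def __init__(self, channels, kernel_size=5, expansion=4):
        super().__init__()
        # groups=channels makes the conv depthwise: each channel has its own kernel.
        self.dwconv = nn.Conv3d(channels, channels, kernel_size,
                                padding=kernel_size // 2, groups=channels)
        self.norm = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, expansion * channels),
            nn.GELU(),
            nn.Linear(expansion * channels, channels),
        )

    def forward(self, x):                 # x: (B, C, D, H, W)
        y = self.dwconv(x)
        y = y.permute(0, 2, 3, 4, 1)      # channels-last for LayerNorm / MLP
        y = self.mlp(self.norm(y))
        y = y.permute(0, 4, 1, 2, 3)      # back to channels-first
        return x + y                      # residual connection


blk = DepthwiseMLPBlock(channels=8, kernel_size=5)
x = torch.randn(1, 8, 4, 4, 4)
y = blk(x)
```

On the kernel-size point: a depthwise 5x5x5 kernel costs C * 125 weights per layer versus C^2 * 27 for a dense 3x3x3 conv, so large kernels stay cheap under depthwise, which is the intuition behind the second bullet.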

ysj9909 commented 2 months ago

From what I've examined so far, it appears that the proposed modification simply multiplies a mask by the weights to mimic depthwise operation, which doesn't seem to offer any computational benefits.
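For intuition, the masked-weight trick can be reproduced with dense tensors (the shapes here follow PyTorch's dense `Conv3d` layout and are only an analogy for the sparse case, not SpConv's API): zeroing a dense weight down to its channel diagonal reproduces depthwise outputs exactly, but the full C x C channel contraction still runs, so no compute is saved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, K = 8, 3
dw = nn.Conv3d(C, C, K, padding=1, groups=C, bias=False)  # true depthwise

# Dense weight whose channel diagonal carries the depthwise kernels.
dense_weight = torch.zeros(C, C, K, K, K)
dense_weight[torch.arange(C), torch.arange(C)] = dw.weight[:, 0].detach()

# Channel-diagonal mask, as in the masked-weight trick.
mask = torch.eye(C).view(C, C, 1, 1, 1)

x = torch.randn(2, C, 5, 5, 5)
out_masked = F.conv3d(x, dense_weight * mask, padding=1)  # dense-conv cost
out_dw = dw(x)                                            # depthwise cost
```

The two outputs match, but `F.conv3d` with the masked weight still performs the dense channel mixing internally, which is why masking offers no FLOP advantage.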

Gofinge commented 2 months ago

> From what I've examined so far, it appears that the proposed modification simply multiplies a mask by the weights to mimic depthwise operation, which doesn't seem to offer any computational benefits.

A well-optimized depthwise sparse convolution is not trivial to implement; I think that is why SpConv still does not support it.

> 3D data tends to be on a smaller scale compared to 2D datasets, making it more susceptible to overfitting issues, and I believe depthwise could help mitigate this.

I think the Scaling Law suggests that extra parameters won't meaningfully influence the final performance, positively or negatively, so there is no need to worry too much about parameter count. The major issue is the scale of data, which is why my previous two works focus on pre-training technology for point cloud data.

> Most 3D models use 3x3x3 kernels, but I think increasing them to 5x5x5 or 7x7x7 wouldn't incur significant overhead.

In PTv3, SpConv only serves as positional encoding, so there is no need for a large kernel; attention already handles large receptive fields well.
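That positional-encoding role can be sketched as a small conv whose output is added back to the features. This is a minimal dense stand-in (the class name is hypothetical; PTv3's actual xCPE applies a sparse `SubMConv3d` to voxelized point features in the Pointcept codebase):

```python
import torch
import torch.nn as nn


class ConvPosEnc3d(nn.Module):
    """Sketch of a convolutional positional encoding: a small 3x3x3 conv
    whose output is added to the input features, so only a local
    neighborhood (no large kernel) is needed."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size,
                              padding=kernel_size // 2, bias=True)

    def forward(self, x):          # x: (B, C, D, H, W)
        return x + self.conv(x)    # skip connection keeps the encoding additive


pe = ConvPosEnc3d(channels=8)
feat = torch.randn(1, 8, 4, 4, 4)
out = pe(feat)
```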

ysj9909 commented 2 months ago

Thanks for answering!!!