NVIDIA-AI-IOT / Lidar_AI_Solution

A project demonstrating Lidar-related AI solutions, including three GPU-accelerated Lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and the related libraries (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).

Can not find any matched kernel 9 x 32 #128

Open akjt opened 11 months ago

akjt commented 11 months ago

Hi,

We are unfortunately blocked by the following error:

`Assert failed 💀. false in file src/spconv/implicit-gemm.cu:383, message: Can not find any matched kernel 9 x 32`

I understand that you only support Nx4/Nx5/Nx8/Nx16. The question is whether you would be able to support Nx32, or allow us to add this extension ourselves?

hopef commented 11 months ago

Currently, Nx32 and Nx16 are supported.

akjt commented 11 months ago

@hopef: since when has Nx32 been supported? I am running the most recent master branch and I get the error above, which suggests that Nx32 is not supported.

hopef commented 11 months ago

The supported shape pairs are written as MxN, where M is the number of input channels and N is the number of output channels.

The supported shape pairs are listed below: 4x16, 5x16, 8x16, 16x16, 16x32, 32x32, 32x64, 64x64, 128x128.

As for the error message you reported: it has 9 input channels and 32 output channels, which is not in the list above, so it is an unsupported pattern.
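One possible workaround, sketched below, is to zero-pad the voxel feature dimension from 9 to 16 channels so that the first layer becomes the supported 16x32 pair; zero-valued extra input channels paired with zero-valued weight columns leave the layer output unchanged. This is a minimal sketch with hypothetical helper names, assuming a PyTorch preprocessing pipeline, and is not a feature documented by the repository:

```python
# Workaround sketch (assumption, not an official feature): pad voxel features
# from 9 to 16 channels so the first sparse conv layer maps to the supported
# 16x32 shape pair. Zero input channels contribute nothing to the output as
# long as the matching weight columns are also zero.
import torch
import torch.nn.functional as F

def pad_voxel_features(features: torch.Tensor, target_channels: int = 16) -> torch.Tensor:
    """Pad an (N, 9) voxel feature tensor to (N, 16) with trailing zeros."""
    n_channels = features.shape[-1]
    if n_channels >= target_channels:
        return features
    return F.pad(features, (0, target_channels - n_channels))

def pad_conv_weight(weight: torch.Tensor, in_channel_dim: int, target_in: int = 16) -> torch.Tensor:
    """Zero-pad the input-channel axis of an existing first-layer weight.

    `in_channel_dim` is the axis holding input channels; its position depends on
    your spconv version / weight layout, so check your checkpoint before using this.
    """
    c_in = weight.shape[in_channel_dim]
    if c_in >= target_in:
        return weight
    pad_shape = list(weight.shape)
    pad_shape[in_channel_dim] = target_in - c_in
    return torch.cat([weight, weight.new_zeros(pad_shape)], dim=in_channel_dim)
```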

akjt commented 11 months ago

Okay, thanks for the clarification. Are there any plans to extend the above list with more shape pairs?

hopef commented 11 months ago

We will let you know if we have any updates.

akjt commented 7 months ago

Any updates? Also, is there any update on whether you will open-source the code behind libspconv? Is the source code similar to the traveller59/spconv GitHub repo?

hopef commented 7 months ago

> Is there any update on whether you will open-source the code behind libspconv?

Currently, we have no plans to open-source the libspconv code.

> Is the source code similar to the traveller59/spconv GitHub repo?

It is totally different from traveller59/spconv: the source code is based on CUDA kernels (Tensor Core only) instead of CUTLASS. We developed a highly optimized engine (covering network building, low-level ops, memory reuse, layer fusion, ...), whereas traveller59/spconv just provides a low-level op implementation for training, not for inference.
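For context, training-side usage of traveller59/spconv looks roughly like the sketch below (spconv 2.x API; the shapes are made up): layers are ordinary PyTorch modules executed eagerly, one op at a time, which is what "a low-level op implementation for training" refers to, whereas a deployment engine consumes the whole exported network and can plan and fuse it ahead of time.

```python
# Illustrative training-side usage of traveller59/spconv (spconv 2.x API).
# Each layer runs eagerly as a separate PyTorch op; there is no graph-level
# planning or layer fusion at this stage.
import torch
import spconv.pytorch as spconv

net = spconv.SparseSequential(
    spconv.SubMConv3d(16, 32, kernel_size=3, indice_key="subm0"),
    torch.nn.BatchNorm1d(32),
    torch.nn.ReLU(),
    spconv.SparseConv3d(32, 64, kernel_size=3, stride=2),
).cuda()

# A sparse tensor is (features, integer indices, spatial shape, batch size);
# indices are laid out as (batch_idx, z, y, x).
coords = torch.unique(torch.randint(0, 41, (1000, 3), dtype=torch.int32), dim=0)
indices = torch.cat([torch.zeros(len(coords), 1, dtype=torch.int32), coords], dim=1).cuda()
features = torch.randn(len(coords), 16).cuda()
x = spconv.SparseConvTensor(features, indices, spatial_shape=[41, 1600, 1408], batch_size=1)
out = net(x)  # eager, layer-by-layer execution
```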

akjt commented 7 months ago

@hopef, can your libspconv.so also be used for training? We are currently using traveller59/spconv for training and running inference with libspconv.so. We are concerned that this mismatch may have degraded the performance of our inference engine.
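One way to quantify such a train/inference mismatch is a per-output parity check like the sketch below. The dump file name and the way the engine outputs are exported are assumptions, since that depends on the deployment code:

```python
# Sketch of a parity check between the PyTorch/spconv training model and the
# deployed engine. "engine_features.npy" is an assumed dump of the engine's
# output features for the same voxelized input; producing it is not shown here.
import numpy as np
import torch

@torch.no_grad()
def compare_with_engine(torch_model, sparse_input, engine_dump_path="engine_features.npy"):
    ref = torch_model(sparse_input).features.float().cpu().numpy()
    eng = np.load(engine_dump_path).astype(np.float32)
    # NOTE: the output voxel ordering may differ between implementations; in
    # practice, match rows by their (batch, z, y, x) indices before comparing.
    assert ref.shape == eng.shape, f"shape mismatch: {ref.shape} vs {eng.shape}"
    abs_err = np.abs(ref - eng)
    rel_err = abs_err / (np.abs(ref) + 1e-6)
    print(f"max abs err {abs_err.max():.6f}, mean abs err {abs_err.mean():.6f}, "
          f"max rel err {rel_err.max():.6f}")
```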