NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[FEA] Conv3d for SIMT #733

Closed ernestkchan closed 1 year ago

ernestkchan commented 1 year ago

Is your feature request related to a problem? Please describe.
Currently, conv3d kernels like those in default_conv3d_fprop only support tensor cores. It would be great to have SIMT support, similar to conv2d.

Describe the solution you'd like
Implement SIMT options for conv3d. Even just fprop support would be great.

Describe alternatives you've considered
A user can use the reference implementation for convolutions. However, it is not as performant.

Additional context
Something similar to conv2d would be great, as it supports both SIMT and tensor cores.

hwu36 commented 1 year ago

We don't really have a plan to support SIMT conv3d to run on old architectures. It is not hard to support, though: compare the diff between conv3d tensorop and conv2d tensorop, and apply the same changes to SIMT conv2d. conv3d just needs one more step to map to conv2d. We welcome community contributions.

ernestkchan commented 1 year ago

Thanks for the quick response, @hwu36. I understand not supporting old architectures. But my understanding is tensor cores only support specific tensor shapes, which is a bit restrictive for end-users. It seems like SIMT support would make CUTLASS conv3d useful for more users?

> conv3d just needs one more step to map to conv2d

Could you please elaborate on this step?

hwu36 commented 1 year ago

> my understanding is tensor cores only support specific tensor shapes, which is a bit restrictive for end-users.

What shape you want to use is not supported?

> Could you please elaborate on this step?

conv2d is essentially a two-level nested for loop over the spatial dimensions. conv3d is a three-level nested for loop.

Taking the analytic iterator as an example, the new parts are:

https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h#L151

https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h#L192

https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h#L206

https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h#L234
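As a rough illustration of the "one more loop level" point, here is a minimal host-side sketch of conv3d fprop written as an implicit GEMM. It is not CUTLASS code: `Conv3dProblem` and `conv3d_fprop_sketch` are hypothetical names, and the sketch assumes NDHWC activations, KTRSC filters, NZPQK output, unit dilation, and cross-correlation mode. Compared with a conv2d version, the only additions are the `z` step when decomposing the GEMM row and the `t` step when decomposing the GEMM reduction index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical problem description (not cutlass::conv::Conv3dProblemSize).
struct Conv3dProblem {
  int N, D, H, W, C;                 // activation extents, NDHWC layout
  int K, T, R, S;                    // filter extents, KTRSC layout
  int pad_d, pad_h, pad_w;           // symmetric zero padding
  int stride_d, stride_h, stride_w;  // unit dilation is assumed

  int Z() const { return (D + 2 * pad_d - T) / stride_d + 1; }
  int P() const { return (H + 2 * pad_h - R) / stride_h + 1; }
  int Q() const { return (W + 2 * pad_w - S) / stride_w + 1; }
};

// Naive conv3d fprop expressed as a GEMM of shape (N*Z*P*Q) x K x (T*R*S*C).
std::vector<float> conv3d_fprop_sketch(const Conv3dProblem& ps,
                                       const std::vector<float>& act,
                                       const std::vector<float>& flt) {
  const int Z = ps.Z(), P = ps.P(), Q = ps.Q();
  assert(act.size() == static_cast<std::size_t>(ps.N) * ps.D * ps.H * ps.W * ps.C);
  assert(flt.size() == static_cast<std::size_t>(ps.K) * ps.T * ps.R * ps.S * ps.C);

  const int gemm_m = ps.N * Z * P * Q;
  const int gemm_n = ps.K;
  const int gemm_k = ps.T * ps.R * ps.S * ps.C;
  std::vector<float> out(static_cast<std::size_t>(gemm_m) * gemm_n, 0.0f);

  for (int m = 0; m < gemm_m; ++m) {
    // GEMM row -> output coordinate (n, z, p, q); conv2d has no z step.
    int n = m / (Z * P * Q);
    int rem = m % (Z * P * Q);
    int z = rem / (P * Q);
    rem %= P * Q;
    int p = rem / Q;
    int q = rem % Q;

    for (int k = 0; k < gemm_n; ++k) {
      float acc = 0.0f;
      for (int gk = 0; gk < gemm_k; ++gk) {
        // GEMM reduction index -> filter coordinate (t, r, s, c); conv2d has no t step.
        int t = gk / (ps.R * ps.S * ps.C);
        int rem_k = gk % (ps.R * ps.S * ps.C);
        int r = rem_k / (ps.S * ps.C);
        rem_k %= ps.S * ps.C;
        int s = rem_k / ps.C;
        int c = rem_k % ps.C;

        // Input coordinate; d is the extra dimension relative to conv2d.
        int d = z * ps.stride_d - ps.pad_d + t;
        int h = p * ps.stride_h - ps.pad_h + r;
        int w = q * ps.stride_w - ps.pad_w + s;
        if (d < 0 || d >= ps.D || h < 0 || h >= ps.H || w < 0 || w >= ps.W) {
          continue;  // this tap falls in the zero padding
        }

        std::size_t a_idx =
            (((static_cast<std::size_t>(n) * ps.D + d) * ps.H + h) * ps.W + w) * ps.C + c;
        // KTRSC is contiguous in (t, r, s, c), so gk indexes the filter row directly.
        std::size_t f_idx = static_cast<std::size_t>(k) * gemm_k + gk;
        acc += act[a_idx] * flt[f_idx];
      }
      out[static_cast<std::size_t>(m) * gemm_n + k] = acc;  // NZPQK layout
    }
  }
  return out;
}
```

Deleting the `z`/`t`/`d` lines and the corresponding extents gives the conv2d form, which is roughly the kind of diff to look for when comparing the conv3d and conv2d iterators.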

ernestkchan commented 1 year ago

> What shape you want to use is not supported?

Arbitrary shapes: matrix dimensions that are not necessarily multiples of 16 bytes. I know I can pad, but it's a bit easier not to have to. Please let me know if I don't fully understand when tensor cores can be used.

Thanks for the links 👍

hwu36 commented 1 year ago

In your case, enabling small alignment for tensor cores would be more useful than SIMT. Our community enabled small alignment for conv2d in https://github.com/NVIDIA/cutlass/pull/246. Do the same for conv3d, paying attention to the files that have conv2d and fprop in their names.
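To make the alignment discussion concrete, here is a small sketch of how one might pick a vector access width for a given channel count. It assumes that alignment is expressed in elements, that the widest access is 128 bits (16 bytes), and that the channel count must be divisible by the chosen alignment. The helpers `max_channel_alignment` and `padded_channels` are hypothetical and are not part of CUTLASS.

```cpp
#include <cassert>

// Hypothetical helper: largest power-of-two alignment (in elements) that
// divides `channels`, capped at a 128-bit access.
int max_channel_alignment(int channels, int element_bits) {
  assert(channels > 0 && element_bits > 0 && 128 % element_bits == 0);
  int alignment = 128 / element_bits;  // e.g. 8 elements for 16-bit types
  while (alignment > 1 && channels % alignment != 0) {
    alignment /= 2;  // fall back to a narrower access
  }
  return alignment;
}

// Hypothetical helper: channel count after padding up to a multiple of
// `alignment`, which is the alternative to using a smaller alignment.
int padded_channels(int channels, int alignment) {
  assert(channels > 0 && alignment > 0);
  return (channels + alignment - 1) / alignment * alignment;
}
```

For example, with 16-bit elements, 64 channels allow the full alignment of 8, 12 channels allow an alignment of 4, and 3 channels either fall back to an alignment of 1 or need padding to 8.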

ernestkchan commented 1 year ago

Thanks!

mnicely commented 1 year ago

Closing due to inactivity. Please reopen if needed.