NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Query BlockSize with correct function #1238

Open wchen61 opened 1 year ago

wchen61 commented 1 year ago

The block size query should use the kernel function that corresponds to the layout in use (NCHW or NHWC).

https://github.com/NVIDIA/DeepLearningExamples/blob/eb3571096d13f04a71c802b005000f488a8f0139/PyTorch/Segmentation/MaskRCNN/pytorch/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu#L506
https://github.com/NVIDIA/DeepLearningExamples/blob/eb3571096d13f04a71c802b005000f488a8f0139/PyTorch/Segmentation/MaskRCNN/pytorch/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu#L584

ntajbakhsh commented 1 year ago

Can you be more specific about the bug and how it's affecting you?

wchen61 commented 1 year ago

> Can you be more specific about the bug and how it's affecting you?

Hi, @ntajbakhsh. I can run this model, but I found that the code is inconsistent when running in NHWC mode.

cudaOccupancyMaxPotentialBlockSize will return different gridSize for RoIAlignBackwardFeature and RoIAlignBackwardFeatureNHWC.
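
For illustration, here is a minimal sketch of querying the launch configuration with the kernel that will actually be launched. The two kernels below are hypothetical stand-ins, not the real RoIAlignBackwardFeature and RoIAlignBackwardFeatureNHWC (their signatures are not shown in this issue), and the helper names `QueryLaunchConfig` and `QueryBackwardConfig` are assumptions made for this example.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the NCHW backward kernel (not the real one).
template <typename T>
__global__ void BackwardNCHWSketch(const T* grad_out, T* grad_in, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    grad_in[i] = grad_out[i];
  }
}

// Hypothetical stand-in for the NHWC backward kernel (not the real one).
// A different body can mean different register usage, and therefore a
// different occupancy result.
template <typename T>
__global__ void BackwardNHWCSketch(const T* grad_out, T* grad_in, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    grad_in[i] = grad_out[i] * static_cast<T>(2) - grad_out[i];
  }
}

struct LaunchConfig {
  int min_grid_size;
  int block_size;
  cudaError_t status;
};

// Query the occupancy-based launch configuration for one specific kernel.
template <typename Kernel>
LaunchConfig QueryLaunchConfig(Kernel kernel) {
  assert(kernel != nullptr);
  LaunchConfig cfg{0, 0, cudaSuccess};
  cfg.status = cudaOccupancyMaxPotentialBlockSize(
      &cfg.min_grid_size, &cfg.block_size, kernel,
      /*dynamicSMemSize=*/0, /*blockSizeLimit=*/0);
  return cfg;
}

// Pick the kernel by layout first, then query with that same kernel, so the
// configuration always matches what gets launched.
LaunchConfig QueryBackwardConfig(bool is_nhwc) {
  return is_nhwc ? QueryLaunchConfig(BackwardNHWCSketch<float>)
                 : QueryLaunchConfig(BackwardNCHWSketch<float>);
}
```

The point of the sketch is only that the layout branch happens before the occupancy query, so the NHWC path never reuses a configuration computed for the NCHW kernel. Whether the two real kernels produce different values depends on the device and on how each kernel compiles.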