NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[Question] Affine Group Normalization is not supported at the moment. #1277

Open MatthieuToulemont opened 3 years ago

MatthieuToulemont commented 3 years ago

Description

The current Group Normalization plugin does not seem to support affine Group Normalization as defined in PyTorch (torch.nn.GroupNorm).
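For reference, the affine form defined in the PyTorch docs is

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$

where the statistics are computed per group and $\gamma$ (the scale) and $\beta$ (the bias) are learnable per-channel parameters when affine=True.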

It seems like the scale is set to a vector of ones and the bias to a vector of zeros by default.

Furthermore, the GroupNormalizationPluginCreator only accepts the parameters epsilon and nbGroups.
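As an illustration only, here is a rough sketch of what an extended creator could look like from the caller's side. The "scale" and "bias" fields do not exist today, and the registered plugin name and existing field names are assumptions based on the open-source plugin, so treat every identifier below as hypothetical:

```cpp
#include <NvInfer.h>
#include <vector>

// Hypothetical: build an affine GroupNorm plugin instance, passing per-channel
// gamma/beta through two new creator fields ("scale" / "bias" are not real yet).
// Assumes initLibNvInferPlugins() has already registered the plugin creators.
nvinfer1::IPluginV2* createAffineGroupNorm(float eps, int32_t nbGroups,
                                           const std::vector<float>& gamma,
                                           const std::vector<float>& beta)
{
    using namespace nvinfer1;
    // Registered name/version assumed from the OSS plugin library; please verify.
    auto* creator = getPluginRegistry()->getPluginCreator("GroupNormalizationPlugin", "1");

    std::vector<PluginField> fields{
        {"eps", &eps, PluginFieldType::kFLOAT32, 1},
        {"num_groups", &nbGroups, PluginFieldType::kINT32, 1},
        // Requested additions: one value per channel for the affine transform.
        {"scale", gamma.data(), PluginFieldType::kFLOAT32, static_cast<int32_t>(gamma.size())},
        {"bias", beta.data(), PluginFieldType::kFLOAT32, static_cast<int32_t>(beta.size())},
    };
    PluginFieldCollection fc{static_cast<int32_t>(fields.size()), fields.data()};
    return creator->createPlugin("affine_group_norm", &fc);
}
```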

Would it be possible to add a way to use affine Group Norm?

Thank you for your time and consideration

nvpohanh commented 2 years ago

@rajeevsrao Should we add this to RFE list?

rajeevsrao commented 2 years ago

@MatthieuTPHR If possible, can you please provide some context on the use case/model that would be unblocked by support for affine groupnorm? We will add it to our RFE list; however, if you are willing to contribute the changes, that would be greatly welcome as well. Thanks.

cc @samurdhikaru

david-PHR commented 2 years ago

Hello @rajeevsrao and @samurdhikaru, in addition to what @MatthieuTPHR said, here is some additional information:

It is possible to support an affine group_norm without using the scaleShiftChannelsInplace CUDA kernel. Looking at the kernel implementation, the weight and bias buffers are never actually used, so this kernel is not useful at all as it stands.

An easy way to support an affine group_norm layer is to add a ScaleNd layer after the group_norm plugin, e.g. addScaleNd(*group_norm_layer_out, nvinfer1::ScaleMode::kCHANNEL, bias_.data, weights_.data, power_.data, 1). In that case everything should work perfectly.
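A minimal sketch of that wiring in the network-building code, assuming the usual NCHW layout; group_norm_layer, network, nbChannels, and the gamma/beta/ones host buffers are placeholders for whatever the surrounding builder code already has:

```cpp
// Per-channel affine transform applied after the (non-affine) GroupNorm plugin
// output, using an IScaleLayer in kCHANNEL mode along axis 1 (the C of NCHW).
nvinfer1::ITensor* gnOut = group_norm_layer->getOutput(0);

nvinfer1::Weights shift{nvinfer1::DataType::kFLOAT, beta_host, nbChannels};  // beta
nvinfer1::Weights scale{nvinfer1::DataType::kFLOAT, gamma_host, nbChannels}; // gamma
nvinfer1::Weights power{nvinfer1::DataType::kFLOAT, ones_host, nbChannels};  // exponent of 1 = identity

nvinfer1::IScaleLayer* affine = network->addScaleNd(
    *gnOut, nvinfer1::ScaleMode::kCHANNEL, shift, scale, power, /*channelAxis=*/1);
// affine->getOutput(0) now yields gamma * groupnorm(x) + beta for each channel.
```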

Moreover, it is possible to support dynamic input sizes in this plugin by re-describing the cuDNN tensor whenever the input dimensions change at runtime. It is also possible to support a half-precision implementation with something like this: CHECK_CUDNN(cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, (inputDesc[0].type == nvinfer1::DataType::kFLOAT) ? CUDNN_DATA_FLOAT : (inputDesc[0].type == nvinfer1::DataType::kHALF) ? CUDNN_DATA_HALF : CUDNN_DATA_FLOAT, 1, batchSize * mNbGroups, groupSize, mChannelVolume));
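Put together, the enqueue-time descriptor setup could look roughly like this; desc, inputDesc, mNbGroups, mChannelVolume, and CHECK_CUDNN are taken from the snippet above, while the shape unpacking is an assumption about how the plugin derives its sizes:

```cpp
// Re-describe the cuDNN tensor on every enqueue so dynamic shapes are handled,
// and pick the cuDNN data type from the actual input type to cover FP16.
const int batchSize = inputDesc[0].dims.d[0];
const int nbChannels = inputDesc[0].dims.d[1];
const int groupSize = nbChannels / mNbGroups;

const cudnnDataType_t cudnnType =
    (inputDesc[0].type == nvinfer1::DataType::kHALF) ? CUDNN_DATA_HALF : CUDNN_DATA_FLOAT;

// Fold each (batch, group) pair into the "channel" dimension so that the
// batch-norm statistics computed by cuDNN become per-group statistics.
CHECK_CUDNN(cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, cudnnType,
                                       /*n=*/1,
                                       /*c=*/batchSize * mNbGroups,
                                       /*h=*/groupSize,
                                       /*w=*/mChannelVolume));
```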

Also, I would recommend using CUDNN_BATCHNORM_SPATIAL_PERSISTENT instead of CUDNN_BATCHNORM_SPATIAL, as it provides better performance.
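For example, only the mode argument changes in the forward call the plugin already makes; the handle, descriptors, and scale/bias buffers below stand in for the plugin's existing members and are named here only for illustration:

```cpp
// Same cudnnBatchNormalizationForwardTraining call, with the persistent mode.
// Note: the persistent mode has extra constraints in cuDNN (e.g. possible
// overflow for out-of-range values), so keeping CUDNN_BATCHNORM_SPATIAL as a
// fallback is reasonable.
float alpha = 1.0f, beta = 0.0f;
CHECK_CUDNN(cudnnBatchNormalizationForwardTraining(
    mCudnnHandle,
    CUDNN_BATCHNORM_SPATIAL_PERSISTENT,    // was CUDNN_BATCHNORM_SPATIAL
    &alpha, &beta,
    desc, input,                           // x descriptor / data
    desc, output,                          // y descriptor / data
    bnScaleBiasDesc, bnScale, bnBias,      // scale = ones, bias = zeros buffers
    /*exponentialAverageFactor=*/0.0,
    /*resultRunningMean=*/nullptr,
    /*resultRunningVariance=*/nullptr,
    mEpsilon,
    /*resultSaveMean=*/nullptr,
    /*resultSaveInvVariance=*/nullptr));
```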

Cheers,

David