NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Embedding_bag operator on GPU #3319

Open rishucoding opened 1 year ago

rishucoding commented 1 year ago

Hello,

NVIDIA's MLPerf results suggest using the TensorRT framework for performant inference deployment. For DLRM (deep-learning-based recommendation system) inference on GPU, I have the following questions:

Please let me know your comments. Thanks.
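For context, the embedding stage in DLRM is essentially an `embedding_bag` lookup: gather rows of an embedding table by index, then reduce each bag. A minimal NumPy sketch of those semantics (table sizes, indices, and offsets below are made up for illustration, mirroring `torch.nn.EmbeddingBag` with `mode='sum'`):

```python
import numpy as np

def embedding_bag_sum(table, indices, offsets):
    """Gather rows of `table` at `indices`, then sum each bag.
    Bags are delimited by `offsets`, like EmbeddingBag(mode='sum')."""
    ends = list(offsets[1:]) + [len(indices)]
    bags = []
    for start, end in zip(offsets, ends):
        bags.append(table[indices[start:end]].sum(axis=0))
    return np.stack(bags)

# Toy table: 4 rows, embedding dim 3.
table = np.arange(12, dtype=np.float32).reshape(4, 3)
indices = np.array([0, 2, 1, 3])
offsets = np.array([0, 2])  # two bags: rows {0, 2} and rows {1, 3}
out = embedding_bag_sum(table, indices, offsets)
```

On GPU, the performance question is how this gather-plus-reduce is mapped to kernels, which is what the questions above are about.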

zerollzeng commented 1 year ago

@nvpohanh ^ ^

nvpohanh commented 1 year ago

For the Gather operation, TRT generates the kernel dynamically and tries to fuse it with other pointwise operations where possible. That means we do not use the same Gather kernels that PyTorch does.
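To illustrate what such a fusion means, here is a conceptual NumPy sketch (not TRT's actual generated kernel, which is GPU code): a Gather followed by a pointwise op such as ReLU can be computed in one pass, instead of materializing the gathered tensor and then running a second pointwise kernel over it.

```python
import numpy as np

# Toy embedding table with negative values so ReLU has an effect.
table = np.arange(-6.0, 6.0, dtype=np.float32).reshape(4, 3)
indices = np.array([3, 0, 2])

def gather_then_relu_unfused(table, indices):
    gathered = table[indices]         # kernel 1: materializes gathered rows
    return np.maximum(gathered, 0.0)  # kernel 2: separate pointwise pass

def gather_relu_fused(table, indices):
    # Single pass: the pointwise op is applied as each row is gathered,
    # which is roughly what a fused, dynamically generated kernel does.
    out = np.empty((len(indices), table.shape[1]), dtype=table.dtype)
    for i, idx in enumerate(indices):
        out[i] = np.maximum(table[idx], 0.0)
    return out
```

The fused form avoids writing and re-reading the intermediate gathered tensor, which is where the GPU-side savings come from.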

nvpohanh commented 1 year ago

> What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Our MLPerf-Inference submission uses TensorRT for the DLRM benchmark: https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA

Using TensorRT allows more aggressive fusions like Gemm+Pointwise fusions.
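As a sketch of what a Gemm+Pointwise fusion buys (again conceptual NumPy, not an actual TRT kernel; shapes and data below are made up): instead of launching a GEMM kernel plus separate pointwise kernels for the epilogue, a fused kernel applies the bias add and activation while the GEMM output is still in registers.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 5)).astype(np.float32)
bias = rng.standard_normal(5).astype(np.float32)

def gemm_bias_relu_unfused(a, b, bias):
    y = a @ b                   # kernel 1: GEMM
    y = y + bias                # kernel 2: pointwise bias add
    return np.maximum(y, 0.0)   # kernel 3: pointwise ReLU

def gemm_bias_relu_fused(a, b, bias):
    # A fused kernel computes the epilogue (bias + ReLU) on GEMM output
    # tiles before they are written out, saving memory round-trips.
    return np.maximum(a @ b + bias, 0.0)
```

Both produce identical results; the difference on GPU is kernel-launch count and intermediate memory traffic, not numerics.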

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks. Thanks, all!

rishucoding commented 9 months ago

Thanks @nvpohanh for the comments. Could you share the source code for the TRT implementation of the Gather kernel used in the embedding stage for DLRMs? Also, could you compare the TRT Gather kernel with the PyTorch embedding-stage CUDA kernel (link)?

zerollzeng commented 9 months ago

@nvpohanh ^ ^

rishucoding commented 5 months ago

Hi -- could you please share your comments on my follow-up question? Thanks.