facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License
3.71k stars 825 forks

Embedding_bag operator on GPU #357

Open rishucoding opened 1 year ago

rishucoding commented 1 year ago

Hello,

Nvidia's MLPerf submissions suggest using the TensorRT framework for performant inference deployment. For DLRM (deep-learning-based recommendation systems) inference on GPU, I have the following questions:

Please let me know your comments. Thanks

samiwilf commented 12 months ago

Hi @rishucoding.

TensorRT uses its own CUDA kernels and mainly uses ONNX to import models. It doesn't use PyTorch.

It appears that TensorRT currently lacks an embedding bag operator: it shows up in neither TensorRT's operator table nor ONNX's. The lack of embedding bag support in ONNX has been raised previously, both as an issue in this repo and as an issue in ONNX's repo.

When TensorRT encounters an unsupported operator, it doesn't automatically fall back to an implementation from another source such as PyTorch. Instead, one must resort to workarounds like manually reimplementing the unsupported operation in terms of operations that TensorRT does support.
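As a rough illustration of that kind of workaround: an embedding bag with sum pooling can be decomposed into a gather followed by a per-bag reduction, both of which are expressible with operators TensorRT does support (Gather, ReduceSum). Here is a minimal NumPy sketch of the decomposition — the table shape and offsets format mirror PyTorch's `nn.EmbeddingBag`, but this is illustrative, not TensorRT code:

```python
import numpy as np

def embedding_bag_sum(table, indices, offsets):
    """Sum-pooled embedding bag expressed as gather + per-bag reduce.

    table:   (num_embeddings, dim) weight matrix
    indices: flat 1-D array of row indices for all bags concatenated
    offsets: start position of each bag within `indices`
    """
    gathered = table[indices]                       # Gather
    bags = np.split(gathered, offsets[1:])          # slice out each bag
    return np.stack([b.sum(axis=0) for b in bags])  # ReduceSum per bag

# Tiny example: 2 bags over a 4 x 3 embedding table
table = np.arange(12, dtype=np.float32).reshape(4, 3)
indices = np.array([0, 2, 1, 3])
offsets = np.array([0, 2])  # bag 0 -> rows [0, 2], bag 1 -> rows [1, 3]
out = embedding_bag_sum(table, indices, offsets)
# out[0] sums rows 0 and 2; out[1] sums rows 1 and 3
```

In practice one would express the same gather/reduce pattern directly with TensorRT layers (or, for variable-length bags, pad to a fixed bag size so the reduction stays a static-shape op), but the arithmetic is exactly this.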

It may be easier to use TensorRT for just the two MLP components of DLRM, as shown here, than for the entire model.
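For context on why the MLP-only route is easier: DLRM's bottom and top MLPs are plain stacks of matrix multiply, bias add, and ReLU — all operators that TensorRT's table covers — so those blocks export cleanly through ONNX. A NumPy sketch of such an MLP forward pass (the layer sizes here are made up for illustration, not DLRM's actual configuration):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """MLP forward pass built only from MatMul, Add, and Relu —
    operators TensorRT supports natively, so a block like this
    needs no custom plugins when imported via ONNX."""
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b                 # MatMul + bias Add
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)    # Relu on hidden layers
    return x

# Hypothetical layer sizes (13 -> 64 -> 32 -> 1), batch of 8
rng = np.random.default_rng(0)
sizes = [13, 64, 32, 1]
weights = [rng.standard_normal((m, n)).astype(np.float32)
           for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n, dtype=np.float32) for n in sizes[1:]]
out = mlp_forward(rng.standard_normal((8, 13)).astype(np.float32),
                  weights, biases)
```

The embedding lookups and feature interaction would then stay outside TensorRT (e.g. in PyTorch), with only the dense MLP portions handed to the TensorRT engine.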