ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Tensorrt engine #155

Open imsiddhant07 opened 1 year ago

imsiddhant07 commented 1 year ago

I tried building the TensorRT engine in three ways:

  1. python src/transformer_deploy/convert.py
  2. the existing Docker image
  3. a Docker image built from the repo

In all three cases I got the same error when using the TensorRT backend.

The command I have been running (Docker, for example):

docker run -it --rm --gpus all -v $PWD:/project ghcr.io/els-rd/transformer-deploy:latest bash -c "cd /project && \
  convert_model -m \"sentence-transformers/multi-qa-distilbert-cos-v1\" \
  --backend tensorrt onnx \
  --seq-len 128 128 256 \
  --batch-size 1 32 300"

When I pass only 'onnx' as the backend everything runs smoothly, but the 'tensorrt' backend fails.
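For comparison, this is presumably the variant that completes cleanly (the same command with the ONNX backend only):

docker run -it --rm --gpus all -v $PWD:/project ghcr.io/els-rd/transformer-deploy:latest bash -c "cd /project && \
  convert_model -m \"sentence-transformers/multi-qa-distilbert-cos-v1\" \
  --backend onnx \
  --seq-len 128 128 256 \
  --batch-size 1 32 300"

With 'tensorrt' added to --backend, the build fails as follows: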

[11/29/2022-10:58:17] [TRT] [E] 2: [optimizer.cpp::getFormatRequirements::2945] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats)
[11/29/2022-10:58:17] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 417, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 308, in main
    engine: ICudaEngine = build_engine(
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/trt_utils.py", line 206, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f88c85c12b0>, None
free(): invalid pointer
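For anyone hitting the same thing: the final TypeError is a symptom, not the root cause. The two [TRT] [E] lines show that buildSerializedNetwork failed and returned a null engine, so build_engine ends up passing None to runtime.deserialize_cuda_engine. Below is a minimal sketch of the build step with an explicit guard, so the real TensorRT error surfaces instead of the TypeError (build_engine_from_onnx is a hypothetical helper for illustration, not transformer-deploy's actual code):

import tensorrt as trt

def build_engine_from_onnx(onnx_path: str) -> trt.ICudaEngine:
    # Hypothetical helper: build a TensorRT engine from an ONNX file,
    # failing loudly when the build step returns None.
    logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parsing failed")
    config = builder.create_builder_config()
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        # This None is what later crashes deserialize_cuda_engine.
        raise RuntimeError("TensorRT engine build failed; see [TRT] [E] logs above")
    runtime = trt.Runtime(logger)
    return runtime.deserialize_cuda_engine(serialized)

With a guard like this, the 'no supported formats' assertion becomes the reported error, which points at a layer/format the installed TensorRT 8.4.1.5 cannot handle rather than at deserialization.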

It would be great to have a workaround for this.

Versions:

  Python: 3.8.15
  transformer-deploy: 0.5.3
  TensorRT: 8.4.1.5
  onnxruntime-gpu: 1.12.0
  transformers: 4.24.0

imsiddhant07 commented 1 year ago

Also, not sure if this helped.