zoltan-fedor opened this issue 1 year ago
I have followed the instructions at https://github.com/ELS-RD/transformer-deploy/#feature-extraction--dense-embeddings to convert a sentence-transformers model (https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) to ONNX and deployed it with the latest Triton Inference Server.
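For context, the conversion step is roughly equivalent to the sketch below. It uses torch.onnx.export directly rather than the transformer-deploy CLI, so the output path, opset version, and axis names are illustrative, not the exact commands I ran:

```python
# Minimal sketch: export the sentence-transformers model to ONNX.
# transformer-deploy performs a comparable export internally.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

sample = tokenizer("a sample sentence", return_tensors="pt")
torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "model.onnx",  # illustrative output path
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={  # batch size and sequence length vary per request
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)
```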
All works well, except that inference occasionally fails with the following error. The failures are not random: the same requests fail repeatedly.
[StatusCode.INTERNAL] in ensemble 'transformer_onnx_inference', onnx runtime error 1: Non-zero status code returned while running Transpose node. Name:'Transpose_84' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
Any idea what might be wrong?
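To rule Triton out, I imagine the failing requests could be replayed against the model directly in ONNX Runtime, comparing the CUDA and CPU execution providers. A minimal sketch, assuming the model.onnx file and input names from the export above (the random sample input is a placeholder for the token ids of a real failing request):

```python
# Sketch: run the ONNX model directly, outside Triton, to check whether
# the Transpose error comes from ONNX Runtime's CUDA provider itself.
import numpy as np
import onnxruntime as ort

def run(providers, input_ids, attention_mask):
    sess = ort.InferenceSession("model.onnx", providers=providers)
    return sess.run(
        None,
        {"input_ids": input_ids, "attention_mask": attention_mask},
    )

# Placeholder input; replace with the token ids of a failing request.
input_ids = np.random.randint(0, 30000, size=(1, 512), dtype=np.int64)
attention_mask = np.ones((1, 512), dtype=np.int64)

cpu_out = run(["CPUExecutionProvider"], input_ids, attention_mask)
gpu_out = run(["CUDAExecutionProvider", "CPUExecutionProvider"],
              input_ids, attention_mask)
# If the GPU run fails here too, the problem is in ONNX Runtime / CUDA,
# not in the Triton deployment.
print(np.abs(cpu_out[0] - gpu_out[0]).max())
```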
Have you solved this problem? I also hit this error (Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument) when running inference on CUDA, but it works fine on CPU.