ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Occasional "CUDA error cudaErrorInvalidConfiguration:invalid configuration argument" error #166

zoltan-fedor commented 1 year ago

I followed the instructions at https://github.com/ELS-RD/transformer-deploy/#feature-extraction--dense-embeddings to convert a sentence-transformers model (https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) to ONNX and deployed it with the latest Triton Inference Server.
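
For context, the export step boils down to something like the sketch below. This is a minimal hand-rolled `torch.onnx.export` version, not the project's `convert_model` command; the model id is the one from this issue, while the output file name, opset version, and axis labels are assumptions.

```python
# Minimal ONNX export sketch for the model in question.
# Assumption: plain torch.onnx.export; transformer-deploy's own
# converter wraps a similar export plus graph optimizations.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

sample = tokenizer("a sample sentence", return_tensors="pt")

torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "model.onnx",  # placeholder output path
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state", "pooler_output"],
    # Dynamic axes matter here: a fixed-shape export can fail on
    # requests whose batch size or sequence length differ.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)
```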

Everything works well, except that inference occasionally fails with the following error (the failures are not random; the same requests fail every time): [StatusCode.INTERNAL] in ensemble 'transformer_onnx_inference', onnx runtime error 1: Non-zero status code returned while running Transpose node. Name:'Transpose_84' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument

Any idea what might be wrong?
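
To isolate the failure outside Triton, one check I can run is to feed the exact text of a failing request to the exported model through onnxruntime-gpu directly and log the input shape (cudaErrorInvalidConfiguration is a CUDA kernel-launch error, and those are often shape-dependent). A sketch, with the file path and sample text as placeholders:

```python
# Run the exported ONNX model on the CUDA provider, outside Triton,
# with one of the requests that reliably fails in the ensemble.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "sentence-transformers/multi-qa-mpnet-base-dot-v1"
)
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to the exported model
    providers=["CUDAExecutionProvider"],
)

enc = tokenizer("text of a failing request", return_tensors="np")
print("input shape:", enc["input_ids"].shape)

outputs = session.run(
    None,
    {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
    },
)
print("output shape:", outputs[0].shape)
```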

Zalways commented 7 months ago

Have you solved this problem? I hit the same error (Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument) when running inference on CUDA, but it works fine when I run inference on CPU.
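
A small script like the one below makes the asymmetry easy to confirm: the same feed is run once on the CPU provider and once on the CUDA provider, and only the latter raises the error. The model path and the dummy input shape are placeholders (adjust the feed to your model's actual inputs).

```python
# Compare CPU and CUDA execution providers on identical input.
# Requires onnxruntime-gpu for the CUDA provider to be available.
import numpy as np
import onnxruntime as ort

feeds = {  # placeholder feed for a transformer-style model
    "input_ids": np.random.randint(0, 30000, (1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}

for provider in ["CPUExecutionProvider", "CUDAExecutionProvider"]:
    session = ort.InferenceSession("model.onnx", providers=[provider])
    try:
        out = session.run(None, feeds)
        print(provider, "ok, output shape:", out[0].shape)
    except Exception as exc:
        # On CUDA this is where cudaErrorInvalidConfiguration surfaces.
        print(provider, "failed:", exc)
```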