zoltan-fedor opened this issue 1 year ago
I have followed the instructions at https://github.com/ELS-RD/transformer-deploy/#feature-extraction--dense-embeddings to convert a sentence-transformers model (https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) to ONNX and deployed it with the latest Triton Inference Server.
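For context, the conversion step is roughly equivalent to the sketch below. It uses torch.onnx.export directly rather than the transformer-deploy CLI, so the output path, opset version, and axis names are illustrative, not the exact commands I ran:

```python
# Minimal sketch: export the sentence-transformers model to ONNX.
# transformer-deploy performs a comparable export internally.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

sample = tokenizer("a sample sentence", return_tensors="pt")
torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "model.onnx",  # illustrative output path
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={  # batch size and sequence length vary per request
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)
```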
All works well, except that inference occasionally fails with the following error. The failures are not random: the same requests fail repeatedly.
[StatusCode.INTERNAL] in ensemble 'transformer_onnx_inference', onnx runtime error 1: Non-zero status code returned while running Transpose node. Name:'Transpose_84' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
Any idea what might be wrong?
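To rule Triton out, I imagine the failing requests could be replayed against the model directly in ONNX Runtime, comparing the CUDA and CPU execution providers. A minimal sketch, assuming the model.onnx file and input names from the export above (the random sample input is a placeholder for the token ids of a real failing request):

```python
# Sketch: run the ONNX model directly, outside Triton, to check whether
# the Transpose error comes from ONNX Runtime's CUDA provider itself.
import numpy as np
import onnxruntime as ort

def run(providers, input_ids, attention_mask):
    sess = ort.InferenceSession("model.onnx", providers=providers)
    return sess.run(
        None,
        {"input_ids": input_ids, "attention_mask": attention_mask},
    )

# Placeholder input; replace with the token ids of a failing request.
input_ids = np.random.randint(0, 30000, size=(1, 512), dtype=np.int64)
attention_mask = np.ones((1, 512), dtype=np.int64)

cpu_out = run(["CPUExecutionProvider"], input_ids, attention_mask)
gpu_out = run(["CUDAExecutionProvider", "CPUExecutionProvider"],
              input_ids, attention_mask)
# If the GPU run fails here too, the problem is in ONNX Runtime / CUDA,
# not in the Triton deployment.
print(np.abs(cpu_out[0] - gpu_out[0]).max())
```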
Have you solved this problem? I also hit this error (Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument) when running inference on CUDA, but it works fine on CPU.