ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

unable to host .onnx model on triton server #179

Open riyaj8888 opened 1 year ago

riyaj8888 commented 1 year ago

I am converting a .pt model to .onnx by loading the model on the cuda:0 device, and hosting it on cuda:1 by setting gpus: [1] inside the config.pbtxt file, but I am getting the following error:

```
onnxruntime error 6: Non-zero status code returned while running Einsum node.
Name:'/model/layer.0/rel_attn/Einsum_8'
Status Message: /workspace/onnxruntime/onnxruntime/core/providers/cpu/math/einsum_utils/einsum_auxiliary_ops.cc:298
std::unique_ptr<onnxruntime::Tensor> onnxruntime::EinsumOp::Transpose(const onnxruntime::Tensor&, const onnxruntime::TensorShape&, const gsl::span<const long unsigned int>&, onnxruntime::AllocatorPtr, void*, const Transpose&)
Einsum op: Transpose failed: CUDA failure 1: invalid argument ;
GPU=1 ; hostname=2a71d799b143 ;
expr=cudaMemcpyAsync(output.MutableDataRaw(), input.DataRaw(), input.Shape().Size() * input.DataType()->Size(), cudaMemcpyDeviceToDevice, stream);
```
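For context, GPU placement in Triton is controlled through the `instance_group` block in `config.pbtxt`. A minimal fragment matching the setup described above might look like this (the model name and platform are placeholders, not taken from the issue):

```
# Hypothetical config.pbtxt fragment; name is a placeholder.
name: "my_onnx_model"
platform: "onnxruntime_onnx"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]   # pin this model instance to GPU 1
  }
]
```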

But when I set gpus: [0], it runs smoothly without any error. Is this normal behavior? Why does it fail in the first place?

Thanks