manarshehadeh opened this issue 11 months ago
I am facing the same issue with the CodeLlama 34B Instruct model:
Env:
Issue: The ensemble model loads successfully, but when sending an inference request over HTTP with the command:
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'
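For easier debugging, the same request can be sent from Python so the HTTP status code and raw error body are visible. This is just a minimal sketch mirroring the curl command above; the endpoint and field names are taken from that command, and nothing else about the server setup is assumed:

import requests

# Same payload as the curl command; field names come from the
# ensemble's preprocessing config in tensorrtllm_backend.
payload = {
    "text_input": "What is machine learning?",
    "max_tokens": 20,
    "bad_words": "",
    "stop_words": "",
}
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json=payload,
    timeout=60,
)
print(resp.status_code)  # expect 200 on success, 4xx/5xx on failure
print(resp.text)         # the full error JSON, including the stack trace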
The request fails with the following stack trace:
Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1 0x7f451f4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f451f4697fd]
2 0x7f451f5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f451f5797d8]
3 0x7f451f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f451f4cbeb1]
4 0x7f451f4cd319 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7b319) [0x7f451f4cd319]
5 0x7f451f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f451f4d0f0d]
6 0x7f451f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f451f4bba28]
7 0x7f451f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f451f4bffb5]
8 0x7f45b344f253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f45b344f253]
9 0x7f45b31dfac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f45b31dfac3]
10 0x7f45b3271660 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f45b3271660]
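The assertion itself says the runtime received input_ids as a 1-D tensor where it expects 2-D ([batch_size, seq_len]), which points at the preprocessing step of the ensemble rather than the engine. As a minimal sketch for ruling the preprocessor in or out, one could call the tensorrt_llm model directly with an explicitly 2-D input_ids. The tensor names and INT32 dtypes below are assumptions based on typical tensorrtllm_backend configs, not taken from this repro, and this only works if the model is not running in decoupled/streaming mode:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Assumed token ids for illustration; shape must be [batch, seq_len], i.e. 2-D,
# which is exactly what the assertion in tllmRuntime.cpp checks.
input_ids = np.array([[1, 15043, 3186]], dtype=np.int32)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.int32)
output_len = np.array([[20]], dtype=np.int32)

inputs = []
for name, arr in [("input_ids", input_ids),          # assumed tensor name
                  ("input_lengths", input_lengths),  # assumed tensor name
                  ("request_output_len", output_len)]:  # assumed tensor name
    t = httpclient.InferInput(name, list(arr.shape), "INT32")
    t.set_data_from_numpy(arr)
    inputs.append(t)

result = client.infer("tensorrt_llm", inputs)
print(result.as_numpy("output_ids"))  # assumed output tensor name

If this direct call succeeds, the shape mismatch is most likely introduced by the preprocessing model's config in the ensemble rather than by the engine itself.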
Should an inference request against a multi-GPU engine work the same way as against a single-GPU engine?
Hi @wangyubo111, could you please try the latest release to see if this issue still exists? Do you still have any further issues or questions? If not, we'll close this soon.