-
I'm trying to use Triton to deploy baichuan2-13B for inference at bf16 precision. The tritonserver starts successfully, but it crashes when processing a client request (a minimal client sketch follows this excerpt).
- Use TensorRT-LLM v0…
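Not from the report itself: a minimal client call of the kind that exercises this path might look like the sketch below. The endpoint, the model name `ensemble`, and the tensor names `text_input`/`max_tokens`/`text_output` are assumptions and must match the actual Triton model repository.

```python
# Minimal sketch of a Triton client request to a TensorRT-LLM deployment.
# Model and tensor names are assumptions; they must match config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

text = httpclient.InferInput("text_input", [1, 1], "BYTES")
text.set_data_from_numpy(np.array([["Hello"]], dtype=object))

max_tokens = httpclient.InferInput("max_tokens", [1, 1], "INT32")
max_tokens.set_data_from_numpy(np.array([[64]], dtype=np.int32))

result = client.infer(model_name="ensemble", inputs=[text, max_tokens])
print(result.as_numpy("text_output"))
```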
-
It seems a bit unfair to file this as a "bug," when really what's going on is that the Python community is trying to figure out what a "typed" Python library looks like. In this case, what looks like …
-
### Your current environment
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu Jammy Jellyfish (development branch…
-
I have created a Streamlit app as a demo of a project on Multilingual Text Classification using mBERT in PyTorch. When I run the app with the command `python app.py` it works fine, but when I try to…
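For context (a common pitfall, not something stated in the excerpt): Streamlit scripts are normally launched with `streamlit run app.py`, not `python app.py`; the plain interpreter executes the script once without starting the Streamlit server. A hypothetical minimal `app.py` for such a demo:

```python
# app.py — hypothetical minimal Streamlit entry point for the demo.
# Launch with: streamlit run app.py   (not: python app.py)
import streamlit as st

st.title("Multilingual Text Classification with mBERT")
text = st.text_area("Enter text to classify")
if st.button("Classify"):
    # the mBERT inference call would go here; omitted in this sketch
    st.write("Predicted label: ...")
```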
-
Hello Microsoft team,
We would like to know what the possibilities are for FP16 optimization in the ONNX Runtime inference engine and its Execution Providers. Does ONNX Runtime support FP16-optimized m…
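As a point of reference (not from the issue itself): ONNX Runtime can execute FP16 models, and the `onnxconverter-common` package provides a float32-to-float16 model conversion. A sketch, with placeholder file names:

```python
# Sketch: convert an FP32 ONNX model to FP16 and load it with ONNX Runtime.
# keep_io_types=True leaves the model's inputs/outputs in FP32.
import onnx
import onnxruntime as ort
from onnxconverter_common import float16

model = onnx.load("model.onnx")
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "model_fp16.onnx")

sess = ort.InferenceSession(
    "model_fp16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```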
-
I was trying to run Detectron2 as an ONNX engine: I first converted Detectron2 to .onnx format, then turned that into a TensorRT engine. When I then tried to run inference on it, it ran what I felt was …
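One way to separate engine-build settings from Python-side overhead (an addition, not from the excerpt) is to rebuild and benchmark the engine with `trtexec`, which prints latency and throughput summaries. A sketch driving it from Python, with placeholder file names:

```python
# Sketch: rebuild the engine with trtexec (ships with TensorRT) to get
# baseline latency numbers independent of the Python inference code.
import subprocess

subprocess.run(
    [
        "trtexec",
        "--onnx=model.onnx",          # exported Detectron2 model (placeholder)
        "--saveEngine=model.engine",
        "--fp16",                     # try FP16 kernels; drop for an FP32 baseline
    ],
    check=True,
)
```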
-
Probably a dup of #3956. If that is the case, sorry for spamming, but anyway:
## Description
We encountered a misaligned address error while trying to build an engine from an ONNX model.
By tria…
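Not part of the report, but when bisecting a failing build it can help to parse and build through the TensorRT Python API with a verbose logger; a sketch assuming TensorRT 8.x-style APIs and placeholder paths:

```python
# Sketch: build a TensorRT engine from ONNX with verbose logging, printing
# parser errors if the import fails. File paths are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine)
```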
-
## ❓ Question
I have a PTQ model and a QAT model trained with the official PyTorch API following the quantization tutorial, and I wish to deploy them on TensorRT for inference. The model is metaforme…
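The excerpt does not say how the models are exported; one common route (an assumption here, not the author's stated method) is to export the quantized model to ONNX with Q/DQ nodes and build an INT8 TensorRT engine from that:

```python
# Sketch: export a (fake-)quantized PyTorch model to Q/DQ ONNX for TensorRT.
# The module below is a stand-in for the author's metaformer model.
import torch
import torch.nn as nn

model = nn.Sequential(  # placeholder network, not the real model
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "model_qdq.onnx",
    opset_version=13,  # opset >= 13 supports per-channel QuantizeLinear/DequantizeLinear
    input_names=["input"],
    output_names=["output"],
)
# The resulting file can then be built with INT8 enabled, e.g.:
#   trtexec --onnx=model_qdq.onnx --int8 --saveEngine=model.engine
```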
-
When I try to use CenterPose to start the node with `ros2 launch isaac_ros_centerpose isaac_ros_centerpose_tensor_rt.launch.py model_file_path:=/home/nvidia/Chen/centerpose/bottle_DLA34.onnx engine_file_p`…