-
### System Info
A100-80G
CUDA 12.1
bitsandbytes 0.43.2.dev0
diffusers 0.29.1
lion-pytorch 0.2.2
torch 2.0.1
torch-tb-profiler 0…
-
Trying to run the NVIDIA v4.1 implementation of Stable Diffusion on an RTX 4090.
```
(mlperf) arjun@mlperf-inference-arjun-x86-64-24944:/work$ make generate_engines RUN_ARGS="--benchmarks=stable-diffus…
```
-
Hi everyone,
I’m working on a project that involves deploying a YOLOv10 model on a mobile/edge device. To improve inference speed and reduce the model size, I want to convert my YOLOv10 model to Te…
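Assuming the target format is TensorRT (the post is cut off before naming it) and that the `ultralytics` package with a `yolov10n.pt` checkpoint is in use, here is a minimal sketch of one common export path; everything named below is an assumption, not taken from the post:
```python
# Hypothetical sketch: export a YOLOv10 checkpoint to ONNX, then build a
# TensorRT engine. Package, checkpoint name, and opset are assumptions.
from ultralytics import YOLO

model = YOLO("yolov10n.pt")            # load the trained checkpoint
model.export(format="onnx", opset=12)  # writes yolov10n.onnx alongside it

# The ONNX file can then be compiled into a TensorRT engine, e.g.:
#   trtexec --onnx=yolov10n.onnx --saveEngine=yolov10n.engine --fp16
```
Exporting through ONNX keeps the conversion toolchain-agnostic; `ultralytics` can also export an engine directly with `format="engine"` when TensorRT is installed.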
-
When using HuggingFaceEmbeddings in LangChain to embed documents, I noticed that the embedding process takes significantly longer on the server compared to my local machine. My local computer has only…
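One common cause is the embedding model silently running on a different device (or with different batching) on the server than locally. A minimal sketch of pinning the device and batch size explicitly; the model name and settings below are assumptions for illustration, not from the truncated post:
```python
# Hypothetical sketch: make HuggingFaceEmbeddings use an explicit device.
# Model name, device, and batch size are assumptions, not from the post.
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},   # or "cpu" if the server has no GPU
    encode_kwargs={"batch_size": 64},  # larger batches amortize per-call overhead
)

vectors = embeddings.embed_documents(["some document text"])
```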
-
I tried quantized YOLOv3 on a Volta GPU, but it didn't seem to run on Tensor Cores.
The cuDNN documentation, in section 2.8.2, recommends using "CUDNN_DATA_INT8x32" for Tensor Core operations.
https://docs.nv…
-
**Describe the bug**
Trying to use DeepSpeed Inference with int8 does not work for GPTJ. I created an issue with more details on the DeepSpeed-MII repo, but due to the nature of the issue, I…
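For context, a minimal sketch of the code path such a report typically exercises, assuming a Hugging Face GPT-J checkpoint; the checkpoint name and arguments are illustrative assumptions:
```python
# Hypothetical repro sketch: DeepSpeed Inference with int8 on GPT-J.
# Checkpoint and arguments are assumptions, not from the report.
import deepspeed
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # single GPU, no tensor parallelism
    dtype=torch.int8,                 # the configuration reported as failing
    replace_with_kernel_inject=True,  # enable DeepSpeed's fused kernels
)
```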
-
### Describe the issue
I did QAT quantization on a CNN model; when I export it to an ONNX model, I get slower inference than with the TorchScript QAT model.
The result is:
torchscript: 4.798517942428589 …
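A minimal sketch of the kind of latency comparison reported above, assuming the model has already been exported to both formats; file names, input shape, and iteration count are placeholders:
```python
# Hypothetical benchmark sketch: TorchScript vs. ONNX Runtime latency.
# File names and the input shape are placeholders, not from the report.
import time

import numpy as np
import onnxruntime as ort
import torch

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

# TorchScript path
ts_model = torch.jit.load("qat_model.pt").eval()
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        ts_model(torch.from_numpy(dummy))
    print("torchscript:", time.perf_counter() - start)

# ONNX Runtime path
sess = ort.InferenceSession("qat_model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
start = time.perf_counter()
for _ in range(100):
    sess.run(None, {input_name: dummy})
print("onnxruntime:", time.perf_counter() - start)
```
If the ONNX path is slower, it is worth checking which execution provider actually handled the quantized operators, since unfused Q/DQ nodes can fall back to slow reference kernels.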
-
**Describe the bug**
My CPU is an Intel Core Ultra 7 258V and the system is Windows 11 Home 24H2. I just tried running the qwen2.5-7b-instruct model using your example code for the first time. However, I noticed t…
-
Hello, `0.15.0.dev2024101500` introduced a new issue when using the executor API with Whisper
```
[TensorRT-LLM][ERROR] IExecutionContext::inferShapes: Error Code 7: Internal Error (WhisperEncoder/__add_…
```
-
By using [pytorch-quantization](https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html) I was able to create TensorRT engine models that are (almost) fully int8 and…
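A minimal sketch of the workflow that produces such engines, assuming a torchvision ResNet-50 as the model; the model choice, calibration step, and export settings are assumptions, since the post is truncated:
```python
# Hypothetical sketch of the pytorch-quantization -> TensorRT int8 flow.
# Model, file names, and opset are assumptions for illustration.
import torch
import torchvision
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Monkey-patch torch.nn layers with quantized equivalents so fake-quant
# (Q/DQ) nodes are inserted throughout the model.
quant_modules.initialize()

model = torchvision.models.resnet50(weights=None).eval()
# ... calibrate or fine-tune (QAT) here so quantizer ranges are populated ...

# Emit ONNX QuantizeLinear/DequantizeLinear ops instead of fake-quant ops.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_qat.onnx", opset_version=13)

# The Q/DQ ONNX graph can then be built into an (almost) fully int8 engine:
#   trtexec --onnx=resnet50_qat.onnx --saveEngine=resnet50_int8.engine --int8
```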