-
## Description
I'm trying to generate a calibration cache file for post-training quantization using Polygraphy.
For this, I created a custom input JSON file, referring to this [https://github.com/NVIDIA/…
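A minimal sketch of generating a calibration cache through Polygraphy's Python API (the model path, input name, and shape below are illustrative placeholders, and random data stands in for a real calibration set):

```python
import numpy as np
from polygraphy.backend.trt import (
    Calibrator,
    CreateConfig,
    EngineFromNetwork,
    NetworkFromOnnxPath,
)

# Placeholder input name/shape -- replace with your model's actual inputs.
def calib_data(num_batches=16):
    for _ in range(num_batches):
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

# The cache file is written as a side effect of INT8 calibration.
calibrator = Calibrator(data_loader=calib_data(), cache="calib.cache")

build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("model.onnx"),
    config=CreateConfig(int8=True, calibrator=calibrator),
)

engine = build_engine()  # triggers calibration and writes calib.cache
```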
-
Hi TensorRT-LLM team, your work is incredible.
By following the README for [multimodal models](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md), we succeeded in running…
-
Hello, I want to deploy a quantized llama-3-8b model using tritonserver. I followed the steps below:
1. Create a container with the nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 base image.
3.…
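As a sanity check once the server is up, something like this sketch with the tritonclient Python package can confirm the deployment (the model name "ensemble" is an assumption based on the usual tensorrtllm_backend model repository layout):

```python
import tritonclient.http as httpclient

# Default HTTP port for tritonserver is 8000.
client = httpclient.InferenceServerClient(url="localhost:8000")

print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
# "ensemble" is an assumed model name; adjust to your repository.
print("model ready: ", client.is_model_ready("ensemble"))
```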
-
## Description
I am trying to convert an ONNX model to INT8 with the latest TensorRT. I got the following error:
```
[05/19/2023-14:42:31] [E] Error[2]: Assertion getter(i) != 0 failed.
[05/19/2023-14…
```
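One way to narrow this down is to scan the model for zero-valued dimensions, which can trip non-zero assertions like this during parsing and building; a sketch using the onnx Python package (the model path is a placeholder):

```python
import onnx

model = onnx.load("model.onnx")  # placeholder path
onnx.checker.check_model(model)

# Run shape inference and report any tensor with a 0-valued dimension,
# a commonly reported culprit for assertions of this kind.
inferred = onnx.shape_inference.infer_shapes(model)
tensors = (
    list(inferred.graph.input)
    + list(inferred.graph.output)
    + list(inferred.graph.value_info)
)
for vi in tensors:
    dims = [
        d.dim_value if d.HasField("dim_value") else (d.dim_param or "?")
        for d in vi.type.tensor_type.shape.dim
    ]
    if 0 in dims:
        print(vi.name, dims)
```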
-
Opening a new issue as #237 was closed prematurely.
It seems that engines built using the `--paged_kv_cache` flag leak GPU memory. Below is a minimal reproducible example that can be used to …
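Not the repro itself, but a generic sketch of how the growth can be tracked across iterations with pynvml (the inference call is a stub; replace it with a real generation request against the paged-KV engine):

```python
import pynvml

def run_inference_once():
    # Placeholder: replace with one generation call against the
    # --paged_kv_cache engine under test.
    pass

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mib():
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20

baseline = used_mib()
for i in range(100):
    run_inference_once()
    print(f"iter {i}: used {used_mib() - baseline:+.1f} MiB vs. baseline")

pynvml.nvmlShutdown()
```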
-
## Description
I am using this [calibration script](https://github.com/rmccorm4/tensorrt-utils/tree/master/int8/calibration) to generate the calibration cache file for a Segformer ONNX model, but I am facing th…
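The linked script builds the cache through TensorRT's INT8 calibrator interface; a condensed sketch of that general pattern (the cache filename is illustrative, and pycuda provides the device buffer):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class CacheCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT and persists the scale cache."""

    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)
        self.cache_file = cache_file
        self.pending = next(self.batches)  # first batch sizes the buffer
        self.batch_size = self.pending.shape[0]
        self.device_input = cuda.mem_alloc(self.pending.nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        batch = self.pending if self.pending is not None else next(self.batches, None)
        self.pending = None
        if batch is None:
            return None  # no more data: calibration finishes
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

Attached via `config.int8_calibrator`, TensorRT calls `get_batch` until it returns None and then writes the cache; on later runs `read_calibration_cache` lets it skip calibration entirely.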
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Expo…
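Assuming the component here is the exporter, the INT8 TensorRT export path in question sketched with placeholder files (yolov8n.pt and coco128.yaml are assumptions, not the exact files from this report):

```python
from ultralytics import YOLO

# Placeholder checkpoint; substitute the model from the report.
model = YOLO("yolov8n.pt")

# INT8 TensorRT export; `data` supplies the calibration dataset.
model.export(format="engine", int8=True, data="coco128.yaml")
```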
-
I followed the README section "Build model with both INT8 weight-only and INT8 KV cache enabled":
```
python convert_checkpoint.py --model_dir ./bloom/560m/ \
    --dtype float16 \
    …
```
-
Trying to run offline retinanet in a container with one NVIDIA GPU:
```
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia …
```
-
I am using trtllm 0.8.0 (with MoE support added, following Llama's implementation). We serve models with trtllm_backend (Docker image triton-trtllm-24.02).
[qwen2-moe-57B-A14B](https://huggingface.co/Qwe…