-
### System Info
A100-80G
CUDA 12.1
bitsandbytes 0.43.2.dev0
diffusers 0.29.1
lion-pytorch 0.2.2
torch 2.0.1
torch-tb-profiler 0…
-
Trying to run the NVIDIA v4.1 implementation of Stable Diffusion on an RTX 4090.
```
(mlperf) arjun@mlperf-inference-arjun-x86-64-24944:/work$ make generate_engines RUN_ARGS="--benchmarks=stable-diffus…
```
-
Hi everyone,
I’m working on a project that involves deploying a YOLOv10 model on a mobile/edge device. To improve inference speed and reduce the model size, I want to convert my YOLOv10 model to Te…
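Assuming the target format is TensorRT (the post is cut off before naming it) and that the `ultralytics` package with a `yolov10n.pt` checkpoint is in use, here is a minimal sketch of one common export path; everything named below is an assumption, not taken from the post:
```python
# Hypothetical sketch: export a YOLOv10 checkpoint to ONNX, then build a
# TensorRT engine. Package, checkpoint name, and opset are assumptions.
from ultralytics import YOLO

model = YOLO("yolov10n.pt")            # load the trained checkpoint
model.export(format="onnx", opset=12)  # writes yolov10n.onnx alongside it

# The ONNX file can then be compiled into a TensorRT engine, e.g.:
#   trtexec --onnx=yolov10n.onnx --saveEngine=yolov10n.engine --fp16
```
Exporting through ONNX keeps the conversion toolchain-agnostic; `ultralytics` can also export an engine directly with `format="engine"` when TensorRT is installed.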
-
When using HuggingFaceEmbeddings in LangChain to embed documents, I noticed that the embedding process takes significantly longer on the server compared to my local machine. My local computer has only…
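One common cause is the embedding model silently running on a different device (or with different batching) on the server than locally. A minimal sketch of pinning the device and batch size explicitly; the model name and settings below are assumptions for illustration, not from the truncated post:
```python
# Hypothetical sketch: make HuggingFaceEmbeddings use an explicit device.
# Model name, device, and batch size are assumptions, not from the post.
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},   # or "cpu" if the server has no GPU
    encode_kwargs={"batch_size": 64},  # larger batches amortize per-call overhead
)

vectors = embeddings.embed_documents(["some document text"])
```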
-
I tried quantized YOLOv3 on a Volta GPU, but it didn't seem to run on Tensor Cores.
The cuDNN documentation, in section 2.8.2, recommends using "CUDNN_DATA_INT8x32" for Tensor Core operations.
https://docs.nv…
-
**Describe the bug**
Trying to use DeepSpeed Inference with int8 does not work for GPTJ. I created an issue with more details on the DeepSpeed-MII repo, but due to the nature of the issue, I…
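For context, a minimal sketch of the code path such a report typically exercises, assuming a Hugging Face GPT-J checkpoint; the checkpoint name and arguments are illustrative assumptions:
```python
# Hypothetical repro sketch: DeepSpeed Inference with int8 on GPT-J.
# Checkpoint and arguments are assumptions, not from the report.
import deepspeed
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # single GPU, no tensor parallelism
    dtype=torch.int8,                 # the configuration reported as failing
    replace_with_kernel_inject=True,  # enable DeepSpeed's fused kernels
)
```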
-
### Describe the issue
I did QAT quantization on a CNN model; when I export it to an ONNX model, I get slower inference than with the TorchScript QAT model.
The result is:
torchscript: 4.798517942428589 …
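A minimal sketch of the kind of latency comparison reported above, assuming the model has already been exported to both formats; file names, input shape, and iteration count are placeholders:
```python
# Hypothetical benchmark sketch: TorchScript vs. ONNX Runtime latency.
# File names and the input shape are placeholders, not from the report.
import time

import numpy as np
import onnxruntime as ort
import torch

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

# TorchScript path
ts_model = torch.jit.load("qat_model.pt").eval()
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        ts_model(torch.from_numpy(dummy))
    print("torchscript:", time.perf_counter() - start)

# ONNX Runtime path
sess = ort.InferenceSession("qat_model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
start = time.perf_counter()
for _ in range(100):
    sess.run(None, {input_name: dummy})
print("onnxruntime:", time.perf_counter() - start)
```
If the ONNX path is slower, it is worth checking which execution provider actually handled the quantized operators, since unfused Q/DQ nodes can fall back to slow reference kernels.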
-
**Describe the bug**
My CPU is an Intel Core Ultra 7 258V and the system is Windows 11 Home 24H2. I just tried running the qwen2.5-7b-instruct model using your example code for the first time. However, I noticed t…
-
Hello, `0.15.0.dev2024101500` introduced a new issue when using the executor API with Whisper
```
[TensorRT-LLM][ERROR] IExecutionContext::inferShapes: Error Code 7: Internal Error (WhisperEncoder/__add_…
```
-
By using [pytorch-quantization](https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html) I was able to create TensorRT engine models that are (almost) fully int8 and…
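A minimal sketch of the workflow that produces such engines, assuming a torchvision ResNet-50 as the model; the model choice, calibration step, and export settings are assumptions, since the post is truncated:
```python
# Hypothetical sketch of the pytorch-quantization -> TensorRT int8 flow.
# Model, file names, and opset are assumptions for illustration.
import torch
import torchvision
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Monkey-patch torch.nn layers with quantized equivalents so fake-quant
# (Q/DQ) nodes are inserted throughout the model.
quant_modules.initialize()

model = torchvision.models.resnet50(weights=None).eval()
# ... calibrate or fine-tune (QAT) here so quantizer ranges are populated ...

# Emit ONNX QuantizeLinear/DequantizeLinear ops instead of fake-quant ops.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_qat.onnx", opset_version=13)

# The Q/DQ ONNX graph can then be built into an (almost) fully int8 engine:
#   trtexec --onnx=resnet50_qat.onnx --saveEngine=resnet50_int8.engine --int8
```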