-
We plan to add QAT for LLMs to torchao (as mentioned in the original RFC: https://github.com/pytorch-labs/ao/issues/47).
For this to run efficiently on the GPU, we'd need kernel support for W4A8…
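For context, W4A8 QAT simulates 4-bit weights and 8-bit activations during training while all arithmetic stays in floating point; only a dedicated W4A8 kernel makes the resulting model fast at inference. A minimal fake-quantization sketch in plain PyTorch (the `fake_quant` helper and `FakeQuantLinear` layer are hypothetical illustrations, not the torchao API):

```python
import torch
import torch.nn as nn


def fake_quant(x: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: snap values to an n-bit grid,
    # then dequantize. The straight-through estimator below lets gradients
    # flow through the non-differentiable round().
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax - 1, qmax) * scale
    return x + (q - x).detach()


class FakeQuantLinear(nn.Linear):
    # Hypothetical W4A8 QAT layer: 4-bit weights, 8-bit activations,
    # with the actual matmul still carried out in fp32.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(
            fake_quant(x, bits=8), fake_quant(self.weight, bits=4), self.bias
        )


layer = FakeQuantLinear(16, 16)
layer(torch.randn(2, 16)).sum().backward()  # gradients reach the fp32 master weights
```

The straight-through estimator is what lets training proceed despite the non-differentiable rounding step; the speed benefit only materializes once a real W4A8 kernel replaces the simulated one.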
-
### 💡 Your Question
Hi,
I am just checking: I see in the provided results that Yolo-NAS-L does not suffer much of a performance drop going to Yolo-NAS-INT8-L. Can I check what exactly is meant …
-
### System Info
- GPU: 2xA100-40G
- TensorRT-LLM v0.8.0
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officia…
-
I have a question: can the Vitis AI quantizer be used with formats other than INT8 on the **ZCU104**? Also, after quantization, is the computation actually performed in INT8, or are the values just stored as INT8? If…
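The stored-vs-computed distinction behind this question is worth illustrating. In weight-only schemes the weights are stored as INT8 but dequantized back to float before the matmul; true integer execution (what DPU-style accelerators do) multiplies INT8 operands and accumulates in INT32. A toy sketch in plain PyTorch, not Vitis AI code:

```python
import torch

torch.manual_seed(0)
w = torch.randn(64, 64)
x = torch.randn(1, 64)

# Symmetric per-tensor INT8 parameters.
w_scale = w.abs().max() / 127
x_scale = x.abs().max() / 127
w_int8 = (w / w_scale).round().clamp(-128, 127).to(torch.int8)
x_int8 = (x / x_scale).round().clamp(-128, 127).to(torch.int8)

# Case 1: weights merely *stored* as INT8; the matmul still runs in fp32.
y_weight_only = x @ (w_int8.float() * w_scale)

# Case 2: true integer execution; INT8 operands, INT32 accumulation,
# rescaled back to float only at the end.
y_int32 = x_int8.to(torch.int32) @ w_int8.to(torch.int32)
y_integer = y_int32.float() * (x_scale * w_scale)

print((y_weight_only - y_integer).abs().max())  # differ only by activation-quant error
```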
-
Hello,
I measured the time taken by the BitBLAS matmul versus a normal torch.matmul in your QuickStart code, but there appears to be no speedup. Am I missing something?
```python
import bitblas
import torch
…
```
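A frequent cause of "no speedup" in naive timing loops is that CUDA kernel launches are asynchronous, so wall-clock timing without warmup and `torch.cuda.synchronize()` mostly measures launch overhead (and, for BitBLAS, first-call tuning). A hedged timing harness in plain PyTorch; it takes any callable, so the BitBLAS matmul can be dropped in (the commented `bitblas_matmul` line is a placeholder, not a symbol from the QuickStart):

```python
import torch


def benchmark(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    # Average latency in milliseconds, measured with CUDA events.
    for _ in range(warmup):  # warmup triggers tuning and cache effects
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()  # wait until every queued kernel has finished
    return start.elapsed_time(end) / iters


if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    print(f"torch.matmul: {benchmark(torch.matmul, a, b):.3f} ms")
    # print(f"BitBLAS: {benchmark(bitblas_matmul, aq, bq):.3f} ms")  # placeholder callable
```

Also note that low-bit kernels tend to shine at the skinny, memory-bound shapes typical of LLM decoding; at large square shapes a well-tuned fp16 GEMM can be just as fast.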
-
- Running with the code below does not work properly
- Running with the recipe works properly
```python
from datasets import load_dataset
from transformers import AutoTokenizer
from llmc…
```
-
Does TensorRT support QAT and PTQ INT8 quantization of CLIP/ViT models? Could you please provide any relevant quantization examples and accuracy/latency benchmarks?
-
### 1. System information
- Occurs in Google Colab with TF 2.14
- Also verified with TF 2.7 (Anaconda) on Windows 10
### 2. Code
[Colab to reproduce issue](https://colab.research.google.com…
-
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update: `Jan 14th, 2024`
🚀 = in development
#…
-
Hi,
we are trying to quantize our ONNX models to INT8 to run on CPU, using https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu
We are using dynamic …
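For reference, dynamic quantization in ONNX Runtime is a single call: weights are converted to INT8 offline, while activation scales are computed at runtime. A minimal sketch with placeholder file paths:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Paths are placeholders; point them at your own model.
quantize_dynamic(
    model_input="model.onnx",        # fp32 source model
    model_output="model.int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,     # store weights as signed INT8
)
```

Dynamic quantization tends to help transformer-style workloads on CPU; for convolutional models the ONNX Runtime docs recommend static quantization with a calibration dataset instead.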