-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
  - GPU name: 2x NVIDIA A100 80GB
  - GPU memory size: 160GB total
- Libraries
  - tensorrt @ fi…
-
I'm trying to quantize a TF-TRT INT8 model in Colab-TF-TRT-inference-from-Keras-saved-model.ipynb using a Jupyter notebook.
I hit a GPU out-of-memory error, but I think I have enough GPU memory:
~~~
…
~~~
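For reference, this is roughly the TF-TRT INT8 conversion path that notebook follows, using the public `TrtGraphConverterV2` API; the SavedModel path and the calibration iterable are placeholders for your own:

~~~python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def calibration_input_fn():
    # Hypothetical iterable: yield a handful of representative batches
    # matching the model's input signature.
    for batch in calibration_batches:
        yield (batch,)

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.INT8)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="keras_saved_model",   # placeholder path
    conversion_params=params)
converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("tftrt_int8_saved_model")
~~~

Note that INT8 calibration builds TensorRT engines on the GPU, so it allocates a TensorRT workspace on top of TensorFlow's own memory pool; OOM can therefore occur during conversion even when the final engine would fit. `TrtConversionParams(max_workspace_size_bytes=...)` caps that workspace.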
-
### System Info
- GPU: 2xA100-40G
- TensorRT-LLM v0.8.0
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officia…
-
Hi @tridao, we recently implemented INT8 forward FMHA (8-bit Flash-Attention) with both static and dynamic quantization for Softmax on our GPGPU card, and achieved good results and relatively okay acc…
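For readers unfamiliar with the two schemes, here is a rough PyTorch sketch of the difference between static and dynamic INT8 quantization of the softmax output (per-tensor symmetric scales; the names are ours, not from the FMHA kernel):

~~~python
import torch

def softmax_int8(scores: torch.Tensor, static: bool = False):
    """Quantize softmax probabilities to INT8.

    Static:  scale is fixed ahead of time; since softmax outputs lie
             in [0, 1], scale = 1/127 always covers the full range.
    Dynamic: scale is derived from the actual max of this tensor, so
             small probabilities keep more resolution.
    """
    p = torch.softmax(scores, dim=-1)
    scale = torch.tensor(1.0 / 127) if static else p.amax() / 127
    q = torch.clamp((p / scale).round(), -128, 127).to(torch.int8)
    return q, scale   # dequantize later as q.float() * scale
~~~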
-
We plan to add QAT for LLMs to torchao (as mentioned in the original RFC: https://github.com/pytorch-labs/ao/issues/47).
For this to run efficiently on the GPU we'd need kernel support for W4A8…
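To make the W4A8 requirement concrete, here is a hedged fake-quantization sketch of what a QAT-style W4A8 linear computes in float; a real kernel would keep weights packed in INT4 and activations in INT8, and the helper names here are hypothetical:

~~~python
import torch

def fake_quant(x: torch.Tensor, n_bits: int, per_channel: bool = False):
    # Symmetric quantize-dequantize in float; real QAT would route
    # gradients through round() with a straight-through estimator.
    qmax = 2 ** (n_bits - 1) - 1
    amax = x.abs().amax(dim=1, keepdim=True) if per_channel else x.abs().amax()
    scale = (amax / qmax).clamp(min=1e-8)
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

def w4a8_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """W4A8: INT8 per-tensor activations, INT4 per-output-channel weights."""
    return fake_quant(x, 8) @ fake_quant(weight, 4, per_channel=True).t()
~~~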
-
The model was downloaded from https://github.com/fatihcakirs/mobile_models/blob/main/v0_7/tflite/mobilebert_int8_384_20200602.tflite
Some fully-connected weights have a non-zero zero point (e.g. weight `b…
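A quick way to reproduce the observation is to walk the model's tensors with the TFLite interpreter and flag INT8 weights whose zero point is non-zero (per the TFLite quantization spec, conv/fully-connected weight tensors are expected to be symmetric, i.e. `zero_point == 0`):

~~~python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="mobilebert_int8_384_20200602.tflite")
interpreter.allocate_tensors()

# List INT8 tensors whose quantization zero points are not all zero.
for d in interpreter.get_tensor_details():
    zps = d["quantization_parameters"]["zero_points"]
    if d["dtype"] == np.int8 and zps.size and np.any(zps != 0):
        print(d["name"], zps[:4])
~~~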
-
Hi~ Great work there!
What I want to ask is whether RepGhost suffers a serious accuracy loss after INT8 quantization?
Or how do you solve quantization problems? Thanks~
-
Nils, is it possible to create an integer-only model so this could run on accelerators or frameworks such as ArmNN?
https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer…
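Following the linked guide, full-integer quantization would look roughly like this (a sketch: `representative_data_gen` must yield real calibration samples, and the SavedModel path is a placeholder):

~~~python
import tensorflow as tf

def representative_data_gen():
    for sample in calibration_samples:   # hypothetical calibration set
        yield [sample]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force integer-only kernels so the model runs on INT8-only backends.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
open("model_int8.tflite", "wb").write(tflite_model)
~~~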
-
If I wanted to use Quantization Aware Training (QAT) in conjunction with structured hashing, should I quantize **before** or **after** FeatherMap?
i.e. (quantizing before intuitively seems correct to me):
…
-
I am trying to deploy a Baichuan2-7B model on a machine with 2 Tesla V100 GPUs. Unfortunately, each V100 has only 16GB of memory.
I have applied INT8 weight-only quantization, so the size of the engine I…
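As a rough sanity check on whether 2x16GB is enough, a back-of-envelope memory estimate (assuming Baichuan2-7B's published shape of ~7B parameters, 32 layers, hidden size 4096, tensor parallelism of 2; all numbers approximate):

~~~python
GIB = 2**30
params, n_layers, hidden, tp = 7e9, 32, 4096, 2

weights_per_gpu = params * 1 / tp              # INT8: 1 byte per weight
kv_per_token = 2 * n_layers * hidden * 2 / tp  # K+V, FP16, split over TP

print(f"weights/GPU: {weights_per_gpu / GIB:.1f} GiB")
print(f"KV cache/GPU @ 4096 tokens: {kv_per_token * 4096 / GIB:.1f} GiB")
# ~3.3 GiB of weights plus ~1 GiB of KV cache per GPU, before
# activations and runtime buffers, so 16GB should fit unless the
# batch size or context length is large.
~~~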