-
### System Info
llama3 released
https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6
https://github.com/meta-llama/llama3
### Who can help?
@ncomly-nvidia
### …
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
Arraymancer has become a key piece of the Nim ecosystem. Unfortunately, I do not have the time to develop it further, for several reasons:
- family: the birth of a family member and the death of hobby time.
- competin…
-
Dear all,
I'm struggling to get the sample code working on my laptop with an Nvidia A2000 (8GB) card.
Does anyone have any advice?
RuntimeError: Expected all tensors to be on the same device, but …
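This error usually means the model's parameters and the input tensors live on different devices (e.g. the model on CPU, inputs on CUDA). A minimal sketch of the usual fix, assuming a generic PyTorch model (the `Linear` layer here is only a stand-in for the actual sample code):

```python
import torch

# Pick one device and use it consistently throughout the script.
# On the A2000 this resolves to "cuda"; falls back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # move the model's parameters
x = torch.randn(1, 4).to(device)          # move the inputs to the SAME device

y = model(x)  # no device-mismatch RuntimeError: everything lives on `device`
print(y.device)
```

The key point is that `.to(device)` must be applied to both the model and every input tensor; a single tensor left on the other device reproduces the error.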
-
### Search before asking
- [X] I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and [discussions](https://github.com/ultralytics/yolov5/discussions) and found no simi…
-
# Prerequisites
Hi there,
I am fine-tuning the model `https://huggingface.co/jphme/em_german_7b_v01` using my own data (I have replaced the questions and answers with dots to keep it short and simple). …
-
If I understood correctly, we should convert Llama3 with "convert-hf-to-gguf.py". This uses a ton of memory, and my Mac Studio M1 Ultra with 128GB VRAM is unable to convert Llama3-70b to f32. Luckily it worked …
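Some back-of-envelope arithmetic (a sketch with approximate figures, ignoring conversion overhead) shows why f32 output does not fit in 128GB for a 70B-parameter model:

```python
# Memory needed just to hold the weights at each precision
# (approximate parameter count; overhead and activations excluded).
params = 70e9

f32_gb = params * 4 / 1e9  # 4 bytes per float32 parameter
f16_gb = params * 2 / 1e9  # 2 bytes per float16 parameter

print(f"f32: {f32_gb:.0f} GB, f16: {f16_gb:.0f} GB")
# ~280 GB at f32 is far beyond 128 GB; ~140 GB at f16 is much closer.
```

This is why passing a lower-precision output type to the conversion script (the script accepts an `--outtype` option, e.g. `f16`) halves the footprint relative to f32.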
-
### 🐛 Describe the bug
I'm trying to use NNCF to quantize a recommender-system model to int8. Before using it on our production model, I wanted to get it working on a simple toy example first, but am seei…
-
### 🚀 The feature, motivation and pitch
# Summary
We would like to support the 4-bit KV cache for the decoding phase. The purpose of this feature is to reduce the GPU memory usage of the KV cache wh…
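A quick illustration of the savings at stake (hypothetical Llama-7B-like dimensions, purely for scale; the actual model config may differ):

```python
# Illustrative decoder configuration (assumed, not from the proposal).
layers, kv_heads, head_dim = 32, 32, 128
seq_len, batch = 4096, 1

def kv_cache_bytes(bits_per_elem: int) -> float:
    # Two tensors (K and V) per layer, each [batch, kv_heads, seq_len, head_dim].
    elems = 2 * layers * batch * kv_heads * seq_len * head_dim
    return elems * bits_per_elem / 8

fp16_gb = kv_cache_bytes(16) / 1e9
int4_gb = kv_cache_bytes(4) / 1e9
print(f"fp16 KV cache: {fp16_gb:.2f} GB, 4-bit: {int4_gb:.2f} GB")
```

A 4-bit cache is a straight 4x reduction over fp16, which either frees GPU memory or lets the same memory hold 4x the context/batch during decoding (modulo the small overhead of quantization scales).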
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar feature requests.
### Description
1.58 bit quantization i…
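For context, the "1.58 bit" figure comes from ternary weights {-1, 0, +1}: log2(3) ≈ 1.585 bits of information per weight. A minimal packing sketch (a hypothetical scheme for illustration, not the one any particular library uses) shows how five ternary weights fit in a single byte, since 3**5 = 243 ≤ 256:

```python
import math

print(math.log2(3))  # ≈ 1.585 bits per ternary weight

def pack5(ws):
    """Pack exactly 5 ternary weights (-1/0/+1) into one byte, base-3."""
    b = 0
    for w in ws:
        b = b * 3 + (w + 1)  # map -1/0/+1 -> 0/1/2
    return b

def unpack5(b):
    """Invert pack5: recover the 5 ternary weights from one byte."""
    ws = []
    for _ in range(5):
        ws.append(b % 3 - 1)
        b //= 3
    return ws[::-1]

weights = [-1, 0, 1, 1, -1]
assert unpack5(pack5(weights)) == weights
```

That is 8/5 = 1.6 bits per weight in practice, close to the 1.58-bit information-theoretic floor, versus 16 bits for fp16, roughly a 10x reduction in weight storage.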