auto-quant Search Results

1000+ results for auto-quant

Best match

vllm-project/vllm #4532

[RFC]: Refactor FP8 kv-cache

### Motivation. **Support float8_e4m3 for NVIDIA GPUs:** The current FP8 kv-cache supports e5m2 on NVIDIA GPUs, and e4m3 on AMD GPUs. While e5m2 seems to be an ideal format for kv-cache storage due…

comaniac updated 3 months ago
11
hamoid/video_export_processing #49

Export with alpha channel / transparency?

I wonder how that could be possible. I'd prefer to have an alpha channel rather than having to key the exported result. I'm sure there's some set of ffmpeg options for this :)

ubidefeo updated 2 years ago
18
microsoft/DeepSpeed #5617

[HELP] ZeRO3 partition parameters after fully load to each G…

**Describe the bug** I'm fine-tuning Llama2 using DeepSpeed ZeRO-3. I found that parameters load to CPU memory during from_pretrained, and at the beginning of trainer.train(), params will fully load to…

CHNRyan updated 4 months ago
7
huggingface/peft #1923

Gradient not appliable to 4-bit quantization. (sft-qlora-fsd…

### System Info accelerate 0.31.0 peft 0.11.1 transformers 4.42.4 bitsandbytes 0.41.1 ##### The following packages…

NotTheStallion updated 3 months ago
7
LoupHC/controleur-CAPE #90

Enclosure assembly

I had a flash of inspiration while looking at this specific model of the [iGrow](https://www.greenhousemegastore.com/equip/controls-measuring-tools/environmental-controls/igrow-1400-greenhouse-controller). The circuit …

LoupHC updated 5 years ago
27
pytorch/ao #208

FP6 dtype!

### 🚀 The feature, motivation and pitch https://arxiv.org/abs/2401.14112 I think you guys are really going to like this. The deepspeed developers introduce FP6 datatype on cards without fp8 suppo…

NicolasMejiaPetit updated 3 months ago
31
deepset-ai/haystack #6519

`HuggingFaceLocalGenerator` keeps generating after stopword

**Describe the bug** Although I set the `stopwords` parameter of `HuggingFaceLocalGenerator` to `["Original"]`, it keeps on generating after this token was generated. The only effect of setting the st…

julian-risch updated 5 months ago
13
InternLM/lmdeploy #2660

[Bug] Core Dumped! Using lmdeploy==0.6.1 on a single P100 card to deploy the Internvl2-2B mod…

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [X] 2. The bug has not been fixed in the latest version. - [X] 3. Please note that if the bug-related issue y…

xuexidi updated 1 week ago
2
xorbitsai/inference #2089

internlm2.5-7B-chat& internlm2.5-7B-chat-1M can't run in vll…

### System Info: CUDA 12.5, Python 3.9, Ubuntu 22.04 ### Running Xinference with Docker? - [ ] docker - [X] pip install - [ ] instal…

soulzzz updated 2 months ago
9
NVIDIA/TensorRT-LLM #2149

How to add lora adapter to whisper models?

### System Info A100-PCIe-80GB TensorRT-LLM version: 0.13.0.dev2024082000 ubuntu 22.04 ### Who can help? @Tracin @n ### Information - [X] The official example scripts - [X] My own modified scri…

Jeevi10 updated 1 month ago
1