-
### Motivation.
**Support float8_e4m3 for NVIDIA GPUs:** The current FP8 kv-cache supports e5m2 on NVIDIA GPUs, and e4m3 on AMD GPUs. While e5m2 seems to be an ideal format for kv-cache storage due…
-
I wonder how that could be possible.
I'd prefer to have an alpha channel rather than having to key the exported result.
I'm sure there's some ffmpeg set of options for this :)
-
**Describe the bug**
I'm fine-tuning Llama2 using DeepSpeed ZeRO-3. I found that parameters are loaded into CPU memory during from_pretrained, and at the beginning of trainer.train(), params will fully load to…
-
### System Info
accelerate 0.31.0
peft 0.11.1
transformers 4.42.4
bitsandbytes 0.41.1
##### The following packages…
-
I had a flash of inspiration while looking at this specific model of the [iGrow](https://www.greenhousemegastore.com/equip/controls-measuring-tools/environmental-controls/igrow-1400-greenhouse-controller)
The circuit…
-
### 🚀 The feature, motivation and pitch
https://arxiv.org/abs/2401.14112
I think you guys are really going to like this.
The DeepSpeed developers introduce an FP6 datatype on cards without fp8 suppo…
-
**Describe the bug**
Although I set the `stopwords` parameter of `HuggingFaceLocalGenerator` to `["Original"]`, it keeps generating after this token has been generated. The only effect of setting the st…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
### System Info
CUDA: 12.5
Python: 3.9
Ubuntu 22.04
### Running Xinference with Docker?
- [ ] docker
- [X] pip install
- [ ] instal…
-
### System Info
A100-PCIe-80GB
TensorRT-LLM version: 0.13.0.dev2024082000
ubuntu 22.04
### Who can help?
@Tracin @n
### Information
- [X] The official example scripts
- [X] My own modified scri…