-
Shrinking the context window and the positional-embedding size both seem to have no effect. What is causing the increased GPU memory usage? Compared with the first-generation Qwen, memory usage has grown substantially.
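One way to reason about this is to estimate the KV-cache size separately from the weights: weight memory is unaffected by the context window, and the positional-embedding table is tiny either way, so if memory barely moves when the context shrinks, the cache was probably not the dominant term. A rough estimator (an illustrative sketch; the exact layout depends on the model implementation, batch size, and attention variant):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, dtype_bytes=2):
    """Rough upper bound on KV-cache size in bytes.

    Two tensors (K and V) per layer, each shaped
    [batch, num_kv_heads, seq_len, head_dim].
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * dtype_bytes

# Hypothetical 32-layer model, 32 KV heads, head_dim 128, 8k context, fp16:
print(kv_cache_bytes(32, 32, 128, 8192) / 2**30, "GiB")  # → 4.0 GiB
```

Models that use grouped-query attention have far fewer KV heads than query heads, which shrinks this figure proportionally; the head counts above are placeholders, not Qwen's actual configuration.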
-
### bug描述 Describe the Bug
platform: windows11
paddlepaddle-gpu: 2.4.2
paddledet: 2.6.0
paddleslim: installed via setup.py
example/auto_compression/detection/run.py fails with the following error:
ModuleNotFoundError: No module named …
-
Couldn't find any similar other issues in `accelerate`, `peft`, or `trl` so I'm opening one here. When using the DPOTrainer on a single GPU with QLoRA I have no issues, but when I try to run the scrip…
-
I had a flash of inspiration while looking at this specific [iGrow](https://www.greenhousemegastore.com/equip/controls-measuring-tools/environmental-controls/igrow-1400-greenhouse-controller) model.
The circuit is…
-
For example, I have two RTX 3090 GPUs, and both the model and the ref_model are 14-billion-parameter models. I need to distribute these two models evenly across the two cards for training.
This is my code…
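One way to spread a model's layers evenly over two cards is to build an explicit device map before loading. A minimal sketch that constructs such a map by hand; the module names (`model.layers`, `model.embed_tokens`, `lm_head`) are assumptions matching common Llama/Qwen-style architectures, and in practice `device_map="auto"` or accelerate's `infer_auto_device_map` does this for you:

```python
def even_device_map(num_layers, prefix="model.layers", devices=(0, 1)):
    """Place the first half of the transformer blocks on devices[0]
    and the second half (plus norm and lm_head) on devices[1]."""
    half = (num_layers + 1) // 2
    device_map = {
        "model.embed_tokens": devices[0],  # input embeddings with the first half
        "model.norm": devices[1],          # final norm with the second half
        "lm_head": devices[1],             # output head next to the last block
    }
    for i in range(num_layers):
        device_map[f"{prefix}.{i}"] = devices[0] if i < half else devices[1]
    return device_map

dm = even_device_map(4)  # blocks 0-1 on GPU 0, blocks 2-3 on GPU 1
```

The resulting dict can be passed as `device_map=` to `from_pretrained`. Note that for DPO it is often simpler to shard the policy model this way and keep the frozen ref_model on one card, since it runs forward-only.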
-
```
Traceback (most recent call last):
File "Sakura_DPO.py", line 318, in <module>
fire.Fire(train)
File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
com…
-
I got the following when running Dia-NN on a set of six Thermo raw files:
```
DIA-NN 1.7.12 (Data Independent Acquisition by Neural Networks)
Compiled on Oct 1 2020 21:36:38
Current date and ti…
-
Running inference on the DeepSeek model with vLLM fails with an error:
```
[rank0]: self.mlp = DeepseekV2MoE(config=config, quant_config=quant_config)
[rank0]: File "/home/root/.local/lib/python3.10/s…
-
### System Info
`transformers` version: 4.32.0
- Platform: Linux-5.19.0-38-generic-x86_64-with-glibc2.35
- Python version: 3.10.9
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.1
…
-
Hello!
I am testing mistral-7b inference after quantization. I also want to measure the impact of the attention implementation (sdpa, eager, fa2) on inference speed. But the model's decode latency is too high, and…
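When comparing attention backends it helps to isolate per-token decode latency with a small timing harness, so the numbers are comparable across runs. A sketch, independent of any particular model; `step_fn` is a hypothetical callable that performs one decode step, and on GPU the measured region should additionally be bracketed with `torch.cuda.synchronize()`:

```python
import time

def mean_decode_latency(step_fn, num_tokens=32, warmup=4):
    """Return the mean wall-clock time (seconds) of one decode step."""
    for _ in range(warmup):          # warm up caches / kernel compilation
        step_fn()
    start = time.perf_counter()
    for _ in range(num_tokens):      # timed decode steps
        step_fn()
    return (time.perf_counter() - start) / num_tokens

# Usage with a stand-in decode step:
latency = mean_decode_latency(lambda: sum(range(1000)))
```

Running the same harness once per `attn_implementation` setting gives a like-for-like comparison that separates backend differences from warmup effects.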