-
I am using the Hugging Face Seq2SeqTrainer to train a Flan-T5-XL model with DeepSpeed stage 3.
```
trainer = Seq2SeqTrainer(
    # model_init = self.model_init,
    model=se…
```
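The snippet above is cut off, so for context here is a minimal sketch of the same kind of setup. The `google/flan-t5-xl` checkpoint, the toy dataset, and the `ds_zero3.json` config path are assumptions for illustration, not details from the original report:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

def encode(src, tgt):
    ex = tokenizer(src, truncation=True)
    ex["labels"] = tokenizer(text_target=tgt, truncation=True)["input_ids"]
    return ex

# Tiny stand-in dataset; any list/Dataset of tokenized dicts works here.
train_dataset = [encode("translate English to German: Hello", "Hallo")]

args = Seq2SeqTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    deepspeed="ds_zero3.json",  # ZeRO stage 3 config file (assumed path)
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

With DeepSpeed, the script has to be started through the `deepspeed` launcher (e.g. `deepspeed train.py`) rather than plain `python`, so that the distributed environment is set up.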
-
### 🐛 Describe the bug
Using MPS for BERT inference appears to produce about a 2x slowdown compared to the CPU. Here is code to reproduce the issue:
```python
# MPS Version
from transformers i…
```
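Since the reproduction code above is truncated, here is a hedged reconstruction of the comparison it describes; the `bert-base-uncased` checkpoint, batch size, and iteration count are assumptions:

```python
import time

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()
inputs = tokenizer(["hello world"] * 8, return_tensors="pt", padding=True)

for device in ("cpu", "mps"):
    m = model.to(device)
    batch = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        m(**batch)  # warm-up run (triggers kernel compilation on MPS)
        start = time.perf_counter()
        for _ in range(20):
            m(**batch)
        if device == "mps":
            torch.mps.synchronize()  # flush queued GPU work before timing
        print(device, f"{time.perf_counter() - start:.3f}s")
```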
-
### 🐛 Describe the bug
Currently, when using FSDP, the model is loaded completely on CPU for each of the N processes, leading to huge CPU RAM usage. When training models like Falcon-40B with FSDP on…
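For reference, a sketch of one common mitigation pattern: materialize the full checkpoint only on rank 0, build a meta-device skeleton on the other ranks, and let FSDP's `sync_module_states` broadcast the weights. The checkpoint name and single-node setup are placeholders, and keyword availability (e.g. `recurse` on `to_empty`) depends on the PyTorch version:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoConfig, AutoModelForCausalLM

dist.init_process_group("nccl")  # assumes launch via torchrun
rank = dist.get_rank()
torch.cuda.set_device(rank)      # single-node assumption

name = "tiiuae/falcon-40b"  # placeholder checkpoint
if rank == 0:
    # Only rank 0 pays the full CPU-RAM cost of loading the weights.
    model = AutoModelForCausalLM.from_pretrained(name)
else:
    # Other ranks build an empty skeleton on the meta device.
    config = AutoConfig.from_pretrained(name)
    with torch.device("meta"):
        model = AutoModelForCausalLM.from_config(config)

model = FSDP(
    model,
    device_id=rank,
    sync_module_states=True,  # rank 0 broadcasts the real weights
    param_init_fn=(
        None if rank == 0
        else lambda m: m.to_empty(device=torch.cuda.current_device(), recurse=False)
    ),
)
```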
-
### What happened?
Hi,
When I use llama.cpp to deploy a pruned llama3.1-8b model, an unbearable performance degradation appears:
We are using a structured pruning method (LLM-Pruner) to prune llama3.1-8b, w…
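Since the report is cut off, here is a rough sketch of how the degradation could be quantified with the llama-cpp-python bindings; the GGUF file names are placeholders, not paths from the original issue:

```python
import time

from llama_cpp import Llama

# Compare the original and pruned models under identical settings.
for path in ("llama3.1-8b.gguf", "llama3.1-8b-pruned.gguf"):  # assumed paths
    llm = Llama(model_path=path, n_gpu_layers=-1, verbose=False)
    start = time.perf_counter()
    out = llm("The quick brown fox", max_tokens=128)
    elapsed = time.perf_counter() - start
    n_tokens = out["usage"]["completion_tokens"]
    print(path, f"{n_tokens / elapsed:.1f} tokens/s")
```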
-
ollama-for-amd [v0.3.4]
OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:
time=2024-08-09T00:25:59.140+08:00 level=INFO source=images.go:782 msg="total blobs: 5"
time=2024-08-09T00:25:59.141+08:00 level=INFO…
-
I'm currently trying out the ollama app on my iMac (i7/Vega64) and I can't seem to get it to use my GPU.
I have tried running it with `num_gpu 1`, but that generated the warnings below.
`
2023/11/…
-
The following models are taking longer when running on NNPA now compared to the 0.4.1 release.
* gpt2-10.onnx
  * about 30% worse
  * 0.4.1 - Total runMainGraph() time over all 100 infere…
-
I wonder, will you support pipeline parallelism in the future? If the answer is yes, maybe the whole system needs to be redesigned?
-
I set the environment variables as follows in train_dist.sh in the gpt_hf folder:
```
export NUM_NODES=1
export NUM_GPUS_PER_NODE=8
export MASTER_ADDR=localhost
export MASTER_PORT=2222
export NODE_RA…
```
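For reference, a hedged sketch of how a training script might consume these variables when initializing `torch.distributed`; the exact plumbing in the gpt_hf code may differ:

```python
import os

import torch.distributed as dist

world_size = int(os.environ["NUM_NODES"]) * int(os.environ["NUM_GPUS_PER_NODE"])
# MASTER_ADDR and MASTER_PORT are picked up by the env:// rendezvous.
dist.init_process_group(
    backend="nccl",
    init_method="env://",
    world_size=world_size,
    rank=int(os.environ.get("RANK", "0")),
)
```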
-
## 🐛 Bug
StableHLO performance currently seems to be 2 orders of magnitude worse than the normal XLA flow.
## To Reproduce
Please try the following script:
```python
import os
import timei…
```
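The reproduction script above is truncated, so here is a hedged sketch of the kind of timing comparison it likely performs; the `XLA_STABLEHLO_COMPILE` toggle and the matmul workload are assumptions about how the StableHLO path is selected, not details from the original script:

```python
import os
import time

# Assumed switch for routing compilation through StableHLO.
os.environ["XLA_STABLEHLO_COMPILE"] = "1"

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(1024, 1024, device=device)

y = x @ x
xm.mark_step()        # compile once outside the timed region
xm.wait_device_ops()

start = time.perf_counter()
for _ in range(100):
    y = x @ x
    xm.mark_step()    # cut the graph so each step actually executes
xm.wait_device_ops()  # block until all device work finishes
print(f"{time.perf_counter() - start:.3f}s for 100 steps")
```

Running the same script with the StableHLO switch commented out should give the baseline XLA numbers for comparison.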