-
Traceback (most recent call last):
  File "/mnt/amj/LMFlow/examples/finetune.py", line 69, in <module>
    main()
  File "/mnt/amj/LMFlow/examples/finetune.py", line 65, in main
    tuned_model = finetune…
-
### 🐛 Describe the bug
[E ProcessGroupNCCL.cpp:737] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3424, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1803741 milliseconds be…
-
## Task
OPT 1.3B inference on Wikitext2 using E4M3 on Trainium `Trn1`
## Inference Script
Full script is attached
[script.zip](https://github.com/aws-neuron/aws-neuron-sdk/files/12016994/scri…
-
Hello guys!
First of all, thank you to the Facebook team for this amazing tool.
I am using detectron2 for a lot of projects now and I have never faced a problem like this before.
I have a Co…
-
### System Info
- `transformers` version: 4.30.0.dev0
- Platform: Linux-5.4.204-ql-generic-12.0-19-x86_64-with-glibc2.17
- Python version: 3.8.12
- Huggingface_hub version: 0.15.1
- Safetensors v…
-
### Describe the bug
```
Traceback (most recent call last):
  File "train_with_wav2vec2.py", line 374, in <module>
    valid_loader_kwargs=hparams["dataloader_opts"],
  File "/home/zhengbeida/anaconda3…
-
Environment: the same as https://huggingface.co/edbeeching/gpt-neo-125M-imdb-lora
Transformers 4.27.0.dev0
Pytorch 1.13.1+cuda116
Datasets 2.9.0
Tokenizers 0.13.2
trl …
-
**Describe the bug**
I am using DeepSpeed to run inference with the facebook/OPT-30B float16 model on one A100 (40GB) with a batch size of 1. Without offload, an OOM error occurs, as expected. However, …
-
Hi, thanks for this nice repo!
I always get a kill signal and error code -11 after running example/finetune.py with my own 10k dataset (text_only) on a server with a single A100 40GB and 85GB of CPU RAM, s…
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and fou…