-
## ❓ Questions and Help
### Before asking:
1. Search the issues.
2. Search the docs.
#### What is your question?
I changed the `ddp-backend` from `no_c10d` to `fully_sharded` as it is said…
-
From @SameerDalal's PR #278: moving this over to decouple the discussion from the code changes.
> Hi Everyone!
>
> Spoke with Prof. Chen today and the next feature we would like to develop is the ELO r…
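
The quoted request is cut off after "ELO r…". Assuming it refers to the standard Elo rating system, which ranks competitors from pairwise outcomes, a minimal sketch of the update rule might look like this. The function names, the K-factor of 32 and the 400-point scale are conventional choices assumed here, not anything specified in PR #278.

```python
# Hypothetical sketch of the standard Elo rating update. The K-factor (32) and
# the 400-point scale are common conventions, not values taken from PR #278.
from __future__ import annotations


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins, 0.5 for a tie and 0.0 if A loses.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated competitors; A wins one head-to-head comparison.
    print(update_elo(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Because both players use the same K-factor, the points A gains equal the points B loses, so the total rating pool stays constant.
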
-
### 🐛 Describe the bug
I was wondering whether anyone with experience in full-parameter fine-tuning of the Llama 2 7B model using FSDP can help: I added every kind of seeding I could think of to make training deter…
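
For reference, a common seeding checklist for a PyTorch-based FSDP run can be sketched as below. This is not the reporter's code; `seed_everything` is a hypothetical helper name, and NumPy and PyTorch are imported optionally so the sketch still runs where they are absent. Seeding alone does not guarantee bitwise-identical runs, because some GPU kernels are non-deterministic, which is why the sketch also asks PyTorch to prefer deterministic algorithms.

```python
# Minimal sketch, assuming a PyTorch-based training loop. numpy and torch are
# optional so the helper still runs where they are not installed.
import os
import random


def seed_everything(seed: int, rank: int = 0) -> int:
    """Seed Python, NumPy and PyTorch RNGs; return the effective seed.

    Adding `rank` gives each process a distinct but reproducible stream;
    pass rank=0 on every process if all ranks should share one stream.
    """
    effective = seed + rank
    random.seed(effective)
    # Only affects Python interpreters started afterwards (e.g. spawned workers).
    os.environ["PYTHONHASHSEED"] = str(effective)
    try:
        import numpy as np
        np.random.seed(effective)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(effective)  # seeds the CPU and all CUDA devices
        # Required by some cuBLAS ops under deterministic mode; it only takes
        # effect if set before the first CUDA call in this process.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True, warn_only=True)
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
    return effective
```

Calling it once per process, before the model and FSDP wrapper are built, keeps parameter initialization and data shuffling reproducible across runs.
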
-
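### Please ask your question
When training a multi-class classification model with PaddleNLP, I used `paddle.distributed.launch` to run on multiple cores in parallel and speed things up. The command and arguments are as follows: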
```
python3 -m paddle.distributed.launch --nproc_per_node=16 train.py \
--do_train \
…
```
-
Training for 50 epochs on CIFAR-10 with
```
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=1 train.py --num_workers 4 --batch_size 128 --epochs 50
```
and then boosting with…
-
### Description
A recent paper on near-term, quantum-enhanced machine learning (of which I am an author) studied a theoretical bottleneck to using quantum kernels (similarity measures) in practice f…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
DMR92 updated 2 months ago
-
### Describe the bug
By default, simply adding `"report_to": "wandb"` to `training_args` (for the HF Trainer) creates plots (say, for GPU usage) only for the master node on the wan…
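
One possible workaround, sketched here as an assumption rather than built-in Trainer behavior, is to let the first process on each node (not only global rank 0) initialize its own W&B run. The helper names below are hypothetical; the sketch relies only on the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` environment variables that `torchrun` sets, and other launchers may differ.

```python
# Hypothetical sketch: pick one logging process per node so every node can
# report its own system metrics, based on the env vars torchrun sets.
import os


def dist_info(env=None) -> dict:
    """Read global rank, local rank and world size from the environment."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def should_log_system_metrics(env=None) -> bool:
    """True on the first process of each node (LOCAL_RANK == 0)."""
    return dist_info(env)["local_rank"] == 0


def wandb_run_name(base: str, env=None) -> str:
    """Distinct run name per logging process, e.g. 'exp-rank8'."""
    return f"{base}-rank{dist_info(env)['rank']}"
```

Each node's logging process could then call `wandb.init(name=wandb_run_name("exp"), group="exp")` when `should_log_system_metrics()` is true, so the per-node runs appear grouped together in the W&B UI.
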
-
How to reproduce the results on the COCO dataset
When I first trained on 70 classes, the `e2e_faster_rcnn_R_50_C4_1x.yaml` config was:
```
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetP…
```
-
Here is the stack trace of the `run_pretrain_bart.sh` error:
```
[rank0]: IndexError: Caught IndexError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: F…
```