-
https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html
That looks awesome!
-
In awsome-distributed-training/3.test_cases/10.FSDP, when running `sbatch 1.distributed-training.sbatch` ( [1.distributed-training.sbatch](https://github.com/aws-samples/awsome-distributed-training/bl…
-
Hi,
my environment is as follows:
docker image: docker run --gpus all -it --net=host --ipc=host --ulimit memlock=-1 -v /home/ubuntu/test:/home/finetune -v /ssd/gyou:/models --name=vicuna nvcr.io/nvidia/pytor…
-
### 🐛 Describe the bug
It uses `_allgather_base`, but there is no support for this in the Gloo backend:
```
RuntimeError: no support for _allgather_base in Gloo process group
```
### Versions
main
…
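For context, a minimal single-process sketch of the code path that hits this error might look like the following (assuming a CPU-only run on the Gloo backend; the setup details are illustrative):
```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Illustrative single-process process-group setup on the Gloo backend.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# FSDP all-gathers its flat parameters via _allgather_base, which Gloo does not
# implement, so the forward pass is expected to raise the RuntimeError above.
model = FSDP(nn.Linear(8, 8))
out = model(torch.randn(4, 8))
```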
-
### 🐛 Describe the bug
I was trying to use CUDA graphs (`torch.cuda.make_graphed_callables`) on a model wrapped with `FullyShardedDataParallel` (FSDP) and I got the following error:
```
-- Process …
```
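For reference, a reduced sketch of the combination being attempted (process-group initialization and the real model are omitted; shapes and wrapping are illustrative):
```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Toy FSDP-wrapped module standing in for the real model.
model = FSDP(nn.Linear(32, 32).cuda())
sample_input = torch.randn(8, 32, device="cuda")

# Capture the wrapped forward pass into a CUDA graph; this is the call that
# fails when combined with FSDP.
graphed_model = torch.cuda.make_graphed_callables(model, (sample_input,))
```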
-
### Feature request
Like the trainer arguments data class https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L167
It's goo…
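For reference, the requested pattern mirrors a plain `dataclasses`-based configuration object; a hypothetical sketch (the class and field names below are illustrative, not an existing API):
```python
from dataclasses import dataclass, field

@dataclass
class FSDPTrainingArguments:
    """Hypothetical dataclass of FSDP-related training options."""
    sharding_strategy: str = field(
        default="FULL_SHARD", metadata={"help": "FSDP sharding strategy to use."}
    )
    cpu_offload: bool = field(
        default=False, metadata={"help": "Offload sharded parameters to CPU."}
    )
    mixed_precision: bool = field(
        default=True, metadata={"help": "Enable reduced-precision compute and communication."}
    )
```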
-
PEFT finetuning (LoRA, adapter) raises the following warning for each FSDP-wrapped layer (transformer block in our case):
```python
The following parameters have requires_grad=True:
['transformer…
```
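A toy sketch of the condition that produces this warning, i.e. frozen and trainable parameters ending up in the same FSDP-wrapped unit (process-group setup omitted; the module is a stand-in for a LoRA-adapted block):
```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Stand-in for a PEFT model: a frozen "base" layer plus a trainable "adapter" layer.
model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))
for p in model[0].parameters():
    p.requires_grad = False  # frozen base weights

# Wrapping the whole block in one FSDP unit mixes requires_grad=True and
# requires_grad=False parameters, which is what the warning reports.
fsdp_model = FSDP(model)  # assumes dist.init_process_group() has already run
```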
-
## 🐛 Bug
We need to add regression benchmarks for the FSDP API and possible input combinations. These regression benchmarks should be added to [fairscale/benchmarks](https://github.com/facebookrese…
-
I get `xlarun: command not found` even though I used the container you provided; the command is not available in it.
-
Hi,
I am trying to launch the `dinov2/train/train.py` script directly, without the Slurm scheduler. I use the following command to launch the training:
```
export CUDA_VISIBLE_DEVICES=0,1 && python dino…
```
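For comparison, a common non-Slurm way to launch a multi-GPU PyTorch script on a single node is `torchrun`; whether it works here depends on how `dinov2/train/train.py` initializes its distributed state, so treat this as a sketch rather than a verified launch command:
```
# Illustrative only: two local GPUs, script arguments unchanged from the python launch.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 dinov2/train/train.py <training args>
```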