-
## 🐛 Bug
There seems to be a discrepancy (in addition to https://github.com/pytorch/xla/issues/3718) in how `torch.nn.Linear` (`torch.nn.functional.linear`) is implemented and dispatched between th…
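A minimal way to probe such a dispatch difference is to run the same `torch.nn.functional.linear` call on CPU and on an XLA device and compare results. A hedged repro sketch (assuming `torch_xla` is installed; shapes and seed are arbitrary placeholders, not from the original report):
```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

# Compare the same F.linear on CPU vs. an XLA device.
torch.manual_seed(0)
x = torch.randn(4, 16)
w = torch.randn(8, 16)
b = torch.randn(8)

cpu_out = F.linear(x, w, b)

device = xm.xla_device()
xla_out = F.linear(x.to(device), w.to(device), b.to(device))

# An implementation/dispatch discrepancy would surface as a mismatch here.
print(torch.allclose(cpu_out, xla_out.cpu(), atol=1e-5))
```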
-
Currently, we have disabled multi-GPU support for QLoRA because we haven't tested it yet. It might be worthwhile to look into this at some point, so this issue is just a reminder to revisit it.
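For reference, one common way to run a QLoRA-style 4-bit base model across several GPUs outside this codebase is Hugging Face's `device_map` sharding. A hedged sketch, not this repository's API; the checkpoint name is a placeholder:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: load a 4-bit base model and let Hugging Face place its
# layers across all visible GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",  # shards layers across available devices
)
```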
-
Currently, Unsloth only supports single-GPU training. How can it be made to work with 8-GPU training? Thanks.
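For comparison, the stock PyTorch pattern for 8-GPU data parallelism is `torchrun` plus `DistributedDataParallel`; this is a generic sketch (the model is a stand-in), not an Unsloth API:
```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic DDP skeleton; launch with: torchrun --nproc_per_node=8 train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # stand-in model
model = DDP(model, device_ids=[local_rank])
# ... training loop ...
dist.destroy_process_group()
```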
-
When using Axolotl, the training loss drops to 0 after the gradient accumulation steps. Is this expected behaviour?
With torchrun, the training loss consistently remains NaN.
Thank…
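Both symptoms (loss collapsing to 0, loss going NaN) are often triaged the same way: check that the loss is divided by the accumulation count and that no gradient is non-finite after `backward()`. A framework-agnostic sketch in plain PyTorch; the helper name is made up for illustration:
```python
import torch

def check_grads(model: torch.nn.Module) -> None:
    # Flag any parameter whose gradient contains NaN/Inf after backward().
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in {name}")

# Typical accumulation loop: scale the loss by the accumulation step count
# so the accumulated gradient matches that of the larger effective batch.
# loss = criterion(outputs, targets) / accumulation_steps
# loss.backward()
# check_grads(model)
```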
-
### 🐛 Describe the bug
```python
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    BackwardPrefetch,
    ShardingStrategy,
    FullStateDictConfig,
    StateDictType,
)
```
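The excerpt ends at the imports; a minimal wrap using them typically looks like the following sketch (placeholder model and dtype choices; it assumes the process group was already initialized, e.g. by `torchrun`):
```python
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    BackwardPrefetch,
    ShardingStrategy,
)

# Placeholder model; reports like this usually wrap a transformer here.
model = torch.nn.Linear(1024, 1024).cuda()

fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
)
```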
-
Hello, when running main_finetune.py and reaching line 238:
```python
for param in fsdp_ignored_parameters:
    dist.broadcast(param.data, src=dist.get_global_rank(fs_init.get_data_parallel_group(), 0),
…
```
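For context, `dist.broadcast` copies a tensor from the `src` rank to every other rank in the group. A self-contained sketch, independent of the issue's project-specific `fs_init` helper:
```python
import os
import torch
import torch.distributed as dist

# Minimal broadcast demo; launch with: torchrun --nproc_per_node=2 demo.py
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

t = torch.full((4,), float(rank), device="cuda")
dist.broadcast(t, src=0)  # afterwards, every rank holds rank 0's values
print(rank, t)
```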
-
### 🚀 The feature, motivation and pitch
# 🚀 Feature
Provide a detailed API design for a high-level PyTorch Tensor Parallelism API. This is an evolution of the PyTorch Sharding work introduced in ht…
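An API in this style later shipped under `torch.distributed.tensor.parallel` in recent PyTorch releases; a hedged sketch of how it is used (the toy MLP and plan below are illustrative, not from the proposal):
```python
import os
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    parallelize_module,
    ColwiseParallel,
    RowwiseParallel,
)

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = torch.nn.Linear(256, 1024)
        self.w2 = torch.nn.Linear(1024, 256)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

# Launch with torchrun; one mesh dimension covering all ranks.
mesh = init_device_mesh("cuda", (int(os.environ["WORLD_SIZE"]),))

model = MLP().cuda()
# w1 is sharded column-wise and w2 row-wise, so the hidden activation
# stays sharded and only w2's output needs a collective.
model = parallelize_module(
    model, mesh, {"w1": ColwiseParallel(), "w2": RowwiseParallel()}
)
```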
-
When using:
```
torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train.py \
    --model_name_or_path lmsys/vicuna-7b-v1.5 \
    --data_path data/dummy_conversation.json \
    --bf…
```
-
I would greatly appreciate your help with this error. Here is the [tutorial](https://huggingface.co/blog/fine-tune-whisper) that I followed. Thanks in advance.
`from transformers import Seq2SeqTrai…`
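The pasted code is cut off at the import; in that tutorial the relevant imports are typically `Seq2SeqTrainingArguments` and `Seq2SeqTrainer`, so a hedged sketch of the setup stage follows (hyperparameter values are placeholders, not the asker's actual config):
```python
from transformers import Seq2SeqTrainingArguments

# Placeholder hyperparameters in the style of the Whisper fine-tuning post.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",  # placeholder path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    max_steps=4000,
    fp16=True,
    evaluation_strategy="steps",
    predict_with_generate=True,
)
```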