-
### 🐛 Describe the bug
I am using NaNDetect in the llama2.c project to track down NaNs during training.
https://github.com/karpathy/llama2.c
It works when device='cpu' but raises the exception whe…
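NaNDetect appears to be a project-specific helper, so as a minimal sketch of the same idea (the hook and model names here are illustrative, not from llama2.c): a forward hook can flag the first module whose output contains a NaN, and `torch.autograd.set_detect_anomaly(True)` does the analogous check for the backward pass.

```python
import torch
import torch.nn as nn

def nan_check_hook(module, inputs, output):
    # Raise as soon as a module produces a NaN, naming the offending layer.
    if isinstance(output, torch.Tensor) and torch.isnan(output).any():
        raise RuntimeError(f"NaN in output of {module.__class__.__name__}")

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
for m in model.modules():
    m.register_forward_hook(nan_check_hook)

# A clean input passes; a NaN input trips the hook on the first Linear.
model(torch.zeros(1, 4))
try:
    model(torch.full((1, 4), float("nan")))
except RuntimeError as e:
    print(e)
```

The same hooks run whether the tensors live on CPU or CUDA, which makes this a reasonable first step for narrowing down a device-specific NaN.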
-
### 🐛 Describe the bug
Similar to [this issue](https://github.com/pytorch/pytorch/issues/107084) (which is for `nn.MultiheadAttention`) and [this comment](https://github.com/pytorch/pytorch/issues/10…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
### What happened?
I've already started and the error shows up for me.…
-
### 🐛 Describe the bug
The [documentation](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-with-a-tensor-like-type) indicates that `torch` can be extended with `torch.Tensor`-lik…
-
### Bug description
Hi,
I am currently testing with IterableDataset and DDP.
Total Examples - ```10000```
Batch_size - ```32```
NUM_GPUS - ```2```
While using IterableDataset, ideally w…
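As a framework-agnostic sketch of the arithmetic behind that setup (the helper name is mine, not from the report): with 10000 examples, a world size of 2, and batch size 32, each rank should see 5000 examples, i.e. ceil(5000 / 32) = 157 batches per epoch, which is the split an `IterableDataset` must implement by hand per rank.

```python
import math

def shard_for_rank(num_examples, rank, world_size):
    # Rank r takes every world_size-th example starting at offset r,
    # mirroring the per-rank split an IterableDataset must do itself.
    return list(range(rank, num_examples, world_size))

NUM_EXAMPLES, WORLD_SIZE, BATCH_SIZE = 10_000, 2, 32
for rank in range(WORLD_SIZE):
    shard = shard_for_rank(NUM_EXAMPLES, rank, WORLD_SIZE)
    steps = math.ceil(len(shard) / BATCH_SIZE)
    print(rank, len(shard), steps)  # each rank: 5000 examples, 157 steps
```

Without such sharding, every rank iterates the full dataset and each example is processed `world_size` times per epoch.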
-
### Issue and Suggested Fix
Could this helpful tutorial please be updated to use the HF `from datasets import load_dataset` API and merged into main with the dependency-issue workaround:
```python
import tor…
-
### 🐛 Describe the bug
The TransformerDecoder module gives different outputs at validation time depending on whether gradient calculations are being performed.
Minimal example:
```python
import …
-
### Describe the bug
The Dreambooth Colab notebook fails at the training stage. It seems to be an issue with bitsandbytes.
### Reproduction
Run the Dreambooth Colab notebook. It fails at training.
…
-
Hi,
I'm trying to do distributed training on llama-7b in a VM with two Tesla T4 GPUs using native DeepSpeed. I'm facing the following error: "RuntimeError: Expected all tensors to be on the same d…
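The usual cause of this RuntimeError is a model and a batch living on different devices. A minimal sketch of the standard fix (the variable names are illustrative, not from the DeepSpeed setup described):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(8, 2).to(device)  # move the parameters first
batch = torch.randn(4, 8)

# Moving the inputs to the model's device avoids
# "Expected all tensors to be on the same device".
batch = batch.to(device)
out = model(batch)
```

In a multi-GPU setup the same rule applies per rank: each process should move its inputs to its own local device, not to a hard-coded `cuda:0`.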
-
### Steps taken
```
%pip uninstall torch transformers datasets bitsandbytes trl peft flash-attn -y
%pip install -Uq transformers[torch] datasets
%pip install -Uq bitsandbytes trl peft
%pip insta…