-
```
[rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[rank0]: Traceback (most…
```
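For context, this message typically comes from PyTorch flipping `torch.load`'s `weights_only` default to `True` (in 2.6/nightly), which refuses to unpickle arbitrary objects. A minimal sketch of the two usual workarounds; the checkpoint path and the allowlisted class are placeholders:

```python
import torch

# Option 1: allowlist the specific classes the checkpoint needs
# (safer than disabling the check entirely).
# torch.serialization.add_safe_globals([MyConfigClass])  # hypothetical class

# Option 2: fall back to the old behavior -- only for checkpoints from a
# source you fully trust, since full unpickling can execute arbitrary code.
state = torch.load("checkpoint.pt", weights_only=False)  # placeholder path
```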
-
When fine-tuning a model with LoRA, calling `loss.backward()` inside the Trainer raises "element 0 of tensors does not require grad and does not have a grad_fn". Some tutorials suggest adding `loss.requires_grad_(True)` before that line; doing so does make the error go away, but the parameters no longer update. Is there a solution?
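For reference, `loss.requires_grad_(True)` only marks a tensor that is already detached from the autograd graph as a leaf, so `backward()` never reaches the weights; the usual fix is to make sure gradients can flow into the LoRA adapters in the first place. A minimal sketch with PEFT, assuming a causal-LM setup with gradient checkpointing (model name, rank, and target modules are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder

# With gradient checkpointing, inputs must require grad so the backward
# graph reaches the frozen base + trainable LoRA layers.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)

# Sanity check: the LoRA parameters (at least) must require grad.
assert any(p.requires_grad for p in model.parameters())
```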
-
### System Info
```Shell
Latest main version, torch nightly, cuda 12.6
```
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] One of t…
-
Motivation: to build functionality similar to NCCLX, as described in sections 3.3.3 (Collective Communication) and 3.3.4 (Reliability and Operational Challenges) of the [Llama-3 paper](https://arxiv.org/abs/2407.21783).
-
Hi all, first of all, thanks for your great work!
I have an issue when trying to use the optimizer with FSDP training.
The error is:
```
optimizer = DistributedShampoo(
  File "/root/slurm/src/opti…
```
-
## 🐛 Bug
In our internal tests, the new `xm.all_gather` API implemented in https://github.com/pytorch/xla/pull/3275 is shown to take significantly more memory to execute than the previous all-gathe…
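For reference, a minimal sketch of the API in question (tensor shapes are illustrative):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t = torch.ones(128, 256, device=device)

# Gathers the tensor from all replicas along dim 0; the result has shape
# (world_size * 128, 256), which is where the extra memory pressure comes from.
gathered = xm.all_gather(t, dim=0)
```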
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) didn't find any similar reports.
###…
-
Tracker issue for adding [LayerSkip](https://arxiv.org/abs/2404.16710) to AO.
This is a training and inference optimization that is similar to layer-wise pruning. It's particularly interesting for…
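As a rough illustration of the training-time half of the idea, here is a minimal sketch of per-layer stochastic depth with a drop rate that grows with depth (my own simplification for illustration, not the AO implementation; LayerSkip additionally uses early-exit losses):

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Applies layer dropout with a rate that increases with depth,
    so later layers are skipped more often during training."""
    def __init__(self, layers: nn.ModuleList, max_drop: float = 0.2):
        super().__init__()
        self.layers = layers
        self.max_drop = max_drop

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = len(self.layers)
        for i, layer in enumerate(self.layers):
            drop_p = self.max_drop * i / max(n - 1, 1)  # linear ramp over depth
            if self.training and torch.rand(()) < drop_p:
                continue  # skip this layer (identity path)
            x = layer(x)
        return x
```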
-
Hello,
May I know whether the current FSDP and DeepSpeed integrations are stable and available for use? Do they support multi-machine, multi-GPU training and LoRA fine-tuning?
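For what it's worth, a minimal sketch of switching on FSDP through 🤗 Transformers' `TrainingArguments` (the output directory is a placeholder; multi-node runs are launched separately, e.g. via `torchrun`):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",             # placeholder
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",  # enable FSDP parameter sharding
)
```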
-
## ❓ Questions and Help
Hi, I received a loss of None when training the model. Can anyone help?
Simple reproduction Kaggle notebook: [link](https://www.kaggle.com/code/liondude/notebook548442067d)
```
im…
```
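A common cause with 🤗 Transformers models is that the forward pass returns `loss=None` when no `labels` are passed. A minimal sketch of the check (the model name is a stand-in, not taken from the notebook):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tok("hello world", return_tensors="pt")
# Without `labels=...`, outputs.loss is None; supplying labels makes the
# model compute the LM loss internally.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)  # a scalar tensor, not None
```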