-
https://github.com/pytorch/torchtitan/pull/161/files#diff-80b04fce2b861d9470c6160853441793678ca13904dae2a9b8b7145f29cd017aR254
In principle, the issue is that the PP model code traced the non-F…
-
### 🚀 The feature, motivation and pitch
`_sharded_param_data` is still on meta while `sharded_param` has moved to cuda after calling `initialize_parameters()`.
The workaround is `model = model.to("cuda")`. b…
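A minimal sketch of the meta-to-CUDA materialization pattern involved here, with the report's blanket `.to("cuda")` workaround at the end; the `nn.Linear` stand-in is hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model, built on the meta device so no
# parameter storage is allocated yet.
with torch.device("meta"):
    model = nn.Linear(16, 16)
print(model.weight.device)  # meta

# Materialize storage on the GPU (values are uninitialized), then run the
# usual init so the parameters hold real data.
model = model.to_empty(device="cuda")
model.reset_parameters()

# The workaround from the report: a final blanket .to("cuda") moves any
# tensor the module still holds on meta onto the same device.
model = model.to("cuda")
print(model.weight.device)  # cuda:0
```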
-
-
A recent contribution to the pytorch_xla repo allows using FSDP in PyTorch XLA for sharding Module parameters across data-parallel workers. https://github.com/pytorch/xla/pull/3431
Some motivation be…
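A minimal usage sketch of the `XlaFullyShardedDataParallel` wrapper that PR introduces; the toy model, data, and hyperparameters are illustrative only:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

# Toy model standing in for a real network; FSDP shards its parameters
# across the data-parallel workers.
device = xm.xla_device()
model = FSDP(nn.Linear(128, 10).to(device))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

inputs = torch.randn(8, 128, device=device)
targets = torch.randint(0, 10, (8,), device=device)

loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
# Call optimizer.step() directly; FSDP already reduces gradients across
# ranks, so xm.optimizer_step() would reduce them a second time.
optimizer.step()
xm.mark_step()
```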
-
#### What is your question?
I used the Python method from the FunASR documentation for exporting ONNX models to try to export the pretrained paraex-en-Streaming model to ONNX, but kept getting er…
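For reference, the export path described in the FunASR docs looks roughly like the sketch below; the `paraformer` model name is a placeholder for the streaming model in question, and the exact kwargs may differ by FunASR version:

```python
from funasr import AutoModel

# "paraformer" is a placeholder; substitute the pretrained streaming model
# named in the question.
model = AutoModel(model="paraformer", device="cpu")

# Export the model to ONNX; quantize=True would additionally emit a
# quantized variant.
res = model.export(quantize=False)
```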
-
I get the error below when I run the training cell in the Colab notebook FineTuning_colab.ipynb.
I also ran the "Training parameters" cell, and all parameters were parsed.
No LSB modules are available.
Description: Ubuntu 20.04.…
-
[torch-neuronx] FSDP support - Distributed Training on Trn1
-
### Root Cause
The root cause is a recent transformers update [to resolve high CPU usage for large quantized models](https://github.com/huggingface/transformers/pull/33154).
- what the PR doe…
-
## 🚀 Feature
FSDP should offer the option to flatten parameters by group, for instance flattening all biases separately from the other weights.
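For context, this grouping mirrors how optimizers already split parameters, e.g. decaying weights but not biases; flattening everything into one `FlatParameter` makes such per-group treatment harder. A minimal sketch of that existing optimizer-side pattern, with an illustrative toy model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Split parameters into biases vs. everything else: the same grouping the
# proposed per-group flattening would preserve under FSDP.
biases = [p for n, p in model.named_parameters() if n.endswith("bias")]
weights = [p for n, p in model.named_parameters() if not n.endswith("bias")]

optimizer = torch.optim.AdamW(
    [
        {"params": weights, "weight_decay": 0.01},
        {"params": biases, "weight_decay": 0.0},  # no decay on biases
    ],
    lr=1e-3,
)
```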
## Motivation
Following issue https://github.…
-
### 🚀 The feature, motivation and pitch
I'm currently optimizing the [Lightning reference implementation of LLaMA](https://github.com/Lightning-AI/lit-llama) (7B), although the following will be gene…