-
Hello,
Following the issue here https://github.com/evo-design/evo/issues/11, which discusses fine-tuning code for Evo, I am specifically looking for information on which frameworks could be used to opti…
-
Support whole-model activation offloading with FSDP, working in conjunction with activation checkpointing, via
https://github.com/pytorch/pytorch/blob/e9ebda29d87ce0916ab08c06ab26fd3766a870e5/to…
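As a rough sketch of the idea (not the linked PyTorch code, whose path is truncated above), activations can be offloaded to CPU with the saved-tensor hook `torch.autograd.graph.save_on_cpu`, composed with activation checkpointing so that only the tensors checkpointing still saves are kept, in pinned host memory. The model and shapes below are invented for illustration:

```python
import torch
from torch.autograd.graph import save_on_cpu
from torch.utils.checkpoint import checkpoint

# Placeholder model; any nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).cuda()

x = torch.randn(8, 1024, device="cuda", requires_grad=True)

# Checkpointing recomputes inner activations; the tensors it still
# saves are moved to pinned CPU memory and copied back in backward.
with save_on_cpu(pin_memory=True):
    y = checkpoint(model, x, use_reentrant=False)
y.sum().backward()
```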
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
###…
-
The old codepath is not composable with other transforms, does not offer gathering of state dicts as easily, etc.
Removing it, of course, depends on NVIDIA benchmarking not needing it. I think we (@crc…
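For context on the state-dict point: with the current FSDP API, gathering a full (unsharded) state dict is a short context-manager pattern. A minimal sketch, assuming `fsdp_model` is an already FSDP-wrapped module:

```python
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)

# Gather the full state dict on rank 0 only, offloaded to CPU so the
# unsharded copy does not have to fit in GPU memory.
cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
    state = fsdp_model.state_dict()
```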
-
FSDP2 supports all-gather using FP8:
https://discuss.pytorch.org/t/distributed-w-torchtitan-enabling-float8-all-gather-in-fsdp2/209323
Wondering if we could do this directly using TransformerEngine …
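For reference, the linked torchtitan discussion achieves this through torchao's float8 conversion rather than TransformerEngine. A minimal sketch, where `MyTransformer` and its `blocks` attribute are placeholders and the `fully_shard` import path varies across PyTorch versions:

```python
from torch.distributed._composable.fsdp import fully_shard
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

model = MyTransformer()  # placeholder: any model built from nn.Linear blocks

# enable_fsdp_float8_all_gather keeps weights in float8 through the
# FSDP2 all-gather, roughly halving weight communication volume.
config = Float8LinearConfig(enable_fsdp_float8_all_gather=True)
convert_to_float8_training(model, config=config)

# Apply FSDP2 after the float8 swap so the sharded parameters are the
# float8-aware ones.
for block in model.blocks:
    fully_shard(block)
fully_shard(model)
```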
-
### System Info
```Shell
Accelerate version: 0.31.0
Platform: Linux-5.4.0-1131-aws-fips-x86_64-with-glibc2.35
accelerate bash location: /databricks/python3/bin/accelerate
Python version: 3…
-
### System Info
- `transformers` version: 4.44.0
- Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
- Python version: 3.11.9
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
While trying out INT8 mixed precision pretraining (#748) with torchtitan, I came across an issue: if the model is FSDP-sharded, `quantize_()` won't work. The fix would be adding extra logic to …
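Until `quantize_()` handles DTensor parameters, a common workaround is to order the calls so quantization runs before sharding. A minimal sketch, with `MyModel` as a placeholder and the weight-only recipe standing in for the actual INT8 mixed-precision training recipe:

```python
from torch.distributed._composable.fsdp import fully_shard
from torchao.quantization import int8_weight_only, quantize_

model = MyModel()  # placeholder

# quantize_ expects plain torch.Tensor parameters, so run it before
# FSDP turns the parameters into DTensor shards.
quantize_(model, int8_weight_only())
fully_shard(model)
```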
-
Hello,
May I know if the current FSDP and DeepSpeed integrations are stable and available for use? Do they support multi-node multi-GPU and LoRA fine-tuning?
-
**Is your feature request related to a problem? Please describe.**
I'm interested in hybrid FSDP, where the model is replicated across nodes and sharded within a node.
My understanding is that this c…
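What is being described matches FSDP's hybrid sharding strategy. A minimal sketch, with `MyModel` as a placeholder; by default `HYBRID_SHARD` shards within a node and replicates across nodes:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

model = MyModel().cuda()  # placeholder

# Shard parameters within each node and replicate across nodes, so
# all-gathers stay on the fast intra-node interconnect and only the
# gradient all-reduce crosses nodes.
model = FSDP(model, sharding_strategy=ShardingStrategy.HYBRID_SHARD)
```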