-
### 🐛 Describe the bug
FSDP casts buffers for mixed precision, but it assigns through `buf.data`. We should avoid the `.data` usage by de-registering the old buffer and re-registering the low-precision …
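For illustration only, here is a minimal sketch (not FSDP's actual implementation) of re-registering buffers at low precision instead of writing through `.data`; the helper name `recast_buffers` and the chosen `dtype` are hypothetical:

```python
import torch
import torch.nn as nn

def recast_buffers(module: nn.Module, dtype: torch.dtype = torch.float16) -> None:
    """Replace each floating-point buffer with a low-precision copy.

    Sketch only: instead of mutating the tensor in place via buf.data,
    drop the old buffer and register a new one under the same name.
    (Persistence flags of non-persistent buffers are ignored here.)
    """
    for submodule in module.modules():
        # Snapshot the buffers so we can re-register while iterating.
        for name, buf in list(submodule.named_buffers(recurse=False)):
            if not buf.is_floating_point():
                continue
            # register_buffer with an existing name replaces the old entry,
            # so the module now owns a fresh low-precision tensor.
            submodule.register_buffer(name, buf.to(dtype))

# Usage sketch: BatchNorm keeps its running stats as buffers.
bn = nn.BatchNorm1d(8)
recast_buffers(bn, torch.float16)
print(bn.running_mean.dtype)  # torch.float16
```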
-
Hello,
I am looking at Lance for a PyTorch dataloader. I am having issues with a Lance-based loader (like this one: https://lancedb.github.io/lance/examples/llm_training.html) when using it in a di…
-
We were trying to fine-tune a MatFormer checkpoint (MatFormer-OLMo-180M, [Link](https://drive.google.com/drive/folders/1hI8wlHzQYRLfC4XdnS5Xl1vwV8S2UA0f?usp=sharing))
We used the following comma…
-
Overlapping the backward pass with the optimizer step is a classic idea, and PyTorch supports this overlap in DDP and FSDP. For example, here are the hooks in DDP: https://github.com/pytorch/pytorch/…
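As a rough illustration of the hook mechanism (not the linked code itself), a sketch of registering a DDP communication hook follows. It assumes the process group has already been initialized (e.g. under torchrun) with one GPU per rank; `allreduce_then_extra_work` is a hypothetical name:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Assumes dist.init_process_group(...) has already run (e.g. via torchrun).
model = DDP(torch.nn.Linear(128, 128).cuda())

def allreduce_then_extra_work(state, bucket: dist.GradBucket) -> torch.futures.Future:
    # DDP invokes this once per gradient bucket during backward, so anything
    # chained onto the all-reduce future overlaps with the rest of backward.
    fut = default_hooks.allreduce_hook(state, bucket)

    def _after(fut: torch.futures.Future) -> torch.Tensor:
        grad = fut.value()
        # Per-bucket work (e.g. a partial optimizer step) could run here.
        return grad

    return fut.then(_after)

model.register_comm_hook(state=None, hook=allreduce_then_extra_work)
```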
-
Hi all,
I wanted to try to add support for multi-GPU training to allow fine-tuning of LLMs. I've already [opened an issue](https://github.com/lxuechen/private-transformers/issues/31) a few week…
-
Setup
- Environment: PyTorch 2.3.0, Composer 0.22.0, Streaming 0.7.4
- GPU: 8x H100 SXM, BF16 mode
This issue is related to #643 but concerns a more subtle issue with Streaming datasets. Over the cou…
-
### 🐛 Describe the bug
In multiprocessing mode (i.e. FSDP/DDP), JSONDecodeErrors occur within `torch._inductor.triton_heuristics.cached_autotune` if the filesystem does not lock the file itself.…
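To illustrate the failure mode rather than Inductor's actual cache code, here is a sketch of guarding a shared JSON cache with an explicit lock plus an atomic rename, assuming the third-party `filelock` package; the path and helper names are hypothetical:

```python
import json
import os
import tempfile
from filelock import FileLock  # third-party: pip install filelock

CACHE_PATH = "/shared/fs/autotune_cache.json"  # hypothetical shared path

def write_cache(entry: dict) -> None:
    # Serialize writers across ranks; without this, two workers can interleave
    # writes and a reader may see truncated JSON (hence JSONDecodeError).
    with FileLock(CACHE_PATH + ".lock"):
        data = {}
        if os.path.exists(CACHE_PATH):
            with open(CACHE_PATH) as f:
                data = json.load(f)
        data.update(entry)
        # Write to a temp file, then rename, so readers never see a partial file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(CACHE_PATH))
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, CACHE_PATH)

def read_cache() -> dict:
    with FileLock(CACHE_PATH + ".lock"):
        if not os.path.exists(CACHE_PATH):
            return {}
        with open(CACHE_PATH) as f:
            return json.load(f)
```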
-
### 🐛 Describe the bug
Since PT 2, we have noticed a significant amount of PCIe traffic between host and device, which is something we did not expect and did not observe in PT 1.x. This…
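One way to see where host<->device traffic comes from (a generic sketch, not the reporter's setup; the model and input are stand-ins) is to look for `Memcpy HtoD`/`Memcpy DtoH` entries in a profiler trace:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model and input; the point is only to surface host<->device copies.
model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x).sum().backward()

# Sort by CUDA time and scan for unexpected Memcpy entries.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```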
-
with @cbalioglu
**Context**
Communication/computation overlap is a well-known theme in data-parallel training, where developers exploit any independence in the forward/backward/optimizer passes …
-
## Proposed refactor
Flatten the Strategy inheritance:
Part of #10416
### Motivation
Reduce coupling between strategies, reduce unintentional overrides/inheritance and avoid silent failures
…