-
## 🐛 Bug
Trying to test simple `xm.send` and `xm.recv` gives an error.
## To Reproduce
Steps to reproduce the behavior:
1. Run test code below
```python
import torch
import torch_xla.core.xla_model as xm
…
```
-
### 🚀 The feature, motivation and pitch
https://github.com/pytorch/pytorch/issues/75255 implemented the ability to ignore FSDP parameters at the module level, i.e. by passing in an `ignored_modules` list…
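For context, the existing module-level API looks roughly like the sketch below (module names and sizes are illustrative; a default process group is assumed to be initialized):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(16, 16)  # flattened and sharded by FSDP
        self.head = nn.Linear(16, 4)    # excluded from FSDP below

    def forward(self, x):
        return self.head(self.trunk(x))

model = Model()
# Module-level ignoring: every parameter under `head` stays a plain
# nn.Parameter, unsharded and unmanaged by FSDP.
fsdp_model = FSDP(model, ignored_modules=[model.head])
```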
-
After the addition of Qwen2, we have multiple models using a transformer decoder class with head weights tied to embedding weights (Gemma and Qwen2). The class `TiedEmbeddingTransformerDecoder` is int…
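For readers unfamiliar with the pattern, head/embedding weight tying looks roughly like this (a minimal sketch, not the torchtune class itself; names are illustrative):

```python
import torch.nn as nn

class TinyTiedDecoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)
        # Tie the output projection to the embedding table: both modules
        # now share a single parameter, as in Gemma and Qwen2.
        self.output.weight = self.tok_embeddings.weight
```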
-
### 🚀 The feature, motivation and pitch
FSDP optimizer overlap, added in https://github.com/pytorch/pytorch/pull/98667, needs some follow-up work:
- We reallocate the `_cpu_grad` for CPU offload every iteration…
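For reference, CPU offload (the mode in which `_cpu_grad` is allocated) is enabled through FSDP's public config; a minimal sketch, assuming a default process group is already initialized:

```python
import torch.nn as nn
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

# With offload_params=True, parameters (and their gradients) are kept on
# CPU between uses; the `_cpu_grad` buffer mentioned above belongs to
# this offload path.
fsdp_model = FSDP(nn.Linear(8, 8), cpu_offload=CPUOffload(offload_params=True))
```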
-
### 🚀 The feature, motivation and pitch
For implementing things like Alibi, we need a tensor in our model that is the same on each rank, is small, and never changes. This is very hard to do in FSDP.
…
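One common workaround (a sketch of a general pattern, not necessarily what this issue proposes; the slope computation is illustrative) is to register the constant as a buffer, since FSDP shards parameters but leaves buffers replicated on every rank:

```python
import torch
import torch.nn as nn

class AlibiBias(nn.Module):
    def __init__(self, num_heads: int):
        super().__init__()
        # Illustrative ALiBi-style slopes: small, identical on every rank,
        # and never updated by the optimizer.
        slopes = torch.tensor(
            [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
        )
        self.register_buffer("slopes", slopes, persistent=False)
```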
-
### 🐛 Describe the bug
When passing in a module as `ignored_modules`, should we also ensure FSDP does not initialize it via `to_empty` + `reset_parameters`? If the `ignored_modules` contract is that FSDP…
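For context, the initialization path in question is meta-device construction followed by materialization; a minimal sketch, assuming CUDA and an initialized process group (the `param_init_fn` shown is illustrative):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

with torch.device("meta"):
    model = nn.Linear(16, 16)

# By default, FSDP materializes meta-device modules via to_empty()
# followed by reset_parameters(); a custom param_init_fn replaces that
# default. The question above is whether modules listed in
# `ignored_modules` should be skipped by this materialization.
fsdp_model = FSDP(
    model,
    param_init_fn=lambda module: module.to_empty(device="cuda", recurse=False),
)
```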
-
The tutorial and documentation don't mention this part. I tried using torchrun to launch 4 processes and load Llama 2 with model parallelism, but it failed when combined with FSDP.
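For reference, a minimal multi-process FSDP setup looks like the sketch below (the script name and sizes are illustrative), launched with `torchrun --nproc_per_node=4 train.py`:

```python
# train.py: minimal FSDP setup (illustrative)
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Linear(128, 128).cuda()
fsdp_model = FSDP(model)
```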
-
https://github.com/pytorch/torchtitan/pull/161/files#diff-80b04fce2b861d9470c6160853441793678ca13904dae2a9b8b7145f29cd017aR269
IIRC @awgu mentioned there was an issue requiring this setting for…
-
Currently, the DTensor tensor subclass manages a `_local_tensor` attribute that represents the local tensor on the given rank. For efficient all-gather/reduce-scatter, we prefer to have a padded local…
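For background, the local shard is what `to_local()` returns; with uneven sharding, ranks hold differently sized shards unless padding is applied. A minimal sketch, assuming 4 ranks launched via torchrun (the tensor sizes are illustrative):

```python
import torch
from torch.distributed._tensor import Shard, distribute_tensor
from torch.distributed.device_mesh import init_device_mesh

mesh = init_device_mesh("cuda", (4,))
# 10 rows over 4 ranks shard unevenly, so `_local_tensor` differs in
# shape across ranks; padding every shard to the maximum size would make
# all-gather/reduce-scatter fixed-size collectives.
dtensor = distribute_tensor(torch.randn(10, 8), mesh, [Shard(0)])
local = dtensor.to_local()
```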
-
**Context**
To compose per-parameter-sharding FSDP with `DTensor`-based tensor parallelism, we need to reshard an existing `DTensor` to its parent mesh and include the FSDP dim-0 sharding.
The cur…
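For concreteness, the target layout combines the FSDP dim-0 shard with the existing TP shard on a parent 2-D mesh; a sketch of the mesh and placements involved, assuming 8 GPUs arranged as 2 FSDP ranks by 4 TP ranks (sizes illustrative):

```python
import torch
from torch.distributed._tensor import Shard, distribute_tensor
from torch.distributed.device_mesh import init_device_mesh

# Parent 2-D mesh: dim 0 for FSDP (data parallel), dim 1 for TP.
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

# The combined layout shards dim 0 over "dp" (FSDP) and, e.g., dim 1
# over "tp"; resharding an existing TP-only DTensor onto the parent mesh
# has to produce placements like these.
weight = distribute_tensor(torch.randn(64, 64), mesh_2d, [Shard(0), Shard(1)])
```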