fsdp Search Results - Githubissues

1000+ results
for fsdp


pytorch/pytorch #114632

[DeviceMesh] back DeviceMesh initialization by custom_pg

### 🚀 The feature, motivation and pitch The ask is whether we can expose a way to construct a 1D device mesh from an existing process group. For context, this is to ensure interoperability of per-…

wz337 updated 9 months ago
6
Lightning-AI/pytorch-lightning #11863

Flatten the Strategy inheritance

## Proposed refactor Flatten the Strategy inheritance: Part of #10416 ### Motivation Reduce coupling between strategies, reduce unintentional overrides/inheritance and avoid silent failures …

four4fish updated 1 year ago
1
Stability-AI/StableCascade #71

How to set "use_fsdp=True" with "SLURM_LOCALID" and "SLURM_P…

For single GPU training, every time I run the script, I have to "export SLURM_LOCALID=0", "export SLURM_PROCID=0" and "export SLURM_NNODES=1" before I start the training successfully. My question is f…

terrificdm updated 4 months ago
7
speed1313/jax-llm #3

distributed training

speed1313 updated 4 months ago
2
modelscope/ms-swift #1939

Reading data in streaming mode gives very low GPU memory utilization

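My training set is very large, with over a million samples, and loading it directly for training causes OOM, so I read the data in streaming mode. However, training is very slow: GPU utilization is very low while the CPU is completely maxed out. Training arguments: ``` SftArguments(train_type='sft', model_type='internvl2-8b', model_revision='master', full_deter…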

guozhiyao updated 1 month ago
7
openvla/openvla #139

Question about fine-tuning on real robot advices

Thanks for your reply and advice in #105! I did more experiments. I am trying to fine-tune on a real UR3 robot. Some key information: - trained on one task: Move the black box right (randomized …

Yikai1 updated 10 hours ago
4
pytorch/pytorch #113113

FSDP does not move modules without parameters to device

### 🐛 Describe the bug I discovered that FSDP doesn't call `.to()` on submodules that have no parameters. This seems odd, since other modules get moved automatically even if they are not wrapped ex…

awaelchli updated 10 months ago
6
pytorch/pytorch #113187

Fix docstring errors in embedding.py, _limiter_utils.py, _dy…

Please fix the following issues. First, make sure to install the required tools: ``` pip3 install pydocstyle ``` ``` pip3 install ruff ``` Then complete the following steps: 1. Run `pydocst…

svekars updated 11 months ago
6
meta-llama/llama-recipes #711

Convert Llama-3.2-11B-Vision-Instruct FSDP Checkpoints to HF…

### System Info transformers: '4.45.1' ### Information - [ ] The official example scripts - [X] My own modified scripts ### 🐛 Describe the bug I have fine-tuned `Llama-3.2-11B-Vision-Instruct` fo…

marscod updated 1 week ago
4
pytorch/pytorch #74588

[FSDP] using CPUOffload creates 3-10x slowdown due to slow …

### 🐛 Describe the bug Create a simple distributed model. Wrap the model with FSDP. Using a stateful optimizer such as Adam(W), run without CPUOffload and profile/time it. Then run with CPUOffload and see th…

lessw2020 updated 1 year ago
5
