-
As discussed previously in https://github.com/kubeflow/training-operator/pull/2021#issuecomment-1987733922, we want to add more AI/ML examples to the Kubeflow Training Operator. Right now, most of our…
-
### 🚀 The feature, motivation and pitch
**Background**
DistributedDataParallel (DDP) uses `Reducer` to bucket and issue `allreduce` calls. The main entry point of `Reducer` is through the gradient …
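Since `Reducer` is driven by autograd hooks, its per-bucket `allreduce` can be observed or replaced through DDP's public communication-hook API. Below is a minimal sketch, not the feature proposed above: it assumes a single-process `gloo` group so the snippet is self-contained, and the model, port, and sizes are arbitrary illustration values.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Single-process group purely so the sketch is runnable
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')
dist.init_process_group('gloo', rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 8))
# Swap the Reducer's built-in allreduce for an explicit comm hook
model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)

out = model(torch.randn(4, 8))
out.sum().backward()  # gradient hooks fire the Reducer, which calls the hook per bucket
```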
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
I am using a single GPU (A10) to fine-tune the Bloom-560m model and I get an error. How can I solve it? I found similar problems in other projects, but I don't know how to solve it in alpaca:
https://github.c…
-
### 🚀 The feature, motivation and pitch
The [MultiheadAttention](https://github.com/pytorch/pytorch/blob/2fbe6ef2f866fe6ce42a950f2053f2f6b4bdab90/torch/nn/modules/activation.py) layer has a protected…
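For context, here is a minimal usage sketch of the layer in question; it shows only the layer's public call signature, not the (truncated) change being requested, and the embedding size, head count, and tensor shapes are arbitrary illustration values.

```python
import torch
from torch import nn

# Self-attention through nn.MultiheadAttention with batch-first tensors
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.randn(2, 5, 16)             # (batch, seq, embed)
out, attn_weights = mha(x, x, x)      # query = key = value
print(out.shape, attn_weights.shape)  # (2, 5, 16) and (2, 5, 5)
```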
-
# Release Manager
@cp5555
# Endgame
- [x] Code freeze: Feb. 9th, 2024
- [x] Bug Bash date: Feb. 12th, 2024
- [x] Release date: Feb. 23rd, 2024
# Main Features
## MS-AMP O3 Optimization
-…
-
### Bug description
Hello! Thank you for the integration of FSDP into the Lightning Trainer - it's a game changer.
I tried to switch from `lightning==1.9.4` to the newest `lightning==2.0.4` but obs…
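For reference, a minimal sketch of selecting FSDP on the 2.x `Trainer`; the accelerator and device count are arbitrary illustration values.

```python
import lightning as L
from lightning.pytorch.strategies import FSDPStrategy

# Lightning >= 2.0: select FSDP by name...
trainer = L.Trainer(accelerator="gpu", devices=2, strategy="fsdp")
# ...or via the strategy object when it needs configuration
trainer = L.Trainer(accelerator="gpu", devices=2, strategy=FSDPStrategy())
```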
-
## ❓ Questions and Help
This should explain the case:
```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel
import os
os.environ['MASTER_ADDR'] = 'localhos…
```
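A runnable version of that setup might look like the following. This is a sketch only: it assumes a single-process `gloo` group on CPU so the snippet is self-contained (fairscale's FSDP is primarily exercised on CUDA), and the port and model are arbitrary.

```python
import os
import torch
import torch.distributed as dist
from fairscale.nn.data_parallel import FullyShardedDataParallel

# Single-process group purely for illustration (port is arbitrary)
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '29501'
dist.init_process_group(backend='gloo', rank=0, world_size=1)

# Wrapping a module shards its parameters across ranks
model = FullyShardedDataParallel(torch.nn.Linear(8, 8))
```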
-
For DDP, rank 0's weights are synced to all ranks before the forward. For FSDP, it would be nice to have a way to do this so that weights that differ across ranks are made consistent before the forward…
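One way to get that DDP-style consistency today is to broadcast rank 0's parameters and buffers before wrapping the module; PyTorch's native FSDP also exposes a `sync_module_states=True` constructor argument that does this at wrap time. Below is a minimal sketch: the single-process group and the `Linear` stand-in are only there to keep it self-contained.

```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29502')
dist.init_process_group('gloo', rank=0, world_size=1)

model = torch.nn.Linear(8, 8)  # stand-in for the module about to be wrapped

# Broadcast rank 0's parameters and buffers so every rank starts identical
for tensor in list(model.parameters()) + list(model.buffers()):
    dist.broadcast(tensor.data, src=0)
```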