-
## ❓ Questions and Help
I tried to follow the tutorial to change my code to use FSDP; however, I do not know how to resume training properly.
Every time I resume, it seems to restart from scra…
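Resuming usually restarts from scratch when only the model weights are restored. Below is a minimal sketch of a full checkpoint round-trip (model, optimizer, epoch counter); the helper names are illustrative, not from the original post. For an FSDP-wrapped model the state-dict calls would additionally be scoped with `FSDP.state_dict_type`, noted in the comments.

```python
import torch
import torch.nn as nn

# Hypothetical helpers: save and restore everything needed to resume,
# not just the model weights.
def save_checkpoint(path, model, optimizer, epoch):
    # For an FSDP-wrapped model, gather a full state dict first, e.g.:
    #   with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
    #       model_state = model.state_dict()
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        path,
    )

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])  # restores step counts, momentum, etc.
    return ckpt["epoch"]  # resume the epoch loop from here
```

The key point is that the optimizer state and the epoch/step counters must be restored alongside the weights, otherwise training effectively starts over.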
-
Hello,
The `.fit()` method of the Trainer is [missing the `optimizers` parameter](https://github.com/mosaicml/composer/blob/4c5ba954e3007ce2af6eb3003efa9d76de38c959/composer/trainer/trainer.py#L1611…
-
Hi! I'm using two A100 GPUs, each with 40GB of memory. This is the GPU memory utilization for my training. I'm reaching over 90% memory utilization on both A100 GPUs.
![image](https://github.…
-
### 🐛 Describe the bug
Keep getting this error.
```
Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (when checking argument for argument state_steps…
-
Hi, I recently upgraded to PyTorch 1.12 and have had issues loading a saved optimizer state using FSDP; the issue seems to be something that is addressed in the comments here -
https://github.com/…
-
script used to finetune `lmsys/vicuna-7b-v1.5`
```
CUDA_VISIBLE_DEVICES="7,6,5,4,3,2" torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path lmsys/…
-
### 🐛 Describe the bug
A runtime error occurs when attempting to load the state dict of an FSDP model under `torch.inference_mode()`:
```py
import os
import torch.cuda
import torch.nn as nn…
-
There are two use cases for [`torch_xla`](https://github.com/pytorch/xla) with the PyTorch backend in Keras, namely:
1. Implement the [distribution API](https://github.com/keras-team/keras/blob/048416…
-
As titled, we should get rid of the `with_comms` decorator: https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/distributed/_tensor/common_dtensor.py#L355
Instead, init and destroy th…
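The suggested replacement can be sketched as a standard `unittest` fixture that initializes the process group in `setUp` and tears it down in `tearDown`. This is a single-process, gloo-on-CPU stand-in for the real multi-rank DTensor test harness; the class name and port are illustrative.

```python
import os
import unittest
import torch.distributed as dist

class ProcessGroupTestCase(unittest.TestCase):
    """Sketch: replace the with_comms decorator with setUp/tearDown."""

    def setUp(self):
        # env:// rendezvous needs these; real tests would get them from the launcher.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29501")
        dist.init_process_group("gloo", rank=0, world_size=1)

    def tearDown(self):
        # Destroy the group after every test so state never leaks between tests.
        dist.destroy_process_group()

    def test_world_size(self):
        self.assertEqual(dist.get_world_size(), 1)
```

Moving init/destroy into the fixture means every test method gets a fresh process group without each test having to opt in via a decorator.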
-
Similar to DDP and ZeroRedundancyOptimizer, FSDP can support overlapping the optimizer step with the backward pass by calling functional optimizers.
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-…