-
Hey Team,
I'm trying to use FSDP1/2 with `Float8InferenceLinear`, but it seems to have some issues (with torch 2.3.1+cu118). Would you suggest bumping to a newer version of torch and trying again, or maybe use …
-
# Environment
```bash
OS: Ubuntu 18.04.6 LTS
```
# Problem description
I am using the Python method from the FunASR documentation for exporting ONNX models to try to export the paraformer-zh-streaming pretrained model to ONNX, but I keep getting errors!
```bash
(funasr_env) lipeng@lipeng:~/share/modules$ vim export_ON…
-
## ❓ Questions and Help
In PyTorch we can use FSDP meta init to shard and restore my big model (e.g., one with 80B parameters). In torch_xla I can only find sharded saving, e.g., using this: https://github.com/pytorch/xla/bl…
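For context, the PyTorch-native pattern being referred to looks roughly like the sketch below: build the model on the meta device (no parameter memory is allocated), then materialize and initialize storage afterwards, which is what FSDP hooks into via its `param_init_fn` argument. This is only a minimal CPU illustration of the standard `torch.distributed.fsdp` API, not a torch_xla solution; the `init_fn` helper name is made up for the example.

```python
import torch
import torch.nn as nn

# Step 1: construct the (potentially huge) model on the meta device.
# No parameter memory is allocated; only shapes and dtypes are recorded,
# so even an 80B-parameter model "fits" at this stage.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024))
assert all(p.is_meta for p in model.parameters())

# Step 2: allocate real (uninitialized) storage, then run each layer's
# default init. FSDP does the equivalent per wrapped module through its
# `param_init_fn` argument, so each rank only allocates its own shard.
model.to_empty(device="cpu")
for m in model.modules():
    if hasattr(m, "reset_parameters"):
        m.reset_parameters()
assert not any(p.is_meta for p in model.parameters())

# Sketch of the distributed wrapping (requires an initialized process
# group and a CUDA device, so it is left as comments here):
# from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
# def init_fn(module):  # hypothetical helper name
#     module.to_empty(device=torch.cuda.current_device(), recurse=False)
#     if hasattr(module, "reset_parameters"):
#         module.reset_parameters()
# fsdp_model = FSDP(model, param_init_fn=init_fn)
```

With this pattern, a sharded checkpoint can then be loaded into the materialized shards rank-by-rank instead of ever building the full model in one process's memory.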
-
Traceback (most recent call last):
File "/opt/conda/envs/alpa/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/alpa…
-
Using the latest main to train a YoloV9e object detector:
```
[rank0]: train_one_epoch(train_loader, model, args, model_dtype)
[rank0]: File "/mnt/dingus_drive/catid/train_detector/train.py…
-
## 📚 Documentation
In the blog introducing [FSDP API](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/)
```python
fsdp_model = FullyShardedDataParallel(
model()…
-
I deleted the line `model.prepare_for_distributed_training()` in dinov2/train/train.py,
and my loss becomes NaN after training for only 1 iteration.
I don't know why; I just changed an o…
-
Great work! But I've noticed that the current implementation seems to only support single-GPU training. Is that correct? If so, do you have any plans to extend support for multi-GPU training in the fu…
-
Hi there,
Thanks for the scripts and posts! I am interested in fine-tuning Mixtral 8x7b on SageMaker. My task requires a context length of around 8k tokens.
I have tried running training following th…
-
On both our V100 (Intel Cascade Lake) and A100 (AMD Milan) systems (both RHEL 8.4 currently), I'm seeing too many test failures for `PyTorch/1.12.0-foss-2022a-CUDA-11.7.0`.
On both systems, I get `…