fsdp Search Results - Githubissues

1000+ results for fsdp


yaojin17/Unlearning_LLM #8

Cannot import name 'UnencryptedCookieSessionFactoryConfig' f…

**Could the author help me to solve this?** Traceback (most recent call last): File "~/miniconda3/envs/unlearning_llm/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1764,…

IMoonKeyBoy updated 1 week ago
6
NVIDIA/nccl #1160

NCCL hang after before training loop

Hi there, I try to run this test (https://github.com/pytorch/examples/tree/main/distributed/FSDP) to check if my cuda and GPU works fine. I disabled both ACS and IOMMU. But the process always hang be…

TeddLi updated 10 months ago
3
pytorch/pytorch #138715

inductor error with PT Lightning + FSDP + torchao.float8 + t…

### 🐛 Describe the bug I see the following error in a toy training loop with PyTorch Lightning, FSDP1, torchao.float8 and torch.compile: ``` [rank0]: File "/home/vasiliy/.conda/envs/pt_nightly_…

vkuzo updated 1 week ago
8
facebookresearch/fairscale #660

[checkpoint]: error when convert sync bn is done after check…

@prigoyal found a bug when checkpoint is done before sync BN conversion, it fails with: ``` torch.nn.modules.module.ModuleAttributeError: 'SyncBatchNorm' object has no attribute '_checkpoint_fwd_c…

min-xu-ai updated 3 years ago
5
Luodian/Otter #256

Load pretrained weight error

running script: ```sh export PYTHONPATH=. accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \ ./pipeline/train/instruction_following.py \ --pretrained_mode…

baibizhe updated 8 months ago
13
ovh/ai-training-examples #102

AttributeError: 'NoneType' object has no attribute 'cquantiz…

While running model, tokenizer = load_model(model_name, bnb_config) I am getting the following error, --------------------------------------------------------------------------- AttributeErro…

AdarshGowda33 updated 8 months ago
1
Fannovel16/comfyui_controlnet_aux #250

Error in importing dwpose.py

Hi, I'm encountering problem when trying to import dwpose.py. It looks like the problem is cuda version or torch version, I'm wondering that is there specific requirements for cuda or pytorch version?…

amnesicloud updated 9 months ago
1
OpenMOSS/MOSS #272

ZeRORuntimeException: You are using ZeRO-Offload with a clie…

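GPU setup: 2× V100 32G (there are four in total, but two are currently in use by someone else; once they are free, all 4 V100s can be used). With the default accelerate config I get the error: cuda out of memory. I noticed that in the default config both offload_optimizer_device and offload_param_device are set to none. Following the accelerate tutorial, I changed both parameters to cpu and then got this error: ![image](h…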

Daniel-1997 updated 1 year ago
3
huggingface/diffusers #6705

accelerate + FSDP + T2I train saving ckpt error

### Describe the bug I have used /examples/text_to_image/train_text_to_image_sdxl.py to train a fine tune sdxl. I used accelerate 0.25.0 + FSDP, when I was saving a checkpoint it will stuck and can'…

Forainest updated 2 days ago
9
GanjinZero/RRHF #28

How to use it. Is there some code examples?

How to use it. Is there some code examples?

Mr-IT007 updated 1 year ago
1
