-
**Could the author help me solve this?**
Traceback (most recent call last):
File "~/miniconda3/envs/unlearning_llm/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1764,…
-
Hi there, I am trying to run this test (https://github.com/pytorch/examples/tree/main/distributed/FSDP) to check whether my CUDA setup and GPU work correctly. I have disabled both ACS and IOMMU.
But the process always hangs be…
-
### 🐛 Describe the bug
I see the following error in a toy training loop with PyTorch Lightning, FSDP1, torchao.float8 and torch.compile:
```
[rank0]: File "/home/vasiliy/.conda/envs/pt_nightly_…
-
@prigoyal found a bug: when checkpointing is done before the sync BN conversion, it fails with:
```
torch.nn.modules.module.ModuleAttributeError: 'SyncBatchNorm' object has no attribute '_checkpoint_fwd_c…
-
Running script:
```sh
export PYTHONPATH=.
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
./pipeline/train/instruction_following.py \
--pretrained_mode…
-
While running
`model, tokenizer = load_model(model_name, bnb_config)`
I get the following error:
---------------------------------------------------------------------------
AttributeErro…
-
Hi, I'm encountering a problem when trying to import dwpose.py. It looks like the problem is the CUDA version or the torch version. Are there specific requirements for the CUDA or PyTorch version?…
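
For a version question like this, it usually helps to attach the installed versions to the report. Below is a minimal sketch, assuming PyTorch may or may not be importable in the environment. `collect_versions` is a hypothetical helper name chosen for illustration; it is not part of dwpose or PyTorch.

```python
import importlib.util
import platform


def collect_versions():
    """Return a dict of version info that is useful in a bug report."""
    info = {
        "python": platform.python_version(),
        "torch": None,           # stays None if torch is not installed
        "cuda_build": None,      # CUDA version torch was built against
        "cuda_available": None,  # whether torch can see a usable GPU
    }
    # Only attempt the import when the package is present.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            info["torch"] = torch.__version__
            info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
            info["cuda_available"] = torch.cuda.is_available()
        except Exception as exc:
            # A broken install (for example, mismatched CUDA libraries)
            # can fail at import time; record the message instead.
            info["torch"] = f"import failed: {exc}"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the printed lines next to the full traceback makes it easier to tell whether the failure comes from a CUDA/PyTorch mismatch or from something else.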
-
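GPU setup: 2x V100 32G (four cards in total; two are currently in use by someone else, and all four V100s can be used once they are freed)
With the default accelerate configuration I get the error: cuda out of memory. I noticed that in the default configuration both offload_optimizer_device and offload_param_device are set to none. Following the accelerate tutorial, I changed both parameters to cpu, which produced this error: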
![image](h…
-
### Describe the bug
I used /examples/text_to_image/train_text_to_image_sdxl.py to fine-tune SDXL with accelerate 0.25.0 + FSDP. When saving a checkpoint, it gets stuck and can'…
-
How do I use it? Are there any code examples?