-
```
Traceback (most recent call last):
  File "C:\ai\stable-diffusion-webui\Kohya\kohya_ss\train_network.py", line 539, in
    train(args)
  File "C:\ai\stable-diffusion-webui\Kohya\kohya_ss\train_ne…
```
-
**Describe the bug**
Error when running multi-node LoRA fine-tuning:
failed (exitcode: -11) local_rank: 5 (pid: 11514) of binary: /home/jovyan/data-ws-enr/zconda/envs/swift_ft/bin/python
Traceback (most recent call last):
File…
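A negative exit code from the elastic launcher is the number of the signal that killed the worker, so exitcode -11 here corresponds to SIGSEGV. A quick, generic way to decode such codes (illustrative Python, not part of the report):

```python
import signal

exitcode = -11                    # as reported by the launcher
sig = signal.Signals(-exitcode)   # map the negative exit code back to a signal
print(sig.name)                   # SIGSEGV on Linux
```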
-
Hi, can I check if this is a typo in the training script?
```
if ema_state_dict is not None:
    checkpoint_path = f"{checkpoint_dir}/{int(train_steps/args.gradient_accumulation_steps):07d}_ema"
…
-
**Describe the bug**
DeepSpeed ZeRO-3 gets an error in dist.get_rank() on multiple nodes with multiple GPUs.
It is perfectly fine when set to stage 2.
transformers: v.4.36.0
accelerate: v.0.26.0
dee…
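Not from the report, but a minimal sketch of the usual guard around dist.get_rank(), assuming the failure is that the call happens before the process group is initialized on some ranks (the truncated trace does not confirm this):

```python
import torch.distributed as dist

def safe_rank() -> int:
    # dist.get_rank() raises unless init_process_group() has completed;
    # fall back to rank 0 when the process group is not (yet) initialized.
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return 0
```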
-
### 🐛 Describe the bug
I used FSDP + ShardedGradScaler to train my model. Compared with apex.amp + DDP, the precision of my model has decreased.
The DDP version is like:
```
model, optimizer = amp.initial…
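# For comparison, a typical FSDP + ShardedGradScaler step looks roughly like the
# sketch below. This is a generic illustration, not the reporter's code; `model`,
# `optimizer`, and `inputs` are assumed to already exist.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

model = FSDP(model)                                 # shard parameters across ranks
scaler = ShardedGradScaler()                        # FSDP-aware replacement for GradScaler
with torch.autocast("cuda", dtype=torch.float16):   # mixed-precision forward
    loss = model(inputs).mean()
scaler.scale(loss).backward()                       # scale the loss before backward
scaler.step(optimizer)                              # unscale grads, then optimizer step
scaler.update()                                     # adjust the loss-scale factor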
-
Re the notebook: ✉️ MarketMail AI ✉️ Fine tuning BLOOMZ (Completed Version).ipynb
https://colab.research.google.com/drive/1ARmlaZZaKyAg6HTi57psFLPeh0hDRcPX?usp=sharing
I tried to modify the exa…
-
@huminghao16
Could you include the scripts for evaluating a pretrained model?
(For example, evaluating the large model included in the README.)
I am running this command:
```
export …
-
The following are my parameters:
```
LR=6e-6
DATE=0704
EPOCH=2
MAX_LEN=1024
MASTER_PORT=8888
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --do_eval \
    …
```
-
Forgive me if the answer is obvious, but I am using this PyTorch implementation with my own data and am confused about what a few lines of code in train_i3d.py are doing.
The optimizer …