-
I am training [glm-10b-chinese](https://huggingface.co/THUDM/glm-10b-chinese/blob/main/config.json) for step-1.
In theory, with 10B parameters in fp32, the total memory occupied should be:
* params : 40GB
* …
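For reference, a back-of-the-envelope sketch of how those numbers add up for plain fp32 training with Adam (the Adam optimizer is an assumption here; only the 10B parameter count comes from the post):

```python
# Rough per-parameter cost for fp32 training with Adam (assumed):
#   4 B weights + 4 B gradients + 8 B optimizer states (m and v).
def fp32_adam_memory_gb(num_params: float) -> dict:
    return {
        "params_gb": num_params * 4 / 1e9,     # fp32 weights
        "grads_gb": num_params * 4 / 1e9,      # fp32 gradients
        "optimizer_gb": num_params * 8 / 1e9,  # Adam m and v in fp32
    }

print(fp32_adam_memory_gb(10e9))
# -> {'params_gb': 40.0, 'grads_gb': 40.0, 'optimizer_gb': 80.0}
```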
-
### System Info
```Shell
accelerate 0.20.3
python 3.10
numpy 1.24.3
torch 2.0.1
accelerate config:
compute_environment: LOCAL_MACHINE
deepspeed_config:
deepspeed_multinode_launcher: stand…
-
Hello, I get an error when launching single-machine multi-GPU training with the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/train_bash.py \
--stage sft \
--model_name_or_path path_to_your_model \
--do_train \
--dataset alpaca_gpt4_zh \
…
-
Thanks for your great work. In your paper, the batch size is 16 during tuning. How do I set the batch size to 16? Should I change per_device_train_batch_size from its default of 1 to 16?
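For context, in HF-style trainers the effective (global) batch size is the product of the per-device batch size, the gradient accumulation steps, and the number of GPUs. A minimal sketch with made-up values (not taken from the paper):

```python
# Effective batch size = per-device batch size x gradient accumulation x number of GPUs.
per_device_train_batch_size = 1   # common default in example scripts
gradient_accumulation_steps = 4   # hypothetical value
num_gpus = 4                      # hypothetical value

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 16
```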
-
While fine-tuning with LoRA, I noticed that the save directory contains a file mp_rank_00_model_states.pt of about 32GB, plus a bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt of about 900MB. I'm confused: LoRA should only save the parameters it actually trains.
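For reference, a minimal sketch of how a LoRA-only checkpoint is usually written through PEFT (this assumes the model is wrapped with peft.get_peft_model; the large files above are DeepSpeed checkpoint files, and DeepSpeed's own checkpointing saves full model and optimizer states):

```python
# Minimal PEFT sketch (assumed setup, not the actual training script):
# saving the PEFT-wrapped model writes only the adapter weights, not the base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path_to_your_model")  # placeholder path
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # hypothetical target modules
)
model = get_peft_model(base, lora_config)

model.save_pretrained("lora_adapter_only")  # writes adapter_config.json + adapter weights
```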
My finetune_lora.sh is as follows:
```shell
#!/bin/bash
expo…
-
```
Traceback (most recent call last):
File "/workspace/kohya_ss/sd-scripts/train_db.py", line 529, in
train(args)
File "/workspace/kohya_ss/sd-scripts/train_db.py", line 190, in train
…
-
I think #357 should be applied to the pretrain script as well.
Thank you so much, Lightning team, for this amazing repository.
-
When I use the LongAlpaca-12k dataset for supervised fine-tuning of the LongAlpaca-7B model, the loss is very unstable.
My command is:
```
Miniconda/envs/longlora/bin/python -u supervised-fine-tun…
-
I'm trying to run train_hunyuan_lora_ui.py and am getting the following error:
``` log
python train_hunyuan_lora_ui.py --seed 12151004 --logging_dir logs --mixed_precision bf16 --report_to wandb --lr_wa…
-
Training the UNet...
[ASCII-art "TRAINING" banner from the log omitted]