Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model, a low-resource Chinese llama+lora solution with a structure based on alpaca
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

After updating the code, re-running finetune.sh fails with TypeError: init_process_group() got multiple values for keyword argument 'backend' #112

Open alisyzhu opened 1 year ago

alisyzhu commented 1 year ago

After pulling the latest git code yesterday, running finetune.sh again makes torchrun fail.

[Initial environment] 1x A100; accelerate 0.18.0; bitsandbytes 0.37.2; transformers 4.29.0.dev0
[Change v1] Ran pip install transformers==4.28.1. Result: still fails.
[Change v2] Ran pip install git+https://github.com/huggingface/transformers@ff20f9cf3615a8638023bc82925573cb9d0f3560. Result: still fails, with the error shown below:

[screenshots of the error traceback]
Facico commented 1 year ago

finetune.py hasn't been changed in a month; this is a known old issue. On a single GPU, don't use torchrun, just run it with python. (When you hit an error, try searching this repo's issues for it first.)

dizhenx commented 1 year ago

finetune.py hasn't been changed in a month; this is a known old issue. On a single GPU, don't use torchrun, just run it with python. (When you hit an error, try searching this repo's issues for it first.)

I'm now hitting the same error when running bash finetune_others_continue.sh; it is raised at finetune.py line 237.

Facico commented 1 year ago

@dizhenx This has nothing to do with which script you run. Use torchrun for multi-GPU (our scripts assume multi-GPU by default); on a single GPU, don't use it, just run with python.
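For reference, here is a minimal sketch of the single-GPU vs. multi-GPU check commonly used in alpaca-lora-style finetune scripts. It is an illustration of why a plain python launch sidesteps the init_process_group() error, not necessarily the exact code in this repo's finetune.py.

```python
import os

# Illustrative only: the WORLD_SIZE check typical of alpaca-lora-style scripts.
# `torchrun --nproc_per_node=N finetune.py ...` sets WORLD_SIZE/LOCAL_RANK,
# while a plain `python finetune.py ...` leaves them unset.
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1

if ddp:
    # Multi-GPU launch: each torchrun process pins the model to its own rank,
    # and the distributed process group is set up later by transformers/accelerate,
    # which is where the 'backend' TypeError surfaces on mismatched versions.
    device_map = {"": int(os.environ.get("LOCAL_RANK", 0))}
else:
    # Single-GPU launch: no process group is created, so the
    # init_process_group() error cannot be triggered on this path.
    device_map = "auto"

print("ddp:", ddp, "device_map:", device_map)
```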

wangrui6 commented 1 year ago

@Facico Do you have any experience with a good training-parameter combination for an A100?

```python
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
```

Especially the first few.
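As an aside on how these values interact (an illustration only, not a recommendation from this thread): BATCH_SIZE is the effective batch size, so raising MICRO_BATCH_SIZE on a larger GPU such as an A100 only shrinks the number of gradient-accumulation steps while leaving the effective batch size unchanged. The value 16 below is a hypothetical A100 setting, not one given by the maintainers.

```python
# Illustrative arithmetic: the effective batch size stays at 128 either way.
BATCH_SIZE = 128

micro_4090 = 4                        # default from the config above
steps_4090 = BATCH_SIZE // micro_4090
print(steps_4090)                     # 32 gradient-accumulation steps

micro_a100 = 16                       # hypothetical larger per-step batch on an A100
steps_a100 = BATCH_SIZE // micro_a100
print(steps_a100)                     # 8 gradient-accumulation steps
```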

wangrui6 commented 1 year ago

@Facico Also, how fast is finetuning? Roughly how long would finetuning vicuna-13b on a 4090 take with 100K samples? Is there any reference data?

benjamin555 commented 1 year ago

vicuna13b

Does this framework support finetuning vicuna13b?

Facico commented 1 year ago

@wangrui6 Generally you only need to adjust CUTOFF_LEN to fit your hardware. I don't remember exactly, but a few hundred thousand samples took roughly 200h. @benjamin555 Anything with a LLaMA base is supported.
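Putting the two answers together, the only knobs that usually change are the sequence cutoff and the base checkpoint. A hedged sketch in the style of the config quoted above; the CUTOFF_LEN value and the model path are hypothetical examples, not values from this thread, and the actual variable/argument names should be checked against finetune.py.

```python
# Hypothetical adjustments for an A100 and a 13B LLaMA-based checkpoint.
CUTOFF_LEN = 512                                 # raised from 256 if GPU memory allows longer sequences
BASE_MODEL = "path/to/llama-13b-or-vicuna-13b"   # any LLaMA-based base works per the reply above
```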