Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model, a low-resource Chinese llama+lora solution with a structure based on alpaca
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

After updating the code, re-running finetune.sh fails with TypeError: init_process_group() got multiple values for keyword argument 'backend' #112

Open alisyzhu opened 1 year ago

alisyzhu commented 1 year ago

After pulling the latest git code yesterday, running finetune.sh again makes torchrun fail.

[Initial environment] 1x A100; accelerate 0.18.0; bitsandbytes 0.37.2; transformers 4.29.0.dev0
[Change v1] Ran pip install transformers==4.28.1. Result: still fails.
[Change v2] Ran pip install git+https://github.com/huggingface/transformers@ff20f9cf3615a8638023bc82925573cb9d0f3560. Result: still fails, with the error shown below:

[screenshots of the error traceback]
Facico commented 1 year ago

finetune.py hasn't been changed in a month; this is a known old issue. On a single GPU, don't use torchrun, just run it with python. (When you hit an error, try searching this repo's issues for it first.)

dizhenx commented 1 year ago

finetune.py hasn't been changed in a month; this is a known old issue. On a single GPU, don't use torchrun, just run it with python. (When you hit an error, try searching this repo's issues for it first.)

I'm now hitting the same error when running bash finetune_others_continue.sh; it is raised at finetune.py line 237.

Facico commented 1 year ago

@dizhenx This has nothing to do with which script you run. Use torchrun for multi-GPU (our scripts assume multi-GPU by default); on a single GPU, don't use it, just run with python.
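For reference, here is a minimal sketch of the single-GPU vs. multi-GPU check commonly used in alpaca-lora-style finetune scripts. It is an illustration of why a plain python launch sidesteps the init_process_group() error, not necessarily the exact code in this repo's finetune.py.

```python
import os

# Illustrative only: the WORLD_SIZE check typical of alpaca-lora-style scripts.
# `torchrun --nproc_per_node=N finetune.py ...` sets WORLD_SIZE/LOCAL_RANK,
# while a plain `python finetune.py ...` leaves them unset.
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1

if ddp:
    # Multi-GPU launch: each torchrun process pins the model to its own rank,
    # and the distributed process group is set up later by transformers/accelerate,
    # which is where the 'backend' TypeError surfaces on mismatched versions.
    device_map = {"": int(os.environ.get("LOCAL_RANK", 0))}
else:
    # Single-GPU launch: no process group is created, so the
    # init_process_group() error cannot be triggered on this path.
    device_map = "auto"

print("ddp:", ddp, "device_map:", device_map)
```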

wangrui6 commented 1 year ago

@Facico Do you have any experience with a good training-parameter combination for an A100?

```python
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
```

Especially the first few.
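As an aside on how these values interact (an illustration only, not a recommendation from this thread): BATCH_SIZE is the effective batch size, so raising MICRO_BATCH_SIZE on a larger GPU such as an A100 only shrinks the number of gradient-accumulation steps while leaving the effective batch size unchanged. The value 16 below is a hypothetical A100 setting, not one given by the maintainers.

```python
# Illustrative arithmetic: the effective batch size stays at 128 either way.
BATCH_SIZE = 128

micro_4090 = 4                        # default from the config above
steps_4090 = BATCH_SIZE // micro_4090
print(steps_4090)                     # 32 gradient-accumulation steps

micro_a100 = 16                       # hypothetical larger per-step batch on an A100
steps_a100 = BATCH_SIZE // micro_a100
print(steps_a100)                     # 8 gradient-accumulation steps
```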

wangrui6 commented 1 year ago

@Facico Also, how fast is finetuning? Roughly how long would finetuning vicuna-13b on a 4090 take with 100K samples? Is there any reference data?

benjamin555 commented 1 year ago

vicuna13b

Does this framework support finetuning vicuna13b?

Facico commented 1 year ago

@wangrui6 Generally you only need to adjust CUTOFF_LEN to fit your hardware. I don't remember exactly, but a few hundred thousand samples took roughly 200h. @benjamin555 Anything with a LLaMA base is supported.
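Putting the two answers together, the only knobs that usually change are the sequence cutoff and the base checkpoint. A hedged sketch in the style of the config quoted above; the CUTOFF_LEN value and the model path are hypothetical examples, not values from this thread, and the actual variable/argument names should be checked against finetune.py.

```python
# Hypothetical adjustments for an A100 and a 13B LLaMA-based checkpoint.
CUTOFF_LEN = 512                                 # raised from 256 if GPU memory allows longer sequences
BASE_MODEL = "path/to/llama-13b-or-vicuna-13b"   # any LLaMA-based base works per the reply above
```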