TOT_CUDA="0"
CUDAs=(${TOT_CUDA//,/ })
CUDA_NUM=${#CUDAs[@]}
PORT="1234"
DATA_PATH="sample/merge.json"
#DATA_PATH="./sample/merge_sample.json"
OUTPUT_PATH="lora-Vicuna"
MODEL_PATH="/mnt/e/zllama-models/llama-7b-hf"
CUDA_VISIBLE_DEVICES=${TOT_CUDA} torchrun --nproc_per_node=$CUDA_NUM --master_port=$PORT finetune.py \
--data_path $DATA_PATH \
--output_path $OUTPUT_PATH \
--model_path $MODEL_PATH \
--eval_steps 200 \
--save_steps 200 \
--test_size 10000
@ZenXir Try running with the single-GPU configuration and see whether it still errors; with a single GPU there is no need to use DDP. Alternatively, prepend TORCH_DISTRIBUTED_DEBUG=DETAIL to the python command to get more detailed error output, or try changing ddp_find_unused_parameters=False if ddp else None in TrainingArguments to ddp_find_unused_parameters=True and see whether it runs.
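For reference, a hedged sketch of what that flag change might look like (the ddp detection and the surrounding TrainingArguments here are assumptions about how finetune.py is organized, not a verbatim excerpt):

import os
import transformers

# Assumption: the script detects multi-GPU runs roughly like this.
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1  # on a single GPU there is no DDP, so the flag is irrelevant

training_args = transformers.TrainingArguments(
    output_dir="lora-Vicuna",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,
    # Original: ddp_find_unused_parameters=False if ddp else None.
    # Setting it to True lets DDP tolerate parameters that receive no
    # gradient in a given step, at some performance cost.
    ddp_find_unused_parameters=True if ddp else None,
)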
I added a line to the finetune.sh script: TORCH_DISTRIBUTED_DEBUG=DETAIL
TORCH_DISTRIBUTED_DEBUG=DETAIL
TOT_CUDA="0"
CUDAs=(${TOT_CUDA//,/ })
CUDA_NUM=${#CUDAs[@]}
PORT="1234"
DATA_PATH="sample/merge.json"
#DATA_PATH="./sample/merge_sample.json"
OUTPUT_PATH="lora-Vicuna"
MODEL_PATH="/mnt/e/zllama-models/llama-7b-hf"
CUDA_VISIBLE_DEVICES=${TOT_CUDA} torchrun --nproc_per_node=$CUDA_NUM --master_port=$PORT finetune.py \
--data_path $DATA_PATH \
--output_path $OUTPUT_PATH \
--model_path $MODEL_PATH \
--eval_steps 200 \
--save_steps 200 \
--test_size 10000
ddp_find_unused_parameters=True,
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/mnt/e/zllama-models/llama-7b-hf
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:23<00:00, 1.43it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-355c1d1e45d5609a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 75.73it/s]
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/json/default-355c1d1e45d5609a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bac0d5c93ada8bc8.arrow and /root/.cache/huggingface/datasets/json/default-355c1d1e45d5609a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-ede35f34d12551ce.arrow
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/16029 [00:00<?, ?it/s]Traceback (most recent call last):
File "/mnt/e/Chinese-Vicuna/finetune.py", line 216, in <module>
trainer.train()
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1636, in train
return inner_training_loop(
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1903, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2659, in training_step
self.scaler.scale(loss).backward()
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 127 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
0%| | 0/16029 [00:24<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 27375) of binary: /root/anaconda3/envs/Chinese-alpaca-lora/bin/python
Traceback (most recent call last):
File "/root/anaconda3/envs/Chinese-alpaca-lora/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/run.py", line 794, in main
run(args)
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-03-24_15:47:13
host : DESKTOP-6KDJTBC.
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 27375)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
target_link_libraries(quantize PRIVATE ggml pthread)
target_link_libraries(chat PRIVATE ggml pthread)
This fixes the pthread_xxx library functions not being found at link time during make. Is this modification OK?
Thank you very much for the reminder; we have updated CMakeLists.txt.
When you have time, could you take a look at finetuning on a single RTX 4090 24G for me? I've tried several times and keep hitting the error above. The corpus is the 663M one downloaded from Baidu Netdisk.
You can run python finetune.py --data_path $DATA_PATH --output_path $OUTPUT_PATH --model_path $MODEL_PATH directly and see whether it reports any errors.
Could you also take a look at the parameters below and explain what they mean? I'm worried my understanding is off. Many thanks.
parser.add_argument("--eval_steps", type=int, default=200)
parser.add_argument("--save_steps", type=int, default=200)
parser.add_argument("--test_size", type=int, default=200)
MICRO_BATCH_SIZE = 4 # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3 # we don't always need 3 tbh
LEARNING_RATE = 3e-4 # the Karpathy constant
CUTOFF_LEN = 256 # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
"q_proj",
"v_proj",
]
def evaluate(
input,
temperature=0.1,
top_p=0.75,
top_k=40,
num_beams=4,
max_new_tokens=128,
repetition_penalty=1.0,
**kwargs,
):
The 8000 in checkpoint-8000 is the number of training steps completed so far. The finetune parameters
MICRO_BATCH_SIZE = 4 # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3 # we don't always need 3 tbh
LEARNING_RATE = 3e-4 # the Karpathy constant
CUTOFF_LEN = 256 # 256 accounts for about 96% of the data
set the batch size, gradient accumulation, number of epochs, learning rate, and the text length used for training.
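Spelled out with the default values above, the three batch settings relate like this:

MICRO_BATCH_SIZE = 4                                          # samples per forward/backward pass
BATCH_SIZE = 128                                              # effective batch size per optimizer step
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 128 // 4 = 32

# Gradients from 32 micro-batches of 4 samples are accumulated before each
# optimizer update, so one update still sees 4 * 32 = 128 samples.
assert MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS == BATCH_SIZE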
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
"q_proj",
"v_proj",
]
are the LoRA-related settings; see here for details.
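As a rough illustration (a sketch, not the exact code in finetune.py), those values typically feed into peft like this:

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                   # LORA_R: rank of the low-rank update matrices
    lora_alpha=16,                         # LORA_ALPHA: scaling applied to the LoRA update
    target_modules=["q_proj", "v_proj"],   # TARGET_MODULES: attention projections that get adapters
    lora_dropout=0.05,                     # LORA_DROPOUT: dropout inside the LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
)
# model = get_peft_model(model, lora_config)  # wraps the base LLaMA model; only the adapters are trained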
The generate parameters come from Hugging Face; the documentation is here.
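For example, the evaluate() arguments above map onto Hugging Face's model.generate() roughly as follows (a sketch; the actual generate code may differ):

from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=0.1,         # lower values make sampling less random
    top_p=0.75,              # nucleus sampling: keep tokens covering 75% of probability mass
    top_k=40,                # keep only the 40 most likely tokens
    num_beams=4,             # beam search width
    repetition_penalty=1.0,  # values > 1.0 penalize repeated tokens
)
# output = model.generate(
#     input_ids=input_ids,                  # tokenized prompt (hypothetical variable)
#     generation_config=generation_config,
#     max_new_tokens=128,                   # cap on generated length
# )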
In addition, we use a single 3090 with the same environment as yours (python=3.9, torch=2.0.0) and torchrun (i.e., that finetune.sh), and training works fine on both Windows WSL and Linux. Here is our environment setup:
conda create -n py3.9 python=3.9
pip install torch==2.0.0
pip install bitsandbytes datasets accelerate loralib sentencepiece
pip install git+https://github.com/huggingface/transformers.git@main # must!
pip install git+https://github.com/huggingface/peft.git
If torchrun still doesn't run on your side and you need it, you can send us a copy of your environment configuration:
conda env export > py39.yaml
pip freeze > packages.txt
Great, it runs now. Thank you very much!
What kind of GPU setup is needed to finetune on the 663M corpus?