PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: ValueError when running prefix-tuning fine-tuning of llama3-8b with tensor parallelism (tp) #9384

Open hjx620 opened 3 hours ago

hjx620 commented 3 hours ago

Software environment

- paddlepaddle-gpu: 0.0.0.post120
- paddlenlp: 3.0.0b2

Duplicate issues

Error description

When running prefix-tuning fine-tuning of llama3-8b, enabling tp raises the following error:

```
ValueError: (InvalidArgument) The 2-th dimension of input[0] and input[1] is expected to be equal. But received input[0]'s shape = [1, 128, 16, 128], input[1]'s shape = [1, 249, 4, 128].
  [Hint: Expected inputs_dims[0][j] == inputs_dims[i][j], but received inputs_dims[0][j]:16 != inputs_dims[i][j]:4.] (at ../paddle/phi/kernels/funcs/concat_funcs.h:72)
```
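The error comes from `paddle.concat` joining two tensors along the sequence axis while their head dimension (axis 2) differs: 16 heads in the prefix cache vs. 4 in the tp-sharded key/value states. A minimal standalone sketch (not PaddleNLP code; tensor names are illustrative) that reproduces the same ValueError:

```python
import paddle

# Prefix KV cache apparently built with 16 heads per rank ...
prefix_key = paddle.randn([1, 128, 16, 128])
# ... while the tp-sharded current key states carry 4 KV heads per rank.
current_key = paddle.randn([1, 249, 4, 128])

# Concatenating along the sequence axis (axis=1) requires every other
# dimension to match, so this raises the same InvalidArgument error.
merged = paddle.concat([prefix_key, current_key], axis=1)
```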

Steps to reliably reproduce & code

  1. cd PaddleNLP/llm/config/llama
  2. cat pt_argument.json

```json
{
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B",
    "dataset_name_or_path": "./data",
    "output_dir": "./checkpoints/pt_ckpts",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "per_device_eval_batch_size": 1,
    "eval_accumulation_steps": 1,
    "num_train_epochs": 3,
    "learning_rate": 3e-02,
    "warmup_steps": 30,
    "logging_steps": 1,
    "max_steps": 20,
    "max_evaluate_steps": 3,
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "src_length": 1024,
    "max_length": 2048,
    "do_train": true,
    "do_eval": true,
    "disable_tqdm": true,
    "load_best_model_at_end": true,
    "eval_with_do_generation": false,
    "recompute": true,
    "save_total_limit": 1,
    "tensor_parallel_degree": 2,
    "pipeline_parallel_degree": 1,
    "prefix_tuning": true,
    "zero_padding": false,
    "use_flash_attention": true
}
```
  3. python3 -u -m paddle.distributed.launch --gpus "0,1,2,3" run_finetune.py ./config/llama/pt_argument.json
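The reported shapes are consistent with the prefix cache being sharded by the attention-head count rather than the key/value-head count. Meta-Llama-3-8B uses grouped-query attention (32 attention heads, 8 key/value heads, head_dim 128), so with tensor_parallel_degree=2 each rank holds 4 KV heads, while a prefix built from the attention-head count would hold 16. A hedged arithmetic sketch of this suspicion (variable names are illustrative, not PaddleNLP identifiers):

```python
num_attention_heads = 32    # Meta-Llama-3-8B
num_key_value_heads = 8     # grouped-query attention (GQA)
tensor_parallel_degree = 2

# If the prefix is sharded by attention heads: 32 // 2 = 16
# -> matches the failing prefix shape [1, 128, 16, 128]
prefix_heads = num_attention_heads // tensor_parallel_degree

# The current key/value states are sharded by KV heads: 8 // 2 = 4
# -> matches [1, 249, 4, 128]
kv_heads = num_key_value_heads // tensor_parallel_degree

assert (prefix_heads, kv_heads) == (16, 4)
```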
hjx620 commented 3 hours ago

(Two screenshots attached: the error traceback, and the code location where the error is raised.)