PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: ValueError when running prefix-tuning fine-tuning of llama3-8b with tensor parallelism (tp) #9384

Open hjx620 opened 3 hours ago

hjx620 commented 3 hours ago

Software environment

- paddlepaddle-gpu: 0.0.0.post120
- paddlenlp: 3.0.0b2

Duplicate issues

Error description

When running prefix-tuning fine-tuning of llama3-8b, enabling tp raises the following error:

```
ValueError: (InvalidArgument) The 2-th dimension of input[0] and input[1] is expected to be equal. But received input[0]'s shape = [1, 128, 16, 128], input[1]'s shape = [1, 249, 4, 128].
  [Hint: Expected inputs_dims[0][j] == inputs_dims[i][j], but received inputs_dims[0][j]:16 != inputs_dims[i][j]:4.] (at ../paddle/phi/kernels/funcs/concat_funcs.h:72)
```
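The error comes from `paddle.concat` joining two tensors along the sequence axis while their head dimension (axis 2) differs: 16 heads in the prefix cache vs. 4 in the tp-sharded key/value states. A minimal standalone sketch (not PaddleNLP code; tensor names are illustrative) that reproduces the same ValueError:

```python
import paddle

# Prefix KV cache apparently built with 16 heads per rank ...
prefix_key = paddle.randn([1, 128, 16, 128])
# ... while the tp-sharded current key states carry 4 KV heads per rank.
current_key = paddle.randn([1, 249, 4, 128])

# Concatenating along the sequence axis (axis=1) requires every other
# dimension to match, so this raises the same InvalidArgument error.
merged = paddle.concat([prefix_key, current_key], axis=1)
```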

Steps to reliably reproduce & code

  1. cd PaddleNLP/llm/config/llama
  2. cat pt_argument.json

```json
{
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B",
    "dataset_name_or_path": "./data",
    "output_dir": "./checkpoints/pt_ckpts",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "per_device_eval_batch_size": 1,
    "eval_accumulation_steps": 1,
    "num_train_epochs": 3,
    "learning_rate": 3e-02,
    "warmup_steps": 30,
    "logging_steps": 1,
    "max_steps": 20,
    "max_evaluate_steps": 3,
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "src_length": 1024,
    "max_length": 2048,
    "do_train": true,
    "do_eval": true,
    "disable_tqdm": true,
    "load_best_model_at_end": true,
    "eval_with_do_generation": false,
    "recompute": true,
    "save_total_limit": 1,
    "tensor_parallel_degree": 2,
    "pipeline_parallel_degree": 1,
    "prefix_tuning": true,
    "zero_padding": false,
    "use_flash_attention": true
}
```
  3. python3 -u -m paddle.distributed.launch --gpus "0,1,2,3" run_finetune.py ./config/llama/pt_argument.json
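The reported shapes are consistent with the prefix cache being sharded by the attention-head count rather than the key/value-head count. Meta-Llama-3-8B uses grouped-query attention (32 attention heads, 8 key/value heads, head_dim 128), so with tensor_parallel_degree=2 each rank holds 4 KV heads, while a prefix built from the attention-head count would hold 16. A hedged arithmetic sketch of this suspicion (variable names are illustrative, not PaddleNLP identifiers):

```python
num_attention_heads = 32    # Meta-Llama-3-8B
num_key_value_heads = 8     # grouped-query attention (GQA)
tensor_parallel_degree = 2

# If the prefix is sharded by attention heads: 32 // 2 = 16
# -> matches the failing prefix shape [1, 128, 16, 128]
prefix_heads = num_attention_heads // tensor_parallel_degree

# The current key/value states are sharded by KV heads: 8 // 2 = 4
# -> matches [1, 249, 4, 128]
kv_heads = num_key_value_heads // tensor_parallel_degree

assert (prefix_heads, kv_heads) == (16, 4)
```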
hjx620 commented 3 hours ago

(Two screenshots attached: the error traceback, and the code location where the error is raised.)