PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: When fine-tuning UIE-X, --device gpu has no effect: no error and no training, the command window just exits #5480

Open ANemo-yj opened 1 year ago

ANemo-yj commented 1 year ago

Please describe your question

... "rel_2d_pos_bins": 64, "rel_pos_bins": 32, "shape_size": 128, "task_id": 0, "task_type_vocab_size": 3, "type_vocab_size": 100, "use_task_id": true, "vocab_size": 250002 }

W0330 17:50:00.926941 18020 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
W0330 17:50:00.949963 18020 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.

54405@ANemo MINGW64 /e/Uie-x

Problem: after printing my GPU information, training does not continue. Nothing happens and it just drops back to the prompt (note: there is no error at all). Setting --per_device_train_batch_size to 1 did not help. On CPU, training runs normally once the batch size is reduced.

Running import paddle followed by paddle.utils.run_check() shows the GPU can be used normally:

Running verify PaddlePaddle program ...
W0330 17:56:11.574090 6500 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
W0330 17:56:11.591100 6500 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
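When a process drops back to the prompt with no traceback, its exit code is often the only clue left about how it ended. The following is a minimal, stdlib-only sketch (not part of PaddleNLP or PaddlePaddle) that runs a command and reports how it terminated; describe_exit and run_and_report are hypothetical helper names introduced here for illustration.

```python
import subprocess
import sys
from typing import List


def describe_exit(code: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if code == 0:
        return "exited normally (code 0)"
    if code < 0:
        # On POSIX, a negative return code means the child was killed by signal -code.
        return f"killed by signal {-code}"
    # On Windows, native crashes show up as large unsigned NTSTATUS-style values,
    # e.g. 0xC0000005 (access violation), so print the code in hex as well.
    return f"exited with code {code} (0x{code & 0xFFFFFFFF:08X})"


def run_and_report(argv: List[str]) -> int:
    """Run argv with output streaming to the console, then report how it ended."""
    result = subprocess.run(argv)
    print(describe_exit(result.returncode), file=sys.stderr)
    return result.returncode
```

Hypothetical usage: pass the training command as a list, for example run_and_report(["python", "./PaddleNLP/applications/information_extraction/document/finetune.py", "--device", "gpu"]) plus the remaining flags. A non-zero code after a silent exit would at least confirm the process ended abnormally rather than finishing.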

My version information: paddlenlp 2.5.2, paddlepaddle 2.4.2, paddleocr 2.6.1.3, paddlepaddle-gpu 2.4.2.post112
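The same version information can be collected programmatically from the active environment. Below is a minimal sketch using only importlib.metadata (Python 3.8+); installed_versions and has_cpu_and_gpu_wheels are hypothetical helper names. The second helper only flags the situation listed above, where both the paddlepaddle and paddlepaddle-gpu distributions are present; since both wheels install the same top-level paddle package, that combination is worth checking, though this sketch does not claim it is the cause here.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_cpu_and_gpu_wheels(versions: Dict[str, Optional[str]]) -> bool:
    """True when both the CPU and GPU PaddlePaddle distributions are installed."""
    return bool(versions.get("paddlepaddle")) and bool(versions.get("paddlepaddle-gpu"))


# Distribution names taken from the version list above.
PADDLE_PACKAGES = ["paddlenlp", "paddlepaddle", "paddleocr", "paddlepaddle-gpu"]
```

For example, has_cpu_and_gpu_wheels(installed_versions(PADDLE_PACKAGES)) returns True for an environment matching the list above.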

ANemo-yj commented 1 year ago

My training command:

python ./PaddleNLP/applications/information_extraction/document/finetune.py \
    --device gpu \
    --logging_steps 5 \
    --save_steps 25 \
    --eval_steps 25 \
    --seed 42 \
    --model_name_or_path uie-x-base \
    --output_dir ./checkpoint/model_best \
    --train_path datasets/data/train.txt \
    --dev_path datasets/data/dev.txt \
    --max_seq_len 400 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --label_names 'start_positions' 'end_positions' \
    --do_train \
    --do_eval \
    --do_export \
    --export_model_dir ./checkpoint/model_best \
    --overwrite_output_dir \
    --disable_tqdm True \
    --metric_for_best_model eval_f1 \
    --load_best_model_at_end True \
    --save_total_limit 1
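Keeping the same flags as an argument list sidesteps backslash-continuation and quoting differences between shells, and the list can be passed directly to subprocess.run. This is a minimal sketch; FINETUNE_SCRIPT, FLAGS and build_argv are hypothetical names, and every flag value is copied verbatim from the command above.

```python
import shlex
from typing import List, Tuple

FINETUNE_SCRIPT = "./PaddleNLP/applications/information_extraction/document/finetune.py"

# (flag, values) pairs copied from the command above; order is preserved.
FLAGS: List[Tuple[str, List[str]]] = [
    ("--device", ["gpu"]),
    ("--logging_steps", ["5"]),
    ("--save_steps", ["25"]),
    ("--eval_steps", ["25"]),
    ("--seed", ["42"]),
    ("--model_name_or_path", ["uie-x-base"]),
    ("--output_dir", ["./checkpoint/model_best"]),
    ("--train_path", ["datasets/data/train.txt"]),
    ("--dev_path", ["datasets/data/dev.txt"]),
    ("--max_seq_len", ["400"]),
    ("--per_device_train_batch_size", ["1"]),
    ("--per_device_eval_batch_size", ["1"]),
    ("--num_train_epochs", ["1"]),
    ("--learning_rate", ["1e-5"]),
    ("--label_names", ["start_positions", "end_positions"]),
    ("--do_train", []),
    ("--do_eval", []),
    ("--do_export", []),
    ("--export_model_dir", ["./checkpoint/model_best"]),
    ("--overwrite_output_dir", []),
    ("--disable_tqdm", ["True"]),
    ("--metric_for_best_model", ["eval_f1"]),
    ("--load_best_model_at_end", ["True"]),
    ("--save_total_limit", ["1"]),
]


def build_argv(python: str = "python") -> List[str]:
    """Assemble the fine-tuning command as a list of arguments."""
    argv = [python, FINETUNE_SCRIPT]
    for flag, values in FLAGS:
        argv.append(flag)
        argv.extend(values)
    return argv


# Print a single-line, shell-quoted version that can be pasted into bash.
print(shlex.join(build_argv()))
```

shlex.join requires Python 3.8 or later; none of these values contain characters that need quoting, so the printed line matches the command above without the continuation backslashes.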