Closed KenPanda closed 1 year ago
Hi, change this line https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/text_summarization/pegasus/train.py#L130 to:
parser.add_argument("--use_SSTIA", action="store_true", help="Whether to use SSTIA.")
pin @LazyFyh
@gongel After making the change you suggested, I still get an error. The output is below:
$ unset CUDA_VISIBLE_DEVICES
python -m paddle.distributed.launch --gpus "0" train.py \
--model_name_or_path=Randeng-Pegasus-238M-Summary-Chinese \
--train_file train.json \
--eval_file test.json \
--output_dir pegasus_out \
--max_source_length 128 \
--max_target_length 64 \
--epoch 20 \
--logging_steps 1 \
--save_steps 10000 \
--train_batch_size 128 \
--eval_batch_size 128 \
--learning_rate 5e-5 \
--warmup_proportion 0.02 \
--weight_decay=0.01 \
--device=gpu \
Traceback (most recent call last):
File "C:\Users\G\Desktop\PaddleNLP-develop\applications\text_summarization\pegasus\train.py", line 298, in
model_name_or_path is wrong
@gongel Can the other errors be ignored? I've changed model_name_or_path over and over, getting a different error each time. Could you advise how it should be set?
Change to: --model_name_or_path=IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese
@gongel The path you suggested, which is also the one in the official docs, doesn't work either. I'm on Windows and am still experimenting.
@gongel I've tried about ten different settings, but output_dir always fails with the error below:
$ unset CUDA_VISIBLE_DEVICES==0
python -m paddle.distributed.launch --gpus "0" train.py \
--model_name_or_path=IDEA-CCNL\Randeng-Pegasus-238M-Summary-Chinese\
--train_file train.json \
--eval_file test.json \
--output_dir pegasus_out \
--max_source_length 128 \
--max_target_length 64 \
--epoch 20 \
--logging_steps 1 \
--save_steps 10000 \
--train_batch_size 128 \
--eval_batch_size 128 \
--learning_rate 5e-5 \
--warmup_proportion 0.02 \
--weight_decay=0.01 \
--device=gpu \
usage: train.py [-h] [--model_name_or_path MODEL_NAME_OR_PATH] [--train_file TRAIN_FILE] [--eval_file EVAL_FILE] --output_dir OUTPUT_DIR [--max_source_length MAX_SOURCE_LENGTH] [--min_target_length MIN_TARGET_LENGTH] [--max_target_length MAX_TARGET_LENGTH] [--learning_rate LEARNING_RATE] [--epoch EPOCH] [--logging_steps LOGGING_STEPS] [--save_steps SAVE_STEPS] [--train_batch_size TRAIN_BATCH_SIZE] [--eval_batch_size EVAL_BATCH_SIZE] [--weight_decay WEIGHT_DECAY] [--warmup_steps WARMUP_STEPS] [--warmup_proportion WARMUP_PROPORTION] [--adam_epsilon ADAM_EPSILON] [--max_steps MAX_STEPS] [--seed SEED] [--device {cpu,gpu,xpu}] [--use_amp USE_AMP] [--scale_loss SCALE_LOSS] [--use_SSTIA] [--mix_ratio MIX_RATIO]
train.py: error: the following arguments are required: --output_dir
> Hi, change this line https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/text_summarization/pegasus/train.py#L130 to:
> parser.add_argument("--use_SSTIA", action="store_true", help="Whether to use SSTIA.")
This bug has been fixed: https://github.com/PaddlePaddle/PaddleNLP/pull/4646
Hi, on the Windows command line you need to put everything on one line, like this: python -m paddle.distributed.launch --gpus "0" train.py --train_file train.json --eval_file test.json ...
@gongel Thank you. After studying a lot of Python syntax I had already worked out one way to write it; yours is more concise, and both work. Training now runs fine on CPU, but on GPU the VRAM fills up almost completely and the process exits on its own after a few seconds, with the following output:
$ unset CUDA_VISIBLE_DEVICES
python -m paddle.distributed.launch --gpus "0" train.py --model_name_or_path=IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese --train_file data/train.json --eval_file data/test.json --output_dir pegasus_out --max_source_length 128 --max_target_length 64 --epoch 20 --logging_steps 1 --save_steps 10000 --train_batch_size 128 --eval_batch_size 128 --learning_rate 5e-5 --warmup_proportion 0.02 --weight_decay=0.01 --device=gpu
[2023-02-06 01:48:42,008] [ WARNING] arrow_dataset.py:3036 - Loading cached processed dataset at C:\Users\G.cache\huggingface\datasets\json\default-b7260c8ec883c6c8\0.0.0\0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51\cache-a7cae6a9d7e8c899.arrow
[2023-02-06 01:48:42,432] [ WARNING] arrow_dataset.py:3036 - Loading cached processed dataset at C:\Users\G.cache\huggingface\datasets\json\default-fbdafb0f6261405c\0.0.0\0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51\cache-328d667dfd908829.arrow
[2023-02-06 01:48:42,434] [ INFO] - Already cached C:\Users\G.paddlenlp\models\IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese\model_state.pdparams
[2023-02-06 01:48:42,434] [ INFO] - Already cached C:\Users\G.paddlenlp\models\IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese\model_config.json
W0206 01:48:42.436866 10792 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.0, Runtime API Version: 10.2
W0206 01:48:42.503891 10792 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
Try lowering train_batch_size and eval_batch_size.
@gongel Thanks, I tried that early on. Even with both dropped to 2 or 1 the problem persists. Please advise!
@gongel Sometimes it completes 1 step; sometimes it can't even finish 1 step.
@gongel Training on CPU currently works fine.
How much GPU memory do you have? The GPU memory may be too small.
@gongel The GPU has 6 GB. About 10 seconds after starting, usage climbs to 5.7-5.8 GB, holds there for a few seconds, then the process exits with the output above. I tried lowering every parameter I could to reduce memory use, with no effect, and couldn't delay the exit either. I also tried a dozen or so versions; with some releases before 2.4.x, memory would hold steady around 4.x GB but with no useful output and no steps completed.
6 GB is not enough to run it; ideally you want 16 GB or more.
@gongel So a 1660 Ti can't run it at all, even using shared memory? Also, does PaddleNLP support multi-core CPU computation? How do I set that up? Any pointers appreciated.
Hi, a 1660 Ti with 6 GB can't run it. You can apply for free large-memory GPUs on AIStudio: https://aistudio.baidu.com/aistudio/index, or try a GPU with more memory, or use the CPU.
Would 12 GB of VRAM be enough to run it?
Please describe your question
I'm a Go programmer currently learning Python. I want to train a supervised summarization model myself and chose text_summarization, but after deploying and running it per the instructions, I get the following error:
unset CUDA_VISIBLE_DEVICES
py -m paddle.distributed.launch --gpus "0" train.py \
--model_name_or_path=IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese \
--train_file data/train.json \
--eval_file data/test.json \
--output_dir pegasus_out \
--max_source_length 128 \
--max_target_length 64 \
--epoch 20 \
--logging_steps 1 \
--save_steps 10000 \
--train_batch_size 128 \
--eval_batch_size 128 \
--learning_rate 5e-5 \
--warmup_proportion 0.02 \
--weight_decay=0.01 \
--device=gpu \
Traceback (most recent call last):
  File "C:\Users\G\Desktop\PaddleNLP-develop\applications\text_summarization\pegasus\train.py", line 296, in <module>
    args = parse_args()
  File "C:\Users\G\Desktop\PaddleNLP-develop\applications\text_summarization\pegasus\train.py", line 130, in parse_args
    parser.add_argument("--use_SSTIA", action="store_true", type=bool, help="Whether to use SSTIA.")
  File "C:\Users\G\AppData\Local\Programs\Python\Python310\lib\argparse.py", line 1423, in add_argument
    action = action_class(**kwargs)
TypeError: _StoreTrueAction.__init__() got an unexpected keyword argument 'type'