PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
11.98k stars 2.92k forks

paddlenlp: document-level event extraction #1957

Closed snoopy1316 closed 2 years ago

snoopy1316 commented 2 years ago

Reference: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/information_extraction/DuEE. Running in AI Studio. Hardware: GPU. Environment: Python 3.7, PaddlePaddle 2.2.2.

Training the trigger recognition model:

sh run_duee_fin.sh trigger_train

Output:
aistudio@jupyter-1051736-3809794:~/PaddleNLP/examples/information_extraction/DuEE$ bash run_duee_fin.sh trigger_train
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin trigger train
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: 0
heter_devices: 
heter_worker_num: None
heter_workers: 
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers: 
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/trigger_tag.dict', '--train_data', './data/DuEE-Fin/trigger/train.tsv', '--dev_data', './data/DuEE-Fin/trigger/dev.tsv', '--test_data', './data/DuEE-Fin/trigger/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/trigger', '--init_ckpt', './ckpt/DuEE-Fin/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/trigger/test_pred.json', '--device', 'gpu']
worker_num: None
workers: 
------------------------------------------------
WARNING 2022-04-14 15:43:20,817 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2022-04-14 15:43:20,821 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:47901               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:47901               |
    |                     PADDLE_RANK_IN_NODE                        0                      |
    |                 PADDLE_LOCAL_DEVICE_IDS                        0                      |
    |                 PADDLE_WORLD_DEVICE_IDS                        0                      |
    |                     FLAGS_selected_gpus                        0                      |
    |             FLAGS_selected_accelerators                        0                      |
    +=======================================================================================+

INFO 2022-04-14 15:43:20,821 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
launch proc_id:6146 idx:0
Traceback (most recent call last):
  File "sequence_labeling.py", line 28, in <module>
    from paddlenlp.data import Stack, Tuple, Pad
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/__init__.py", line 23, in <module>
    from . import datasets
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/__init__.py", line 19, in <module>
    from .dureader_robust import *
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dureader_robust.py", line 23, in <module>
    from datasets.tasks import QuestionAnsweringExtractive
ModuleNotFoundError: No module named 'datasets.tasks'
INFO 2022-04-14 15:43:27,858 launch_utils.py:341] terminate all the procs
ERROR 2022-04-14 15:43:27,859 launch_utils.py:604] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2022-04-14 15:43:31,863 launch_utils.py:341] terminate all the procs
INFO 2022-04-14 15:43:31,863 launch.py:311] Local processes completed.
end DuEE-Fin trigger train

How can I fix this? Thanks.

snoopy1316 commented 2 years ago

Question 2: During model construction, the documentation says PaddleNLP provides commonly used sequence labeling models built on the ERNIE pretrained model, which can be loaded in one step by specifying the model name:

from paddlenlp.transformers import ErnieForTokenClassification

model = ErnieForTokenClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))

The output is:

Traceback (most recent call last):
  File "一键加载.py", line 1, in <module>
    from paddlenlp.transformers import ErnieForTokenClassification, ErnieForSequenceClassification
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/__init__.py", line 23, in <module>
    from . import datasets
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/__init__.py", line 19, in <module>
    from .dureader_robust import *
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dureader_robust.py", line 23, in <module>
    from datasets.tasks import QuestionAnsweringExtractive
ModuleNotFoundError: No module named 'datasets.tasks'

smallv0221 commented 2 years ago

Hello, it looks like you may not have the datasets library installed. PaddleNLP currently depends on the datasets library, so please install datasets and try again.
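
Before reinstalling, it can help to check how Python resolves the name in the failing import. The traceback complains about 'datasets.tasks' rather than 'datasets', which suggests that something named datasets was found but does not expose a tasks submodule (for example a different version of the library, or a local directory named datasets earlier on the import path). The following is a minimal diagnostic sketch using only the standard library; the helper name diagnose and its status strings are hypothetical, not part of PaddleNLP or datasets.

```python
import importlib.util


def diagnose(package="datasets", submodule="datasets.tasks"):
    """Report how `package` and `submodule` resolve in this environment.

    Returns (status, origin), where status is one of:
      "missing"      - no top-level module named `package` is found
      "no-submodule" - `package` is found, but `submodule` is not
      "ok"           - both resolve
    origin is the file the top-level package would be loaded from, if known.
    """
    spec = importlib.util.find_spec(package)
    if spec is None:
        return "missing", None

    try:
        # Note: resolving a dotted name imports the parent package first.
        sub_spec = importlib.util.find_spec(submodule)
    except Exception:
        # The parent failed to import, or it is a plain module, not a package.
        sub_spec = None

    status = "ok" if sub_spec is not None else "no-submodule"
    return status, spec.origin


if __name__ == "__main__":
    status, origin = diagnose()
    print("status:", status)
    print("datasets resolved from:", origin)
```

If the status is "missing", installing the datasets library as suggested above is the natural next step. If it is "no-submodule", the printed origin shows which datasets is actually being picked up, which may point to a version mismatch or a shadowing directory rather than a missing install.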