PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

Error when running data_distill.py during model distillation #3872

Closed. everzhou closed this issue 1 year ago.

everzhou commented 2 years ago

Please describe your question

When task_type is entity_extraction, the script fails; the other task types work. The command I used:

    python data_distill.py \
        --data_path ../data \
        --save_dir student_data \
        --task_type entity_extraction \
        --synthetic_ratio 10 \
        --model_path ../checkpoint/model_best

Error message:

    Traceback (most recent call last):
      File "data_distill.py", line 127, in <module>
        do_data_distill()
      File "data_distill.py", line 37, in do_data_distill
        label_maps = schema2label_maps(args.task_type, schema=args.schema)
      File "/home/aistudio/uiev3/data_distill/utils.py", line 320, in schema2label_maps
        entity2id[s] = len(entity2id)
    TypeError: unhashable type: 'dict'
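The traceback shows that `schema2label_maps` uses each schema item directly as a dictionary key (`entity2id[s] = len(entity2id)`). The sketch below is not the actual utils.py code; it only reproduces that one line to show why a dict item fails. `build_entity2id` and the relation-style schema item are hypothetical.

```python
def build_entity2id(schema):
    """Hypothetical helper mirroring the failing line in schema2label_maps:
    every schema item is used directly as a dict key."""
    entity2id = {}
    for s in schema:
        # Raises TypeError when s is a dict, because dicts are unhashable.
        entity2id[s] = len(entity2id)
    return entity2id


# Hypothetical relation-style item: a dict of subject type -> relation names.
relation_style_schema = [{"Weapon": ["Country", "Type"]}]

try:
    build_entity2id(relation_style_schema)
except TypeError as err:
    print(f"TypeError: {err}")
```

Plain label strings are hashable, so a flat list of strings does not hit this error.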

Drizzlenum commented 2 years ago

You need to change the label schema in data_distill.py to your own labels.

JunnYu commented 2 years ago

https://github.com/PaddlePaddle/PaddleNLP/blob/4cc8578404a6f31f5c1c5b2a98ae1b0404cda4b8/model_zoo/uie/data_distill/data_distill.py#L123 I ran into this too. I think the schema here should simply be a list, not a dict:

schema = ["手术治疗", "实验室检查", "影像学检查"]
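A hedged sketch of what this fix amounts to: with a flat list of label strings, every item is hashable and gets a consecutive id. `schema_to_label_maps` is a hypothetical helper, not the real utils.py function; only the `entity2id` key name is taken from the train.py traceback in this thread, and the real label_maps.json written by data_distill.py may contain other keys.

```python
import json


def schema_to_label_maps(schema):
    """Hypothetical sketch: assign each entity label in a flat list schema a
    consecutive integer id, stored under the "entity2id" key."""
    entity2id = {}
    for label in schema:
        if not isinstance(label, str):
            # Fail early with a clearer message than "unhashable type".
            raise TypeError(
                f"entity_extraction schema items must be strings, got {type(label).__name__}"
            )
        if label in entity2id:
            continue  # skip duplicates so ids stay consecutive
        entity2id[label] = len(entity2id)
    return {"entity2id": entity2id}


schema = ["手术治疗", "实验室检查", "影像学检查"]
label_maps = schema_to_label_maps(schema)
print(json.dumps(label_maps, ensure_ascii=False))
# {"entity2id": {"手术治疗": 0, "实验室检查": 1, "影像学检查": 2}}
```

`ensure_ascii=False` keeps the Chinese labels readable if the mapping is written to a JSON file.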

everzhou commented 2 years ago

Got it, thanks. I'll try again tomorrow.

everzhou commented 1 year ago

@JunnYu After changing the schema, data_distill.py no longer fails, and evaluate_teacher.py runs without errors too, but train.py fails. The exact command:

    python train.py \
        --task_type entity_extraction \
        --train_path student_data/train_data.json \
        --dev_path student_data/dev_data.json \
        --label_maps_path student_data/label_maps.json \
        --num_epochs 200 \
        --encoder ernie-3.0-mini-zh

The error message:

    W1125 13:50:53.044281  5109 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
        self.run()
      File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 218, in _thread_loop
        self._thread_done_event)
      File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 121, in fetch
        data.append(self.dataset[idx])
      File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 277, in __getitem__
        ) if self._transform_pipline else self.new_data[idx]
      File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 267, in _transform
        data = fn(data)
      File "/home/aistudio/uiev3/data_distill/utils.py", line 114, in tokenize_and_align_train_labels
        label = label_maps['entity2id'][e['type']]
    KeyError: 'object'

everzhou commented 1 year ago

@linjieccc Could you take a look at the train.py error?

Viserion-nlper commented 1 year ago

I have the same problem.

linjieccc commented 1 year ago

@everzhou @Viserion-nlper Hi,

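I tracked it down: the annotated data most likely contains labels that are not defined in the schema in data_distill.py. I've opened PR #4153 to improve how this case is handled.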
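Independently of that PR, undefined labels can be caught before training. A minimal sketch, assuming each annotated record has an `entities` list whose items carry a `type` field (as the `e['type']` lookup in the train.py traceback suggests); `find_undefined_labels` and the sample record are hypothetical.

```python
def find_undefined_labels(records, schema):
    """Return entity types that occur in the annotated records but are
    missing from the schema. Assumes records shaped like
    {"entities": [{"type": ...}, ...]}, which is a hypothetical format."""
    defined = set(schema)
    undefined = set()
    for record in records:
        for entity in record.get("entities", []):
            if entity["type"] not in defined:
                undefined.add(entity["type"])
    return sorted(undefined)


schema = ["手术治疗", "实验室检查", "影像学检查"]
# Hypothetical annotated record containing a label absent from the schema.
records = [
    {"entities": [{"type": "手术治疗"}, {"type": "object"}]},
]
print(find_undefined_labels(records, schema))  # ['object']
```

Any label this reports either needs to be added to the schema (and data_distill.py rerun so label_maps.json is regenerated) or removed from the annotations.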

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.