Closed everzhou closed 1 year ago
The label schema in data_distill.py needs to be changed to your own.
https://github.com/PaddlePaddle/PaddleNLP/blob/4cc8578404a6f31f5c1c5b2a98ae1b0404cda4b8/model_zoo/uie/data_distill/data_distill.py#L123 I noticed this too. I think the schema here should simply be a list, not a dict.
schema = ["手术治疗", "实验室检查", "影像学检查"]
Got it, understood. I'll try again tomorrow.
@JunnYu After changing the schema, data_distill.py no longer raises an error, and evaluate_teacher.py also runs without errors, but train.py fails. The exact command was:
python train.py \
    --task_type entity_extraction \
    --train_path student_data/train_data.json \
    --dev_path student_data/dev_data.json \
    --label_maps_path student_data/label_maps.json \
    --num_epochs 200 \
    --encoder ernie-3.0-mini-zh
The error message is:
W1125 13:50:53.044281 5109 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 218, in _thread_loop
    self._thread_done_event)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 121, in fetch
    data.append(self.dataset[idx])
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 277, in __getitem__
    ) if self._transform_pipline else self.new_data[idx]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 267, in _transform
    data = fn(data)
  File "/home/aistudio/uiev3/data_distill/utils.py", line 114, in tokenize_and_align_train_labels
    label = label_maps['entity2id'][e['type']]
KeyError: 'object'
@linjieccc Could you take a look at the train.py error?
I have the same problem.
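@everzhou @Viserion-nlper Hi,
The problem appears to be that the annotated data contains labels that are not defined in the schema in data_distill.py. I have opened PR #4153 to improve the experience in this case.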
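One way to confirm this diagnosis before running train.py is to scan the student data for entity types that are missing from the label map. The following is only a minimal sketch: the traceback confirms the `label_maps['entity2id']` and `e['type']` lookups, but the `entity_list` field name and the sample records are assumptions, and `find_undefined_entity_types` is a hypothetical helper, not part of PaddleNLP.

```python
def find_undefined_entity_types(examples, entity2id):
    """Return the entity types used in `examples` that are not keys of `entity2id`."""
    undefined = set()
    for example in examples:
        # "entity_list" is an assumed field name for the per-example entities.
        for entity in example.get("entity_list", []):
            if entity["type"] not in entity2id:
                undefined.add(entity["type"])
    return sorted(undefined)


# Label map built from the list-style schema used earlier in this thread.
entity2id = {"手术治疗": 0, "实验室检查": 1, "影像学检查": 2}

# Illustrative records only; the second entity carries a label outside the schema.
examples = [
    {
        "text": "sample text",
        "entity_list": [
            {"text": "CT", "type": "影像学检查"},
            {"text": "sample", "type": "object"},
        ],
    }
]

missing = find_undefined_entity_types(examples, entity2id)
print(missing)
```

Any type reported here would trigger the same KeyError in tokenize_and_align_train_labels, so it either needs to be added to the schema or removed from the annotated data.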
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Please ask your question
An error is raised when task_type is entity_extraction; the other task types work. The command I used was:
python data_distill.py \
    --data_path ../data \
    --save_dir student_data \
    --task_type entity_extraction \
    --synthetic_ratio 10 \
    --model_path ../checkpoint/model_best
Error message:
Traceback (most recent call last):
  File "data_distill.py", line 127, in <module>
    do_data_distill()
  File "data_distill.py", line 37, in do_data_distill
    label_maps = schema2label_maps(args.task_type, schema=args.schema)
  File "/home/aistudio/uiev3/data_distill/utils.py", line 320, in schema2label_maps
    entity2id[s] = len(entity2id)
TypeError: unhashable type: 'dict'
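This TypeError can be reproduced without PaddleNLP. The sketch below assumes that the entity branch of schema2label_maps uses each schema item as a dictionary key, which is what the `entity2id[s] = len(entity2id)` line in the traceback suggests. `build_entity2id` is a hypothetical stand-in, not the library function, and the dict-style schema is illustrative only.

```python
def build_entity2id(schema):
    """Hypothetical stand-in for the entity branch of schema2label_maps."""
    entity2id = {}
    for s in schema:
        # Each schema item becomes a dict key, so it must be hashable.
        entity2id[s] = len(entity2id)
    return entity2id


# A flat list of label strings works: strings are hashable.
list_schema = ["手术治疗", "实验室检查", "影像学检查"]
entity2id = build_entity2id(list_schema)
print(entity2id)

# A schema whose items are dicts fails, because a dict cannot be used
# as a dictionary key. The names below are placeholders.
dict_schema = [{"entity_type": ["attribute_a", "attribute_b"]}]
try:
    build_entity2id(dict_schema)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```

For entity_extraction, a flat list of label names is therefore the shape that avoids the error.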