PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.18k stars 2.95k forks

[Bug]: After annotating relations in Label Studio, generating the training/validation set files on the AI Studio platform raises an error #5803

Closed LANSEYOUYUMOWANG closed 1 year ago

LANSEYOUYUMOWANG commented 1 year ago

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 2.4.2.post112
- paddlenlp: 2.5.2

Duplicate issues

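Error description

After annotating relations in Label Studio, generating the training/validation set files on the AI Studio platform fails with an error.
Error message: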
/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/sklearn/utils/multiclass.py:14: DeprecationWarning: Please use `spmatrix` from the `scipy.sparse` namespace, the `scipy.sparse.base` namespace is deprecated.
  from scipy.sparse.base import spmatrix
[2023-04-27 15:10:12,068] [    INFO] - Converting annotation data...
  0%|                                                    | 0/15 [00:00<?, ?it/s][2023-04-27 15:10:12,070] [    INFO] - Parsing image file a16.jpg ...
  7%|██▉                                         | 1/15 [00:05<01:23,  5.98s/it][2023-04-27 15:10:18,054] [    INFO] - Parsing image file a3.jpg ...
 13%|█████▊                                      | 2/15 [00:08<00:54,  4.20s/it][2023-04-27 15:10:21,010] [    INFO] - Parsing image file a18.jpg ...
 20%|████████▊                                   | 3/15 [00:11<00:40,  3.39s/it][2023-04-27 15:10:23,434] [    INFO] - Parsing image file a15.jpg ...
 27%|███████████▋                                | 4/15 [00:13<00:32,  2.99s/it][2023-04-27 15:10:25,813] [    INFO] - Parsing image file a8.png ...
 33%|██████████████▋                             | 5/15 [00:17<00:31,  3.19s/it][2023-04-27 15:10:29,361] [    INFO] - Parsing image file a6.png ...
 40%|█████████████████▌                          | 6/15 [00:19<00:27,  3.01s/it][2023-04-27 15:10:32,010] [    INFO] - Parsing image file a19.jpg ...
 47%|████████████████████▌                       | 7/15 [00:22<00:22,  2.85s/it][2023-04-27 15:10:34,546] [    INFO] - Parsing image file a10.jpg ...
 53%|███████████████████████▍                    | 8/15 [00:25<00:19,  2.82s/it][2023-04-27 15:10:37,286] [    INFO] - Parsing image file a13.jpg ...
 60%|██████████████████████████▍                 | 9/15 [00:27<00:16,  2.69s/it][2023-04-27 15:10:39,719] [    INFO] - Parsing image file a11.jpg ...
 67%|████████████████████████████▋              | 10/15 [00:30<00:13,  2.75s/it][2023-04-27 15:10:42,591] [    INFO] - Parsing image file a17.jpg ...
 73%|███████████████████████████████▌           | 11/15 [00:33<00:11,  2.86s/it][2023-04-27 15:10:45,692] [    INFO] - Parsing image file a7.png ...
 80%|██████████████████████████████████▍        | 12/15 [00:36<00:08,  2.86s/it][2023-04-27 15:10:48,554] [    INFO] - Parsing image file a15.jpg ...
 87%|█████████████████████████████████████▎     | 13/15 [00:38<00:05,  2.71s/it][2023-04-27 15:10:50,904] [    INFO] - Parsing image file a14.jpg ...
 93%|████████████████████████████████████████▏  | 14/15 [00:41<00:02,  2.66s/it][2023-04-27 15:10:53,453] [    INFO] - Parsing image file a5.jpg ...
100%|███████████████████████████████████████████| 15/15 [00:44<00:00,  2.97s/it]
[2023-04-27 15:10:56,550] [    INFO] - Adding negative samples for first stage prompt...
100%|████████████████████████████████████████| 15/15 [00:00<00:00, 49113.63it/s]
[2023-04-27 15:10:56,551] [    INFO] - Converting annotation data...
  0%|                                                     | 0/4 [00:00<?, ?it/s][2023-04-27 15:10:56,552] [    INFO] - Parsing image file a9.jpg ...
 25%|███████████▎                                 | 1/4 [00:02<00:08,  2.68s/it][2023-04-27 15:10:59,232] [    INFO] - Parsing image file a1.jpg ...
 25%|███████████▎                                 | 1/4 [00:05<00:17,  5.71s/it]
Traceback (most recent call last):
  File "/home/aistudio/PaddleNLP-develop/applications/information_extraction/document/../label_studio.py", line 139, in <module>
    do_convert()
  File "/home/aistudio/PaddleNLP-develop/applications/information_extraction/document/../label_studio.py", line 95, in do_convert
    dev_examples = data_converter.convert_ext_examples(raw_examples[p1:p2], is_train=False)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/paddlenlp/utils/tools.py", line 510, in convert_ext_examples
    items = self.process_image_tag(line, task_type="ext")
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/paddlenlp/utils/tools.py", line 429, in process_image_tag
    "type": r["labels"][0],
IndexError: list index out of range

Steps to reproduce & code

1. Annotate relations in Label Studio and export the JSON file.
2. Generate the training/validation set files on the AI Studio platform, which fails:

!python ../label_studio.py \
    --label_studio_file ./data/label_studio.json \
    --save_dir ./data \
    --splits 0.8 0.2 0 \
    --task_type ext

Error message: identical to the log in the error description above, ending with IndexError: list index out of range at "type": r["labels"][0] in process_image_tag.
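The log shows 15 examples in the first conversion pass and 4 in the second, and the traceback shows the second pass slicing raw_examples[p1:p2]. The following is a minimal sketch of how --splits 0.8 0.2 0 could produce those sizes. It assumes 19 exported tasks in total (15 + 4, inferred from the log) and assumes p1 and p2 are cumulative fractions truncated to integers. split_bounds is a hypothetical helper written for illustration, not a function in label_studio.py.

```python
from typing import Sequence, Tuple


def split_bounds(n_examples: int, splits: Sequence[float]) -> Tuple[int, int]:
    """Return (p1, p2) so that train is [0:p1], dev is [p1:p2], test is [p2:].

    Assumption: boundaries are cumulative fractions truncated with int().
    """
    p1 = int(n_examples * splits[0])
    p2 = int(n_examples * (splits[0] + splits[1]))
    return p1, p2


# 19 is an inferred total (15 + 4 from the progress bars in the log).
p1, p2 = split_bounds(19, [0.8, 0.2, 0.0])
print(p1, p2 - p1, 19 - p2)  # train, dev, test sizes -> 15 4 0
```

According to the log, the task that failed (a1.jpg) was in the validation slice, which is why the training pass finished before the error appeared.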

linjieccc commented 1 year ago

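The error suggests that the exported data is missing the labels field. Please check whether the annotation data exported from the annotation platform matches the expected format.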
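One way to run that check is to scan the export for relations that carry no label. The sketch below assumes the export is a list of tasks, each with an "annotations" list whose entries hold a "result" list, and that relation entries have "type": "relation" plus "from_id", "to_id" and "labels". That layout is inferred from the traceback and from typical Label Studio JSON exports, so verify it against your own file. find_unlabeled_relations is a hypothetical helper, not part of PaddleNLP. Note that the traceback shows an IndexError rather than a KeyError, which points to a labels list that is present but empty, so the check covers both cases.

```python
import json
from typing import Any, Dict, List, Tuple

Task = Dict[str, Any]


def load_tasks(path: str) -> List[Task]:
    """Load a Label Studio JSON export (assumed to be a list of tasks)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_unlabeled_relations(tasks: List[Task]) -> List[Tuple[Any, Any, Any]]:
    """Return (task id, from_id, to_id) for every relation without a label."""
    problems = []
    for task in tasks:
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                if result.get("type") != "relation":
                    continue
                # A missing key and an empty list both break r["labels"][0].
                if not result.get("labels"):
                    problems.append(
                        (task.get("id"), result.get("from_id"), result.get("to_id"))
                    )
    return problems


if __name__ == "__main__":
    # Made-up data for illustration only, not taken from the reporter's export.
    sample = [
        {
            "id": 1,
            "annotations": [
                {
                    "result": [
                        {"type": "relation", "from_id": "a", "to_id": "b", "labels": []},
                        {"type": "relation", "from_id": "b", "to_id": "c", "labels": ["r"]},
                    ]
                }
            ],
        }
    ]
    print(find_unlabeled_relations(sample))
```

To check a real export, call find_unlabeled_relations(load_tasks("./data/label_studio.json")). Any relation it reports was probably drawn in Label Studio without a relation label assigned, so a likely fix is to assign the label there and export again.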

wwangzz1 commented 1 year ago

I ran into the same problem. How can it be solved?

Adachi324 commented 1 year ago

How did you solve it?

Adachi324 commented 1 year ago

I ran into the same problem. How can it be solved?

Have you solved it?

Vivi529 commented 9 months ago

I ran into the same problem. How can it be solved?

Have you solved it?

Same problem here. Have you solved it?

Estrellajer commented 6 months ago

I don't know whether the people above solved it, but anyone who runs into this problem later can refer to my article.