PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception. #546

Closed forrestneo closed 1 year ago

forrestneo commented 3 years ago

WARNING:root:DataLoader reader thread raised an exception.
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 199, in _thread_loop
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 167, in _thread_loop
    batch = self._dataset_fetcher.fetch(indices)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 61, in fetch
    data = [self.dataset[idx] for idx in batch_indices]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 61, in <listcomp>
    data = [self.dataset[idx] for idx in batch_indices]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 181, in __getitem__
    idx]) if self._transform_pipline else self.new_data[idx]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 172, in _transform
    data = fn(data)
  File "/home/aistudio/utils.py", line 20, in convert_example
    labels = ['O'] + labels + ['O']
TypeError: can only concatenate list (not "str") to list

---------------------------------------------------------------------------
SystemError                               Traceback (most recent call last)
      1 step = 0
      2 for epoch in range(10):
----> 3     for idx, (input_ids, token_type_ids, length, labels) in enumerate(train_loader):
      4         logits = model(input_ids, token_type_ids)
      5         loss = paddle.mean(loss_fn(logits, labels))

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py in __next__(self)
    202         try:
    203             if in_dygraph_mode():
--> 204                 data = self._reader.read_next_var_list()
    205                 data = _restore_batch(data, self._structure_infos.pop(0))
    206             else:

SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

wawltor commented 3 years ago

Hello. Judging from the error output, the problem is in the data-processing part. Please check the code shown in the traceback above.
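The last frame of the reader-thread traceback fails on `labels = ['O'] + labels + ['O']` because `labels` is a string rather than a list. A minimal sketch of one possible guard is below. The helper name `wrap_labels` and the `sep` delimiter are assumptions for illustration only; the poster's `utils.py` and data format are not shown, so the real separator may differ.

```python
def wrap_labels(labels, sep=" ", pad_tag="O"):
    """Return the tag sequence with pad_tag added at both ends.

    Accepts either a list of tags or one delimited string. `sep` is an
    assumption about how the raw data joins tags; adjust it to the file.
    """
    if isinstance(labels, str):
        # A string here is what triggers the TypeError in the traceback.
        labels = labels.split(sep)
    return [pad_tag] + list(labels) + [pad_tag]


# Reproduce the failing expression with a string in place of a list.
try:
    ["O"] + "B-PER I-PER" + ["O"]
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(wrap_labels("B-PER I-PER"))
print(wrap_labels(["B-PER", "I-PER"]))
```

If the labels arrive as a string, it is usually better to fix the loading step that produces them than to patch the transform, but a guard like this makes the mismatch easy to spot.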

kobe24o commented 3 years ago
epoch:179 - step:2699 - loss: 0.000531, best f1 score: 0.910891 on epoch 48
epoch:179 - step:2700 - loss: 0.000017, best f1 score: 0.910891 on epoch 48
eval precision: 0.897959 - recall: 0.897959 - f1: 0.897959
[2021-09-18 15:17:50,866] [    INFO] - Already cached /home/web/.paddlenlp/models/ernie-gram-zh/vocab.txt
[2021-09-18 15:17:50,874] [    INFO] - Already cached /home/web/.paddlenlp/models/ernie-gram-zh/ernie_gram_zh.pdparams
----train with ./data/origin_train1.txt  train PRF------
WARNING:root:DataLoader reader thread raised an exception.
Exception in thread Thread-361:
Traceback (most recent call last):
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 192, in _thread_loop
    six.reraise(*sys.exc_info())
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 160, in _thread_loop
    batch = self._dataset_fetcher.fetch(indices)
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/site-packages/paddle/fluid/dataloader/fetcher.py", line 106, in fetch

Traceback (most recent call last):
  File "method_compare.py", line 144, in <module>
    caltrainPRF(model, tokenizer, "./data/origin_dev.txt", label_vocab, trans_func)
  File "/opt/bdp/data01/name_ner/ner_utils.py", line 178, in caltrainPRF
    res = predict_batchText(model, tokenizer, texts, label_vocab, trans_func)
  File "/opt/bdp/data01/name_ner/ner_utils.py", line 125, in predict_batchText
    return predict(model, data_loader, pred_data, label_vocab, pred=True, predSingleText=2)
  File "/opt/bdp/data01/name_ner/ner_utils.py", line 86, in predict
    for input_ids, token_type_ids, lengths in data_loader:
  File "/opt/bdp/data01/anaconda3/envs/pp21/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 197, in __next__
    data = self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

I ran into the same problem. This is an NER project, and training for 180 epochs ran without issues. Environment: Linux, Python 3.8.10, PaddleNLP 2.0.8, Paddle 2.1.2.

from functools import partial

import paddle
from paddlenlp.transformers import ErnieGramForTokenClassification, ErnieGramTokenizer

# train_deepmodel, load_dict, convert_example, caltrainPRF and calNamePRF_ernie
# are the poster's own helpers (not shown in this thread).
for i in range(1, 10):
    testData = './data/origin_train' + str(i) + '.txt'
    print('-----train deepModel with {} ------'.format(testData))
    train_deepmodel(testData, testsize=0.1, epochs=1, learning_rate=1e-5, batch_size=64)
    label_vocab = load_dict('tag.txt')
    tokenizer = ErnieGramTokenizer.from_pretrained("ernie-gram-zh")
    # Conversion function: turns each example into numeric ids
    trans_func = partial(convert_example, tokenizer=tokenizer, label_vocab=label_vocab)
    model = ErnieGramForTokenClassification.from_pretrained("ernie-gram-zh", num_classes=len(label_vocab))
    # Reload the best checkpoint saved during training
    state_dict = paddle.load("./best_model_pdparms")
    model.set_dict(state_dict)
    print("----train with {}  train PRF------".format(testData))
    caltrainPRF(model, tokenizer, "./data/origin_dev.txt", label_vocab, trans_func)
    print("----train with {}  name matching PRF------".format(testData))
    calNamePRF_ernie(model, tokenizer, "./data/origin_dev.txt", label_vocab)

caltrainPRF(model, tokenizer, "./data/origin_dev.txt", label_vocab, trans_func)
calNamePRF_ernie(model, tokenizer, "./data/origin_dev.txt", label_vocab)

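Running these two functions on their own does not raise any error. Could it be that the error only appears inside the for loop? Any help would be much appreciated. Thanks!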
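The "Blocking queue is killed" message only says that the reader thread died. The underlying exception is raised while a sample is being fetched and transformed, and in the log above that part of the traceback is cut off. One way to surface it is to call the transform on each raw example directly in the main thread, without a DataLoader. The sketch below is plain Python and makes no assumptions about Paddle APIs. `find_failing_examples` is a hypothetical helper, and `toy_transform` is a stand-in for the poster's `convert_example`, which is not shown.

```python
def find_failing_examples(examples, transform, limit=5):
    """Apply transform to every example and collect the ones that raise.

    Returns a list of (index, exception type name, message) tuples,
    stopping after `limit` failures.
    """
    failures = []
    for idx, example in enumerate(examples):
        try:
            transform(example)
        except Exception as exc:  # broad on purpose: this is a debugging aid
            failures.append((idx, type(exc).__name__, str(exc)))
            if len(failures) >= limit:
                break
    return failures


def toy_transform(example):
    # Stand-in for the real transform; fails when labels is not a list.
    return ["O"] + example["labels"] + ["O"]


samples = [
    {"labels": ["B-PER", "I-PER"]},
    {"labels": "B-PER I-PER"},  # malformed: a string instead of a list
    {"labels": ["O"]},
]

for idx, name, message in find_failing_examples(samples, toy_transform):
    print(idx, name, message)
```

Running the real transform this way, inside the same loop iteration that fails, shows the original exception and the index of the offending sample instead of the generic `SystemError`.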

kobe24o commented 3 years ago
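I solved the problem myself. I had forgotten to turn on a parameter switch at prediction time, so the data format was wrong: it contained an extra label field. Sorry for the noise!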
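The kind of mismatch described here can be sketched as follows. This is an illustration only, not the poster's code. The `is_test` flag, the function body, and the character-code "tokenizer" are all assumptions chosen to keep the example self-contained. The point is that the transform returns a different number of fields for training and for prediction, so the flag must match the loop that consumes the data.

```python
def convert_example(example, label_vocab, is_test=False):
    """Turn a (text, tags) pair into numeric fields.

    Returns 3 fields when is_test is True and 4 fields (with label ids)
    otherwise. ord() is a toy stand-in for a real tokenizer.
    """
    text, tags = example
    input_ids = [ord(ch) for ch in text]
    token_type_ids = [0] * len(input_ids)
    seq_len = len(input_ids)
    if is_test:
        return input_ids, token_type_ids, seq_len
    label_ids = [label_vocab[tag] for tag in tags]
    return input_ids, token_type_ids, seq_len, label_ids


label_vocab = {"O": 0, "B-PER": 1}
example = ("ab", ["B-PER", "O"])

# Prediction loop: unpacks three fields, so the switch must be on.
input_ids, token_type_ids, lengths = convert_example(example, label_vocab, is_test=True)
print(input_ids, token_type_ids, lengths)

# Training loop: unpacks four fields, including the label ids.
input_ids, token_type_ids, lengths, labels = convert_example(example, label_vocab)
print(labels)
```

With the switch left off, the prediction path receives four fields where it expects three, which is consistent with the "extra label" described above.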

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.