PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: The file path "examples/text_to_knowledge/nptag/deploy/python/predict.py" may encounter an IndexError. #8641

Open smallbenxiong opened 1 week ago

smallbenxiong commented 1 week ago

Software environment

- paddlepaddle:2.6.1
- paddlepaddle-gpu: 
- paddlenlp: 2.7.0

Duplicate issues

Error description

  1. In the script:

    for i, d in enumerate(data):
        label = decode(pred_ids[i], id_vocabs)
        result = {
            "text": d,
            "label": label,
        }
        if label not in name_dict:
            scores_can = all_scores_can[i]
            pred_ids_can = all_preds_can[i]
            labels_can = search(scores_can, pred_ids_can, 0, [], 0)
            labels_can.sort(key=lambda d: -d[1])
            for labels in labels_can:
                cls_label_can = decode(labels[0], id_vocabs)
                if cls_label_can in name_dict:
                    result["label"] = cls_label_can
                    break
                else:
                    labels_can = bk_tree.search_similar_word(label)
                    result["label"] = labels_can[0][0]

        result["category"] = name_dict[result["label"]]
        results.append(result)
    return results

If labels_can is empty, accessing labels_can[0][0] may raise an IndexError.

One possible fix is to guard the lookup:

labels_can = bk_tree.search_similar_word(label)
if len(labels_can) != 0:
    result["label"] = labels_can[0][0]

  2. In the line:

result["category"] = name_dict[result["label"]]

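Even with the guard from item 1 in place, this line can still raise a KeyError: when no similar word is found, result["label"] keeps the original prediction, which is not a key in name_dict.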
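
Both failure modes could be guarded in one place. The following is a minimal sketch, not the project's actual fix: `resolve_label` and its `default_category` parameter are hypothetical names, `candidates` stands for the already decoded candidate labels in score order, and `similar_words` stands for the result of `bk_tree.search_similar_word(label)`, assumed here to be a list of `(word, distance)` pairs because the script indexes it as `labels_can[0][0]`.

```python
from typing import Dict, List, Optional, Tuple


def resolve_label(
    label: str,
    candidates: List[str],
    similar_words: List[Tuple[str, int]],
    name_dict: Dict[str, str],
    default_category: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (label, category) without raising IndexError or KeyError.

    Hypothetical helper for illustration only; the argument shapes are
    assumptions described in the surrounding text.
    """
    if label in name_dict:
        return label, name_dict[label]
    # Prefer the first decoded candidate that is a known label.
    for cand in candidates:
        if cand in name_dict:
            return cand, name_dict[cand]
    # Fall back to a similar word, but only if one exists and is itself
    # a known label. An empty list simply skips this loop.
    for word, _distance in similar_words:
        if word in name_dict:
            return word, name_dict[word]
    # Nothing matched: keep the raw prediction and use the default category.
    return label, default_category
```

Note that this sketch checks every candidate before consulting the similar-word list, whereas in the quoted loop the `else` branch runs for each non-matching candidate. Whether an unmatched label should get a default category or be skipped entirely is a design choice for the maintainers.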

Steps to reproduce & code

Training on a small sample and editing the mapping JSON (so that some predicted labels are missing from name_dict) can trigger these errors.
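
The two exceptions can be shown in isolation without training a model. This is only an illustrative sketch: `StubBKTree`, the one-entry `name_dict`, and the label `"city"` are made-up stand-ins, and the stub simply forces the empty-result case. Whether the real BK-tree returns an empty list for a given input depends on its vocabulary.

```python
# Stand-ins for the real objects; all values here are assumptions.
name_dict = {"person": "PER"}  # mapping JSON trimmed to a single label


class StubBKTree:
    """Hypothetical stub that finds no word similar to the query."""

    def search_similar_word(self, word):
        return []


bk_tree = StubBKTree()
label = "city"  # a predicted label that is missing from name_dict


def first_similar(label):
    labels_can = bk_tree.search_similar_word(label)
    return labels_can[0][0]  # same indexing as in the script


def category_of(label):
    return name_dict[label]  # same lookup as in the script


errors = []
for fn in (first_similar, category_of):
    try:
        fn(label)
    except (IndexError, KeyError) as exc:
        errors.append(type(exc).__name__)

print(errors)  # ['IndexError', 'KeyError']
```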

wawltor commented 1 week ago

Thank you for reporting the bug issue. You are welcome to create a pull request to address the bug based on your preferred method of modification.

wawltor commented 1 week ago

We changed the directory structure; the new path is https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/examples/text_to_knowledge

smallbenxiong commented 1 week ago

> Thank you for reporting the bug issue. You are welcome to create a pull request to address the bug based on your preferred method of modification.

Thank you for your trust. I will submit my solution code in the coming days.