PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: None Type Error #8229

Closed leftthomas closed 5 months ago

leftthomas commented 8 months ago

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 2.6.1.post117
- paddlenlp: 2.7.2

Duplicate issues

Error description

The WordSubstitute and CharSubstitute classes both raise a NoneType error when used with aug_n=1.

Steps to reproduce & code

The code below reproduces the error:

from paddlenlp.dataaug import WordSubstitute, CharSubstitute
# word_sub = WordSubstitute(aug_type=["synonym", "embedding", "homonym"], create_n=6)
char_sub = CharSubstitute(aug_type=["synonym", "embedding", "homonym"], create_n=6)
sentence = "我是人"
# Pass the sentence defined above to augment()
# msg = word_sub.augment(sentence)
msg = char_sub.augment(sentence)

After debugging, I found that the error is raised in the _augment_single(seq_tokens, aug_indexes, p) method: the argument p is passed as None (line 133 of char.py) and is then indexed on line 227:

pp.append(p[i] / len(self.dict[seq_tokens[aug_index]]))

so a NoneType error is raised. This should be fixed either by giving p a default value or by not using the argument p at all.
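
To illustrate the first option, here is a minimal sketch that does not depend on PaddleNLP. The function candidate_weights, its substitutes argument (standing in for self.dict), and the uniform fallback are all assumptions made for illustration; the intended meaning of p inside the library may differ, so this is not the library's actual fix.

```python
from typing import Dict, List, Optional, Sequence


def candidate_weights(
    seq_tokens: Sequence[str],
    aug_indexes: Sequence[int],
    substitutes: Dict[str, List[str]],
    p: Optional[Sequence[float]] = None,
) -> List[float]:
    """Hypothetical stand-in for the weighting step around line 227 of char.py."""
    if not aug_indexes:
        # Nothing to augment, so there are no weights to compute.
        return []
    if p is None:
        # Assumed fallback: every augmentable position is equally likely.
        p = [1.0 / len(aug_indexes)] * len(aug_indexes)
    pp = []
    for i, aug_index in enumerate(aug_indexes):
        # Same expression as in the issue, with self.dict replaced by `substitutes`.
        pp.append(p[i] / len(substitutes[seq_tokens[aug_index]]))
    return pp


tokens = ["我", "是", "人"]
subs = {"我": ["吾", "俺"], "人": ["民"]}  # made-up substitution table
weights = candidate_weights(tokens, [0, 2], subs)  # p omitted, so it defaults to None
```

With the fallback in place, calling the function without p no longer fails on p[i]. Whether uniform weights are the right default is a design decision for the maintainers.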

leftthomas commented 8 months ago

@wj-Mcat Hello, could you confirm this bug and fix it soon?

lugimzzz commented 7 months ago

You are welcome to submit a PR to us to fix this issue.

leftthomas commented 7 months ago

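@lugimzzz Well, looking at the logic of your code, that part is involved in producing the results and needs random numbers, so it would be better for you to decide how to design it yourselves.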

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 5 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.