fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
https://gitee.com/fastnlp/fastNLP
Apache License 2.0
3.05k stars, 451 forks

Vocabulary #326

Closed: el-psy closed this issue 3 years ago

el-psy commented 3 years ago

Describe the bug

After assigning a dict to vocab.idx2word on a Vocabulary, printing vocab.idx2word again shows the keys and values swapped. The relevant setter in fastNLP/core/vocabulary.py assigns to _word2idx:

@idx2word.setter
def idx2word(self, value):
    self._word2idx = value
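To see why a setter like this would produce swapped keys and values, here is a minimal sketch. `MiniVocab` is a hypothetical stand-in written for illustration, not fastNLP's `Vocabulary`, and it assumes that the `idx2word` getter rebuilds the reverse mapping from `_word2idx` on access.

```python
class MiniVocab:
    """Hypothetical stand-in for illustration; not fastNLP's Vocabulary."""

    def __init__(self):
        self._word2idx = {}
        self._idx2word = {}

    @property
    def idx2word(self):
        # Assumption: the reverse mapping is derived from _word2idx on access.
        self._idx2word = {idx: word for word, idx in self._word2idx.items()}
        return self._idx2word

    @idx2word.setter
    def idx2word(self, value):
        # Same assignment as the setter quoted above: it writes to _word2idx.
        self._word2idx = value


vocab = MiniVocab()
vocab.idx2word = {0: 'O', 1: 'B-equip', 2: 'I-equip'}

# The index -> tag dict was stored as if it were word -> index,
# so inverting it on read returns tag -> index.
swapped = vocab.idx2word
print(swapped)
```

Under these assumptions, a setter that assigned to `self._idx2word` instead would leave the mapping in the orientation the caller provided.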

To Reproduce

from fastNLP import Vocabulary
tag_vocab = Vocabulary()
tag_vocab.idx2word = {0: 'O', 1: 'B-equip', 2: 'I-equip', 3: 'B-time', 4: 'I-time', 5: 'B-place', 6: 'I-place', 7: 'B-number', 8: 'I-number', 9: 'B-task', 10: 'I-task', 11: 'B-unit', 12: 'I-unit', 13: 'B-warName', 14: 'I-warName'}
print(tag_vocab.idx2word)

Result:

{'O': 0, 'B-equip': 1, 'I-equip': 2, 'B-time': 3, 'I-time': 4, 'B-place': 5, 'I-place': 6, 'B-number': 7, 'I-number': 8, 'B-task': 9, 'I-task': 10, 'B-unit': 11, 'I-unit': 12, 'B-warName': 13, 'I-warName': 14}

Expected behavior

{0: 'O', 1: 'B-equip', 2: 'I-equip', 3: 'B-time', 4: 'I-time', 5: 'B-place', 6: 'I-place', 7: 'B-number', 8: 'I-number', 9: 'B-task', 10: 'I-task', 11: 'B-unit', 12: 'I-unit', 13: 'B-warName', 14: 'I-warName'}

Additional context

I was hoping to take a shortcut by setting tag_vocab's idx2word directly and then using the CRF.

yhcc commented 3 years ago

You can initialize the vocab as follows:

vocab = Vocabulary(unknown=None, padding=None)
for tag in {0: 'O', 1: 'B-equip', 2: 'I-equip', 3: 'B-time', 4: 'I-time', 5: 'B-place', 6: 'I-place', 7: 'B-number', 8: 'I-number', 9: 'B-task', 10: 'I-task', 11: 'B-unit', 12: 'I-unit', 13: 'B-warName', 14: 'I-warName'}.values():
  vocab.add_word(tag)

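Initializing the vocab this way works because each tag appears exactly once, so the index order in the vocab matches the order in which the tags were added.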
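The ordering argument can be checked without fastNLP. The sketch below assumes the vocabulary assigns indices by descending word frequency, in the manner of `collections.Counter.most_common()`. That is an assumption about the internals, used here only to illustrate the reasoning. With every count equal to 1, ties keep first-seen order (Python 3.7+), so indices follow insertion order.

```python
from collections import Counter

tags = ['O', 'B-equip', 'I-equip', 'B-time', 'I-time']

counts = Counter()
for tag in tags:
    counts[tag] += 1  # each tag is counted exactly once

# most_common() sorts by count; elements with equal counts stay in the
# order they were first encountered (Python 3.7+).
word2idx = {word: idx for idx, (word, _) in enumerate(counts.most_common())}
idx2word = {idx: word for word, idx in word2idx.items()}
print(idx2word)
```

If a tag were added more than once, it would sort ahead of the others in this sketch and the indices would no longer match insertion order. Passing `unknown=None, padding=None`, as in the reply above, presumably also keeps special tokens from taking the first indices.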