Step 5 (Train the whole model with RL): no ids_to_tokens

RowitZou / topic-dialog-summ

AAAI-2021 paper: Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling.

MIT License

77 stars 9 forks

Step 5 (Train the whole model with RL): no ids_to_tokens #20

Closed yifanjun closed 2 years ago

yifanjun commented 2 years ago

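This may be a silly question, but I really don't know how to handle it. Step 5 of training fails with an error at line 499 of "models/rl_model.py": AttributeError: 'collections.OrderedDict' object has no attribute 'ids_to_tokens'. I'm not sure whether this is a version issue. Apart from pytorch, which is 1.9.0+cu111, I tried to switch every other package to the version listed in the README. Has anyone else run into this?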

vkmu commented 2 years ago

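My transformers version is 4.15.0. The source of BertTokenizer.__init__() shows that ids_to_tokens is not part of vocab. It is a separate attribute on the tokenizer, built from vocab: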

self.vocab = load_vocab(vocab_file)
self.ids_to_tokens = collections.OrderedDict([(ids, tok) for tok, ids in self.vocab.items()])
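
A minimal sketch of why the error appears, using a toy vocabulary instead of a real vocab file (the tokens and ids below are made up for illustration). It assumes the object passed around as vocab is the plain token-to-id OrderedDict rather than the tokenizer itself, which is what the AttributeError suggests:

```python
import collections

# Toy stand-in for the mapping returned by load_vocab(vocab_file): token -> id.
# The entries are hypothetical; a real BERT vocab has tens of thousands of rows.
vocab = collections.OrderedDict(
    [("[PAD]", 0), ("[UNK]", 1), ("hello", 2), ("world", 3)]
)

# A plain OrderedDict carries no ids_to_tokens attribute, so this lookup fails
# in the same way as the reported error.
try:
    vocab.ids_to_tokens
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The reverse mapping (id -> token) has to be built explicitly from vocab.
ids_to_tokens = collections.OrderedDict(
    (idx, tok) for tok, idx in vocab.items()
)

decoded = [ids_to_tokens[i] for i in [2, 3]]
print(error_message)
print(decoded)
```

So any code that holds only vocab needs to build the reverse mapping itself before it can turn ids back into tokens.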

RowitZou commented 2 years ago

Could you provide the detailed error log?

yifanjun commented 2 years ago

Thanks, that was very helpful. Here is the concrete fix. In src/models/rl_model.py, make the following changes:

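At the top of the file, add import collections (the standard-library module is enough; I had used from nltk import collections, but the nltk dependency is unnecessary here).
In the __init__() method, after self.vocab = vocab, add self.ids_to_tokens = collections.OrderedDict([(ids, tok) for tok, ids in self.vocab.items()])
Change line 499 to words = [self.ids_to_tokens[w] for w in words]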
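
Put together, the three changes might look like the sketch below. RLModelSketch and decode are hypothetical names used only for illustration, not the repository's actual class or method, and the toy vocabulary is made up. Only the import, the ids_to_tokens line, and the list comprehension come from the steps above:

```python
import collections  # change 1: standard-library import at the top of the file


class RLModelSketch:
    """Hypothetical stand-in for the model class in rl_model.py."""

    def __init__(self, vocab):
        self.vocab = vocab
        # Change 2: build the id -> token mapping right after self.vocab is set.
        self.ids_to_tokens = collections.OrderedDict(
            [(ids, tok) for tok, ids in self.vocab.items()]
        )

    def decode(self, words):
        # Change 3: look ids up on self.ids_to_tokens instead of on the vocab.
        words = [self.ids_to_tokens[w] for w in words]
        return words


# Toy vocabulary (token -> id), for illustration only.
toy_vocab = collections.OrderedDict([("[UNK]", 0), ("customer", 1), ("service", 2)])
model = RLModelSketch(toy_vocab)
print(model.decode([1, 2, 0]))
```

Building the mapping once in __init__() avoids reconstructing it on every decoding call.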