HHousen / TransformerSum

Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive summarization datasets to the extractive task.
https://transformersum.rtfd.io
GNU General Public License v3.0

Bug fixed when tokenizer has missing 'pad_token' #66

Closed · Aktsvigun closed this 2 years ago

Aktsvigun commented 2 years ago

When self.tokenizer.pad_token is None, self.tokenizer.pad_token_id will also be None (to verify, one can check AutoTokenizer.from_pretrained('sberbank-ai/rugpt3small_based_on_gpt2')). Consequently, this leads to errors (e.g. in lines 171 and 175 of abstractive.py) wherever self.tokenizer.pad_token_id is used.
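A minimal reproduction sketch of the situation described above, assuming the Hugging Face transformers library is installed (the model name is the one mentioned in this PR):

```python
from transformers import AutoTokenizer

# GPT-2-style tokenizers ship without a padding token, so both attributes are None.
tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt3small_based_on_gpt2")
print(tokenizer.pad_token)     # None
print(tokenizer.pad_token_id)  # None

# Any later code that treats tokenizer.pad_token_id as an integer
# (e.g. to build attention/padding masks or to pad batches) will then fail.
```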

This PR adds a small fix that adds a pad_token to the tokenizer (tokenizer.pad_token_id and tokenizer.special_tokens_map are updated automatically).
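The exact change is in the PR diff; the sketch below only illustrates the general approach using the standard add_special_tokens API, with a hypothetical "<pad>" token string:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt3small_based_on_gpt2")

# If the tokenizer has no pad token, register one. add_special_tokens() updates
# tokenizer.pad_token_id and tokenizer.special_tokens_map automatically.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "<pad>"})

assert tokenizer.pad_token_id is not None

# Note: if the new token enlarges the vocabulary, the model's embeddings must be
# resized as well, e.g. model.resize_token_embeddings(len(tokenizer)).
```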

HHousen commented 2 years ago

Thank you for fixing this issue!