Closed dzkb closed 2 years ago
Hi, the values of the context token and answer token are `<context>` and `<answer>` respectively, as shown in the source code. You can check the special token values with:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("iarfmoose/t5-base-question-generator")
print(tokenizer.get_added_vocab())
```

which should print the tokens and their ids like this:

```python
{'<answer>': 32100, '<context>': 32101}
```
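To make the expected input format concrete, here is a minimal sketch of a helper that builds the input string with the `<answer>` and `<context>` special tokens. The `build_qg_input` helper and the exact spacing are assumptions for illustration, not part of the model's official API:

```python
# Sketch: assemble the input string the question generator expects.
# Assumption: the answer segment comes first, prefixed by <answer>,
# followed by the context segment, prefixed by <context>.
def build_qg_input(answer: str, context: str) -> str:
    return f"<answer> {answer} <context> {context}"


example = build_qg_input("Paris", "Paris is the capital of France.")
print(example)
# The resulting string can then be tokenized and passed to model.generate().
```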
I'll update the model card on the huggingface hub to make it more clear.
Thank you!
Hi! Thank you for the great work on this model and accompanying code!
I've noticed that the model's page on huggingface contains instructions on preparing the input text. The description indicates that two special tokens, `answer_token` and `context_token`, have to be used before the answer and the context respectively, but after browsing the code I've noticed that the question generation logic uses the `<answer>` and `<context>` tokens. After initial tests it seems that those `<answer>`/`<context>` tokens do in fact work correctly when used in generation.

What are the correct tokens for the pretrained model available on huggingface?