Open vincent317 opened 3 months ago
Hello team,
When I do:
```python
from transformers import AutoTokenizer

pretrained_model = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model,
    padding_side="left",
    cache_dir=pretrained_model + '_tokenizer',
)
print(tokenizer.pad_token)
```
It seems like the `pad_token` is empty (`None` is printed).
Setting `tokenizer.pad_token = tokenizer.eos_token` seems to fix the issue. Is this the same way the padding token is applied during training?
Thank you!
Yes, I think so. You can just set `pad_token = eos_token` during training.
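For what it's worth, here is a minimal sketch of what that setting amounts to when a batch is tokenized with `padding_side="left"`: shorter sequences are left-padded with the EOS id, and the attention mask zeroes out the padded positions. This is plain Python for illustration only (no `transformers` dependency); the token ids are made up, and the pad/eos id of `0` is an assumption, not taken from the actual Pythia vocab.

```python
# Illustration of left-padding a batch once pad_token = eos_token.
# The id 0 stands in for the eos/pad id (an assumption for this sketch),
# and the other token ids are made up.
EOS_ID = 0

def left_pad(batch, pad_id=EOS_ID):
    """Left-pad each sequence to the batch max length, building an
    attention mask with 0 at padded positions and 1 at real tokens."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask

ids, mask = left_pad([[11, 22, 33], [44, 55]])
print(ids)   # [[11, 22, 33], [0, 44, 55]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

The attention mask is why reusing EOS as the pad token is safe here: the model never attends to the padded positions, so it cannot confuse them with a real end-of-text token.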