Hi @datquocnguyen ,
Both the PhoBERT-base and PhoBERT-large tokenizers have the same vocab size of 64001. So the question is: can we use the PhoBERT-base tokenizer with the PhoBERT-large model, and vice versa? In other words, can we tokenize the dataset once with one of them and reuse that prepared tokenized tensor dataset to fine-tune both PhoBERT-base and PhoBERT-large on downstream tasks, to save preparation time? :)