VinAIResearch / PhoBERT

PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)

Question about use_fast=False when loading the PhoBERT tokenizer #43

Closed ithieund closed 1 year ago

ithieund commented 1 year ago

Hi @datquocnguyen, I remember you mentioned in some document that with transformers v4 we should add use_fast=False when loading the tokenizer. Is that still true now? I couldn't find that document again, and your latest README seems to have been updated as well.

Another question: what is the difference between use_fast=True and use_fast=False in this case? Does anything change in the output? Thank you very much.
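
For reference, the line I mean looks something like this (just a sketch, assuming the vinai/phobert-base checkpoint on the Hugging Face Hub):

```python
from transformers import AutoTokenizer

# Load the slow (pure-Python) PhoBERT tokenizer instead of the fast one
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base", use_fast=False)
```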

datquocnguyen commented 1 year ago

"useFast = True" and "useFast = False" produce the same output tokenization. "useFast = True" (by default) would help run the latest examples in https://github.com/huggingface/transformers/tree/main/examples/pytorch and the likes with the fast PhoBERT Tokenizer (installation shown in the readme), while "useFast = False" will be used for examples available in https://github.com/huggingface/transformers/tree/main/examples/legacy

ithieund commented 1 year ago

Thank you.
