Since tokenization is needed in virtually every NLP use case, it would be efficient to have it set to True and enabled by default, along with a spaCy tokenizer, since that would yield a more generalized model (e.g., by treating contractions such as "he's" and "he is", "they're" and "they are", and "I'm" and "I am" as equivalent). Because these tokenizers apply to every use case, building them in would make using the models both more efficient and more pleasant.
Furthermore, tokenizers for non-ASCII letters, numbers, and acronyms (such as "idk", "tbh", "rn", etc.) could serve as additional tokenization features, exposed as extra boolean parameters in the parser.
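To make the proposal concrete, here is a minimal stdlib-only sketch of what such a parser interface could look like. The parameter names (`tokenize`, `expand_acronyms`) and the small lookup tables are hypothetical; a real implementation would delegate to a spaCy tokenizer rather than a hand-written dict.

```python
import re

# Hypothetical mappings for illustration only; a real implementation
# would use spaCy's tokenizer and exception rules instead.
CONTRACTIONS = {"he's": "he is", "they're": "they are", "i'm": "i am"}
ACRONYMS = {"idk": "i do not know", "tbh": "to be honest", "rn": "right now"}

def parse(text, tokenize=True, expand_acronyms=False):
    """Tokenize `text`; tokenization is on by default, as proposed."""
    if not tokenize:
        return [text]
    words = re.findall(r"[\w']+", text.lower())
    tokens = []
    for word in words:
        if word in CONTRACTIONS:
            # Normalize contractions so "he's" and "he is" look the
            # same to the model.
            tokens.extend(CONTRACTIONS[word].split())
        elif expand_acronyms and word in ACRONYMS:
            # Optional acronym expansion, gated by a bool param.
            tokens.extend(ACRONYMS[word].split())
        else:
            tokens.append(word)
    return tokens

print(parse("He's busy rn", expand_acronyms=True))
# → ['he', 'is', 'busy', 'right', 'now']
```

With this shape, callers get sensible behavior with zero configuration, while the extra boolean parameters let them opt in to the additional tokenization features when their data calls for it.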