No proper encodings for covid-related terms

I have just checked encodings that autotokenizer produces. It seems that for words "wuhan", "ncov", "coronavirus", "covid", or "sars-cov-2" it produces more than one token, while tokenizer produces one token for 'conventional' words like apple. E.g.

from transformers import  AutoTokenizer
tokenizer =  AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert-v2", do_lower_case=True)
tokenizer(['wuhan', "covid","coronavirus","sars-cov-2","apple","city"], truncation=True, padding=True, max_length=512)

Result:

{'input_ids': [[101, 8814, 4819, 102, 0, 0, 0, 0, 0], [101, 2522, 17258, 102, 0, 0, 0, 0, 0], [101, 21887, 23350, 102, 0, 0, 0, 0, 0], [101, 18906, 2015, 1011, 2522, 2615, 1011, 1016, 102], [101, 6207, 102, 0, 0, 0, 0, 0, 0], [101, 2103, 102, 0, 0, 0, 0, 0, 0]]}.

As you can see, there are two encoded values for 'wuhan', "covid","coronavirus" ([8814, 4819],[2522, 17258],[ 21887, 23350] accordingly), while one id for apple and city (as it should be - [ 6207] and [2103]).

I have also checked tokenizer dictionary (vocab.txt) from https://huggingface.co/digitalepidemiologylab/covid-twitter-bert-v2/tree/main and there are no such terms as "wuhan", "ncov", "coronavirus", "covid", or "sars-cov-2" (as mentioned in the readme - https://huggingface.co/digitalepidemiologylab/covid-twitter-bert-v2).

I wonder why model does not recognize covid-related terms and how do I make the model 'understand' these terms? It seems that poor performance of models in my specific case (web texts that mention covid only once) may be related to this issue

digitalepidemiologylab / covid-twitter-bert

No proper encodings for covid-related terms #21