ogencoglu closed this issue 4 years ago
Hi @ogencoglu, thanks for your post! We are using the uncased version of BERT large, so you need to set do_lower_case to True!
>>> tokenizer = AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert", do_lower_case=True)
>>> tokenizer.tokenize('She is cool!')
['she', 'is', 'cool', '!']
Let me know if it works!
Thanks for the quick reply. My lack of attention :).
Thanks a lot for the nice work!
What would be the logic behind masking pronouns with the unknown token [UNK]? This seems to be a major deviation from standard BERT models.
For example:
>>> tokenizer = AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert")
>>> tokenizer.tokenize('She is cool!')
outputs
['[UNK]', 'is', 'cool', '!']
while a standard (cased) BERT tokenizer outputs
['She', 'is', 'cool', '!']
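For the record, the [UNK] here is not deliberate pronoun masking: the model's vocabulary is uncased, so any cased token such as "She" is simply out-of-vocabulary until the input is lowercased. A minimal sketch of the three cases, assuming bert-base-cased as the reference cased checkpoint (the CT-BERT outputs mirror those reported above):

>>> from transformers import AutoTokenizer
>>> # No lowercasing: the cased "She" is not in the uncased vocab, so it maps to [UNK]
>>> AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert").tokenize('She is cool!')
['[UNK]', 'is', 'cool', '!']
>>> # With do_lower_case=True the input is lowercased first, so every word is found in the vocab
>>> AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert", do_lower_case=True).tokenize('She is cool!')
['she', 'is', 'cool', '!']
>>> # A cased checkpoint keeps "She" in its vocabulary directly
>>> AutoTokenizer.from_pretrained("bert-base-cased").tokenize('She is cool!')
['She', 'is', 'cool', '!']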