Is finBert cased or uncased?

ProsusAI / finBERT

Financial Sentiment Analysis with BERT

Apache License 2.0


I am not the originator of this code, but I figured this out yesterday by looking into finbert/finbert.py.

The tokenizer is instantiated in FinBert.prepare_model as self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=self.config.do_lower_case), and the __init__ method of the Config class defaults to do_lower_case=True. With that default, the input text is lowercased during pre-processing, as an uncased model requires.

So finBERT is uncased: it uses bert-base-uncased as the tokenizer/vocabulary and lowercases the input, which means it cannot tell the difference between lower and upper case.
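
To illustrate what that means in practice, here is a rough, standard-library-only sketch of the kind of normalization an uncased BERT tokenizer applies before looking words up in its vocabulary. It is not finBERT's or the transformers library's actual code: uncased_preprocess is a hypothetical helper, and the whitespace split is a simplification I am assuming in place of real WordPiece tokenization. It only shows why differently cased inputs end up identical.

```python
import unicodedata
from typing import List


def uncased_preprocess(text: str) -> List[str]:
    """Hypothetical helper: approximates uncased pre-processing.

    Assumption: lowercasing plus accent stripping, followed by a plain
    whitespace split. The real tokenizer also splits punctuation and
    breaks words into WordPiece sub-tokens.
    """
    lowered = text.lower()
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.split()


upper = uncased_preprocess("Apple shares RISE")
lower = uncased_preprocess("apple shares rise")

# Both inputs normalize to the same token list, so case carries no signal.
print(upper == lower)
```

If case matters for your use case (for example, distinguishing a ticker or company name from a common word), this normalization step is where that information is lost.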

Is finBert cased or uncased? #23