Open galsk87 opened 3 years ago
Hi Thanks for the code and the models they are very useful!
I have an issue with the tokenizers in the cased version, it actually uses an uncased tokenizer, is that a mistake or intentional?
Code snippet: tokenizer = AutoTokenizer.from_pretrained('dmis-lab/biobert-base-cased-v1.1') tokenizer.pretrained_init_configuration
tokenizer = AutoTokenizer.from_pretrained('dmis-lab/biobert-base-cased-v1.1')
tokenizer.pretrained_init_configuration
{'bert-base-uncased': {'do_lower_case': True}, 'bert-large-uncased': {'do_lower_case': True}, 'bert-base-cased': {'do_lower_case': False}, 'bert-large-cased': {'do_lower_case': False}, 'bert-base-multilingual-uncased': {'do_lower_case': True}, 'bert-base-multilingual-cased': {'do_lower_case': False}, 'bert-base-chinese': {'do_lower_case': False}, 'bert-base-german-cased': {'do_lower_case': False}, 'bert-large-uncased-whole-word-masking': {'do_lower_case': True}, 'bert-large-cased-whole-word-masking': {'do_lower_case': False}, 'bert-large-uncased-whole-word-masking-finetuned-squad': {'do_lower_case': True}, 'bert-large-cased-whole-word-masking-finetuned-squad': {'do_lower_case': False}, 'bert-base-cased-finetuned-mrpc': {'do_lower_case': False}, 'bert-base-german-dbmdz-cased': {'do_lower_case': False}, 'bert-base-german-dbmdz-uncased': {'do_lower_case': True}, 'TurkuNLP/bert-base-finnish-cased-v1': {'do_lower_case': False}, 'TurkuNLP/bert-base-finnish-uncased-v1': {'do_lower_case': True}, 'wietsedv/bert-base-dutch-cased': {'do_lower_case': False}}
Hi Thanks for the code and the models they are very useful!
I have an issue with the tokenizers in the cased version, it actually uses an uncased tokenizer, is that a mistake or intentional?
Code snippet:
tokenizer = AutoTokenizer.from_pretrained('dmis-lab/biobert-base-cased-v1.1')
tokenizer.pretrained_init_configuration
{'bert-base-uncased': {'do_lower_case': True}, 'bert-large-uncased': {'do_lower_case': True}, 'bert-base-cased': {'do_lower_case': False}, 'bert-large-cased': {'do_lower_case': False}, 'bert-base-multilingual-uncased': {'do_lower_case': True}, 'bert-base-multilingual-cased': {'do_lower_case': False}, 'bert-base-chinese': {'do_lower_case': False}, 'bert-base-german-cased': {'do_lower_case': False}, 'bert-large-uncased-whole-word-masking': {'do_lower_case': True}, 'bert-large-cased-whole-word-masking': {'do_lower_case': False}, 'bert-large-uncased-whole-word-masking-finetuned-squad': {'do_lower_case': True}, 'bert-large-cased-whole-word-masking-finetuned-squad': {'do_lower_case': False}, 'bert-base-cased-finetuned-mrpc': {'do_lower_case': False}, 'bert-base-german-dbmdz-cased': {'do_lower_case': False}, 'bert-base-german-dbmdz-uncased': {'do_lower_case': True}, 'TurkuNLP/bert-base-finnish-cased-v1': {'do_lower_case': False}, 'TurkuNLP/bert-base-finnish-uncased-v1': {'do_lower_case': True}, 'wietsedv/bert-base-dutch-cased': {'do_lower_case': False}}