amaiya / ktrain

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
Apache License 2.0

Ktrain Bi-Lstm Bert NER (SciBert and BioBert) cased models, preprocessing text to lowercase #425

Closed: dummynov1 closed 2 years ago

dummynov1 commented 2 years ago

@amaiya: Thanks for sharing the info on the SciBERT cased model tokenizer config issue (#422). Just to confirm, should I use your workaround like this:

from ktrain import text
from transformers import AutoTokenizer

TDATA = 'train2.txt'
VDATA = 'test2.txt'

# load CoNLL-2003-formatted training and validation data
(trn, val, preproc) = text.entities_from_conll2003(TDATA, val_filepath=VDATA)

# replace the default tokenizer with the cased one
preproc.p.te.tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_cased',
                                                       do_lower_case=False)

WV_URL = 'https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz'
model = text.sequence_tagger('bilstm-bert', preproc,
                             bert_model='allenai/scibert_scivocab_cased',
                             wv_path_or_url=WV_URL)

Is this how the preproc tokenizer should be initialized before running the learner code? I think I'm doing something wrong; I'm not sure how to pass the option do_lower_case=False.

amaiya commented 2 years ago

The tokenizer override should be applied after creating the model, not before:

from transformers import AutoTokenizer

# create the model first ...
model = text.sequence_tagger('bilstm-bert', preproc,
                             bert_model='allenai/scibert_scivocab_cased')

# ... then replace the tokenizer with the cased version
preproc.p.te.tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_cased',
                                                       do_lower_case=False)