EmilyAlsentzer / clinicalBERT

repository for Publicly Available Clinical BERT Embeddings
MIT License

Using NER for running HuggingFace Pipeline #22

Closed BrianThomasMcGrath closed 4 years ago

BrianThomasMcGrath commented 4 years ago

I tried using this model with HuggingFace's transformers.pipeline to establish a baseline for doing NER on some data that I have, but I was running into index errors because the id2label dictionary in the model's config currently has only 2 labels, {0: 'LABEL_0', 1: 'LABEL_1'}. Do you have a full set of labels, or should I go about getting these predictions another way?
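A minimal sketch of the failure mode being described (the dictionary below is the two-entry default that ships in the pretrained config, not a map from this repo):

```python
# What the pretrained ClinicalBERT config ships with: only the two
# default placeholder labels, since no NER head was ever fine-tuned.
id2label = {0: "LABEL_0", 1: "LABEL_1"}

def label_for(pred_id):
    # pipeline-style lookup: any predicted class id outside the default
    # two-entry map has no name, so the lookup raises a KeyError.
    return id2label[pred_id]
```

An NER task with, say, seven BIO labels would produce class ids up to 6, all of which fall outside this map.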

EmilyAlsentzer commented 4 years ago

We've exchanged over email, but posting here for the community:

Unfortunately, you won't be able to use the ClinicalBERT model out of the box without fine-tuning it on the NER task. We released the clinical BERT models that were pretrained via masked LM and NSP on MIMIC data, but did not release any of the models that we fine-tuned on the i2b2 or MedNLI tasks.

You can easily fine-tune ClinicalBERT on NER for the labels you care about. Fine-tuning ClinicalBERT on any downstream task is relatively cheap compared to the time it took us to do the pretraining on MIMIC. Check out the run_ner.py script in the Huggingface repo for an example of how to do this.
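One sketch of the label setup this requires, assuming BIO-style tags (the entity types below are an illustrative i2b2-like subset, not the official label set; the commented-out model call shows where the maps would be used):

```python
# Build a BIO label list and id<->label maps for token classification.
# Entity types here are hypothetical; substitute your own.
ENTITY_TYPES = ["PROBLEM", "TREATMENT", "TEST"]

def build_label_maps(entity_types):
    """Return (labels, label2id, id2label) for BIO-tagged NER."""
    labels = ["O"] + [f"{prefix}-{t}" for t in entity_types
                      for prefix in ("B", "I")]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return labels, label2id, id2label

labels, label2id, id2label = build_label_maps(ENTITY_TYPES)

# These maps would then be passed when loading the model for fine-tuning,
# e.g. (requires a network download, so left commented out):
# from transformers import AutoModelForTokenClassification
# model = AutoModelForTokenClassification.from_pretrained(
#     "emilyalsentzer/Bio_ClinicalBERT",
#     num_labels=len(labels), id2label=id2label, label2id=label2id,
# )
```

After fine-tuning with run_ner.py (or a custom training loop), the saved config carries the full id2label map, and transformers.pipeline("ner", ...) will produce named entity labels instead of LABEL_0/LABEL_1.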