R1j1t / contextualSpellCheck

✔️Contextual word checker for better suggestions
MIT License

Another BERT, BioBERT ? #62

Closed by CamilleSchr 3 years ago

CamilleSchr commented 3 years ago

Hi, thanks for this library! Is it possible to swap BERT for BioBERT? I work on scientific articles, and I would like to correct specific terms, such as disease names, that were transcribed incorrectly. For instance, a transcription has "covet 19" instead of "covid 19", and I would like to correct that. Can you help me, please? Thanks, cheers, Camille

R1j1t commented 3 years ago

Hi @CamilleSchr, it is possible to pass most BERT models via the model_name config key. The model_name must match the model's name on Hugging Face, because the corresponding files are downloaded from there. For example:

>>> import contextualSpellCheck
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> # Example with https://huggingface.co/dmis-lab/biobert-base-cased-v1.1
>>> nlp.add_pipe("contextual spellchecker", config={"model_name":"dmis-lab/biobert-base-cased-v1.1"})

>>> doc = nlp("Someone got tested for covet 19")
>>> doc._.outcome_spellCheck
'Someone got tested for Homes 19'

In the example above, it corrected "covet" to "Homes" rather than "covid". I might need to look deeper into this, but you can try other models.
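
To compare several models on the same sentence, a loop along these lines might help. This is only a sketch: `changed_words` and `compare_models` are hypothetical helper names, not part of contextualSpellCheck, and it assumes that `doc._.outcome_spellCheck` is empty when nothing was corrected. It loads a fresh spaCy pipeline for each model so the component is never added twice.

```python
def changed_words(original, corrected):
    """Pair up the words that differ between two sentences of equal length."""
    before, after = original.split(), corrected.split()
    if len(before) != len(after):
        return []  # word counts differ; compare the two strings by eye instead
    return [(b, a) for b, a in zip(before, after) if b != a]


def compare_models(text, model_names):
    """Run the spell checker once per model and collect what each one changed."""
    # Imported here so changed_words() can be used without spaCy installed.
    import spacy
    import contextualSpellCheck  # noqa: F401  (importing registers the component)

    results = {}
    for name in model_names:
        nlp = spacy.load("en_core_web_sm")  # fresh pipeline for every model
        nlp.add_pipe("contextual spellchecker", config={"model_name": name})
        doc = nlp(text)
        # Assumption: outcome_spellCheck is an empty string if nothing changed.
        corrected = doc._.outcome_spellCheck or text
        results[name] = changed_words(text, corrected)
    return results
```

Calling `compare_models("Someone got tested for covet 19", ["bert-base-cased", "dmis-lab/biobert-base-cased-v1.1"])` would return one list of (original, replacement) pairs per model. The first model name is only an illustrative choice; any Hugging Face name accepted by model_name can go in the list.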

CamilleSchr commented 3 years ago

Ok, thank you so much !