JohnSnowLabs / nlu

1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

Multilingual DeBERTa Transformer Embeddings for 100+ Languages, Spanish Clinical Trials - John Snow Labs NLU 3.4.2 #105

Closed by C-K-Loan 2 years ago

C-K-Loan commented 2 years ago

Multilingual DeBERTa Transformer Embeddings for 100+ Languages, Spanish Deidentification and NER for Randomized Clinical Trials - John Snow Labs NLU 3.4.2

We are very excited to announce that NLU 3.4.2 has been released.

On the open source side, we have 5 new DeBERTa Transformer models: four for English and one multilingual model covering 100+ languages. DeBERTa improves over BERT and RoBERTa by introducing two novel techniques: a disentangled attention mechanism and an enhanced mask decoder.

On the healthcare side, we have new NER models for randomized clinical trials (RCT) that detect entities of type BACKGROUND, CONCLUSIONS, METHODS, OBJECTIVE, and RESULTS in clinical text. Additionally, there are new Spanish de-identification NER models for entities such as STATE, PATIENT, DEVICE, COUNTRY, ZIP, PHONE, HOSPITAL, and many more.

New Open Source Models

Integrates models from the amazing Spark NLP 3.4.2 release

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|---|---|---|---|---|
| en | en.embed.deberta_v3_xsmall | deberta_v3_xsmall | Embeddings | DeBertaEmbeddings |
| en | en.embed.deberta_v3_small | deberta_v3_small | Embeddings | DeBertaEmbeddings |
| en | en.embed.deberta_v3_base | deberta_v3_base | Embeddings | DeBertaEmbeddings |
| en | en.embed.deberta_v3_large | deberta_v3_large | Embeddings | DeBertaEmbeddings |
| xx | xx.embed.mdeberta_v3_base | mdeberta_v3_base | Embeddings | DeBertaEmbeddings |
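
As a rough illustration of how one of the references above might be used, here is a minimal sketch built on the usual `nlu.load(...).predict(...)` pattern. The helper name `embed_text`, the default reference, and the choice of `output_level="token"` are assumptions made for this example, not part of the release. It also assumes NLU and PySpark are already installed and that the model can be downloaded on first use.

```python
# NLU references copied from the table above.
DEBERTA_REFS = [
    "en.embed.deberta_v3_xsmall",
    "en.embed.deberta_v3_small",
    "en.embed.deberta_v3_base",
    "en.embed.deberta_v3_large",
    "xx.embed.mdeberta_v3_base",
]


def embed_text(text, nlu_ref="en.embed.deberta_v3_xsmall"):
    """Hypothetical helper: load one of the new DeBERTa models and embed `text`.

    The reference is validated before NLU is imported, so a typo fails fast
    without starting a Spark session.
    """
    if nlu_ref not in DEBERTA_REFS:
        raise ValueError(f"Unknown DeBERTa reference: {nlu_ref!r}")

    # Imported lazily because importing NLU is slow and needs PySpark.
    import nlu

    pipeline = nlu.load(nlu_ref)
    # Assumption: token-level output is wanted; predict() returns a DataFrame.
    return pipeline.predict(text, output_level="token")
```

Calling `embed_text("DeBERTa embeddings in one line")` should return a DataFrame with one row per token. For non-English text, passing `nlu_ref="xx.embed.mdeberta_v3_base"` selects the multilingual model.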

New Healthcare Models

Integrates models from the incredible Spark NLP For Healthcare 3.4.2 release

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|---|---|---|---|---|
| en | en.med_ner.clinical_trials | bert_sequence_classifier_rct_biobert | Text Classification | MedicalBertForSequenceClassification |
| es | es.med_ner.deid.generic.roberta | ner_deid_generic_roberta_augmented | De-identification | MedicalNerModel |
| es | es.med_ner.deid.subentity.roberta | ner_deid_subentity_roberta_augmented | De-identification | MedicalNerModel |
| en | en.med_ner.deid.generic_augmented | ner_deid_generic_augmented | Named Entity Recognition, De-identification | MedicalNerModel |
| en | en.med_ner.deid.subentity_augmented | ner_deid_subentity_augmented | Named Entity Recognition, De-identification | MedicalNerModel |
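
The healthcare references above load the same way as the open source ones, so a small sketch may help with picking one by language. The names `HEALTHCARE_REFS`, `refs_for`, and `run_healthcare_model` are hypothetical and exist only for this example. It assumes a licensed Spark NLP for Healthcare environment that is already authenticated, and that setup is not shown here.

```python
# Healthcare NLU references copied from the table above, grouped by language.
HEALTHCARE_REFS = {
    "en": [
        "en.med_ner.clinical_trials",
        "en.med_ner.deid.generic_augmented",
        "en.med_ner.deid.subentity_augmented",
    ],
    "es": [
        "es.med_ner.deid.generic.roberta",
        "es.med_ner.deid.subentity.roberta",
    ],
}


def refs_for(lang):
    """Return the new healthcare references for a language code, or []."""
    return list(HEALTHCARE_REFS.get(lang, []))


def run_healthcare_model(text, nlu_ref="es.med_ner.deid.generic.roberta"):
    """Hypothetical helper: run one of the new healthcare models on `text`.

    Assumes the healthcare license is already configured in this session.
    """
    lang = nlu_ref.split(".", 1)[0]  # references start with the language code
    if nlu_ref not in refs_for(lang):
        raise ValueError(f"Unknown healthcare reference: {nlu_ref!r}")

    # Imported lazily because importing NLU is slow and needs PySpark.
    import nlu

    return nlu.load(nlu_ref).predict(text)
```

For example, `refs_for("es")` lists the two Spanish de-identification models, and `run_healthcare_model(...)` with its default reference would run the generic Spanish one on the given text.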

Additional NLU resources

1 line Install NLU on Google Colab

!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash

1 line Install NLU on Kaggle

!wget https://setup.johnsnowlabs.com/nlu/kaggle.sh -O - | bash

Install via PIP

! pip install nlu pyspark streamlit==0.80.0
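
After any of the install routes above, a quick sanity check can confirm which versions actually ended up in the environment. This is a minimal sketch using only the Python standard library. The helper names `installed_version` and `check_install` are made up for this example, and the package names are the ones from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_install(packages=("nlu", "pyspark", "streamlit")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in check_install().items():
        print(f"{name}: {version or 'not installed'}")
```

If `streamlit` reports a version other than `0.80.0`, the pin in the pip command was probably not applied in that environment.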