JohnSnowLabs / nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

22 New models for 23 languages including various African and Indian languages, Medical Spanish models and more in NLU 3.4.1 #101


C-K-Loan commented 2 years ago

22 new models for 23 languages, including various African and Indian languages

We are very excited to announce the release of NLU 3.4.1, which features 22 new models for 23 languages. The open-source side covers new embeddings for Vietnamese, French and the English clinical domain, and multilingual embeddings for 12 Indian languages. Additionally, there are new Transformer-based token and sequence classifiers for multilingual NER in 9 African languages, German news sentiment, and English emotion and typo detection. The healthcare side covers medical Spanish models, classifiers for drugs, gender and the PICO framework, and relation extractors for adverse drug events and temporality. Finally, Spark 3.2.x is now supported, and bugs related to Databricks environments have been fixed.

General NLU Improvements

New Open Source Models

Based on the amazing Spark NLP 3.4.1 release, NLU 3.4.1 integrates new multilingual embeddings for 12 major Indian languages, as well as embeddings for Vietnamese, French and the English clinical domain. Additionally, it adds a new multilingual NER model for 9 African languages, an English 6-class emotion classifier and a typo detector.

New Embeddings

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|---|---|---|---|---|
| xx | xx.embed.albert.indic | albert_indic | Embeddings | AlbertEmbeddings |
| en | en.embed.longformer.clinical | clinical_longformer | Embeddings | LongformerEmbeddings |
| fr | fr.embed.word2vec_wiki_1000 | word2vec_wiki_1000 | Embeddings | WordEmbeddingsModel |
| fr | fr.embed.word2vec_wac_200 | word2vec_wac_200 | Embeddings | WordEmbeddingsModel |
| fr | fr.embed.w2v_cc_300d | w2v_cc_300d | Embeddings | WordEmbeddingsModel |
| vi | vi.embed.distilbert.cased | distilbert_base_cased | Embeddings | DistilBertEmbeddings |

New Transformer-based Token and Sequence Classifiers

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|---|---|---|---|---|
| xx | xx.ner.masakhaner.distilbert | xlm_roberta_large_token_classifier_masakhaner | Named Entity Recognition | DistilBertForTokenClassification |
| en | en.classify.emotion.bert | bert_sequence_classifier_emotion | Text Classification | BertForSequenceClassification |
| de | de.classify.news_sentiment.bert | bert_sequence_classifier_news_sentiment | Sentiment Analysis | BertForSequenceClassification |
| en | en.classify.typos.distilbert | distilbert_token_classifier_typo_detector | Named Entity Recognition | DistilBertForTokenClassification |
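A minimal sketch of trying a few of the new open-source models by their NLU reference, assuming NLU 3.4.1 is installed together with PySpark and a Java runtime. `nlu.load()` and `.predict()` are NLU's standard entry points; the sample sentences, the `SAMPLES` dictionary and the helper names `language_of` and `run_samples` are hypothetical, chosen only for illustration. NLU is imported inside `run_samples` because the first `nlu.load()` call downloads the model and starts a Spark session.

```python
# Sketch: run a few of the new NLU 3.4.1 open-source models by reference.
# Assumes `pip install nlu pyspark` and a working Java runtime.
# The sample sentences are made up for illustration.

# NLU reference -> example input (references from the model list above)
SAMPLES = {
    "en.classify.emotion.bert": "I am so happy the new release is finally out!",
    "de.classify.news_sentiment.bert": "Die Aktie ist heute stark gefallen.",
    # "sentense" is misspelled on purpose so the typo detector has work to do
    "en.classify.typos.distilbert": "This sentense contains a typo.",
    "vi.embed.distilbert.cased": "Tôi yêu xử lý ngôn ngữ tự nhiên.",
}


def language_of(reference: str) -> str:
    """Return the language prefix of an NLU reference ('xx' = multilingual)."""
    return reference.split(".", 1)[0]


def run_samples(samples: dict) -> dict:
    """Load each model with nlu.load() and predict on its sample text.

    nlu is imported lazily: the first load downloads pretrained models
    and starts a Spark session, which can take a while.
    """
    import nlu  # imported here so the helpers above work without NLU

    results = {}
    for reference, text in samples.items():
        pipe = nlu.load(reference)  # downloads the model on first use
        results[reference] = pipe.predict(text)  # pandas DataFrame by default
    return results
```

Calling `run_samples(SAMPLES)` returns one DataFrame per reference; expect the first run to be slow while the models download.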

New Healthcare Models

These models are integrated from the amazing Spark NLP for Healthcare 3.4.1 release, which makes 2 new annotator classes available, MedicalBertForSequenceClassification and MedicalDistilBertForSequenceClassification, along with various medical Spanish models, RxNorm resolvers, Transformer-based sequence classifiers for drugs, gender and the PICO framework, and relation extractors for the temporality and causality of drugs and adverse events.

New Medical Spanish Models

New Resolvers

New Transformer based Sequence Classifiers

New Relation Extractors

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|---|---|---|---|---|
| es | es.embed.sciwiki_300d | embeddings_sciwiki_300d | Embeddings | WordEmbeddingsModel |
| es | es.med_ner.deid.generic | ner_deid_generic | De-identification | MedicalNerModel |
| es | es.med_ner.deid.subentity | ner_deid_subentity | De-identification | MedicalNerModel |
| en | en.med_ner.supplement_clinical | ner_supplement_clinical | Named Entity Recognition | MedicalNerModel |
| en | en.resolve.rxnorm.augmented_re | sbiobertresolve_rxnorm_augmented_re | Entity Resolution | SentenceEntityResolverModel |
| en | en.classify.ade.seq_biobert | bert_sequence_classifier_ade | Text Classification | MedicalBertForSequenceClassification |
| en | en.classify.gender.seq_biobert | bert_sequence_classifier_gender_biobert | Text Classification | MedicalBertForSequenceClassification |
| en | en.classify.pico.seq_biobert | bert_sequence_classifier_pico_biobert | Text Classification | MedicalBertForSequenceClassification |
| en | en.classify.ade.seq_distilbert | distilbert_sequence_classifier_ade | Text Classification | MedicalDistilBertForSequenceClassification |
| en | en.relation.temporal_events_clinical | re_temporal_events_clinical | Relation Extraction | RelationExtractionModel |
| en | en.relation.adverse_drug_events.clinical | re_ade_clinical | Relation Extraction | RelationExtractionModel |
| en | en.relation.adverse_drug_events.clinical.biobert | redl_ade_biobert | Relation Extraction | RelationExtractionDLModel |
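The healthcare models are licensed, so NLU has to be authenticated before `nlu.load()` can fetch them. A minimal sketch, assuming a valid John Snow Labs license and that `nlu.auth()` accepts the four credentials in the order shown (license key, AWS access key ID, AWS secret access key, JSL secret). Reading them from environment variables with these particular names, and the helpers `read_credentials` and `classify_ade`, are assumptions made for this example.

```python
# Sketch: authenticate NLU, then run one of the new licensed classifiers.
# Assumes a valid John Snow Labs license; the environment-variable names
# are placeholders chosen for this example.
import os

# Hypothetical env var names holding the four license credentials,
# in the order they are passed to nlu.auth() below.
CREDENTIAL_VARS = (
    "SPARK_NLP_LICENSE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "JSL_SECRET",
)


def read_credentials(env=os.environ) -> tuple:
    """Collect the license credentials, failing early if any are missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing license credentials: {', '.join(missing)}")
    return tuple(env[name] for name in CREDENTIAL_VARS)


def classify_ade(texts, env=os.environ):
    """Authenticate, then run the BioBERT adverse-drug-event classifier."""
    import nlu  # lazy import: needs NLU plus a licensed healthcare install

    nlu.auth(*read_credentials(env))
    return nlu.load("en.classify.ade.seq_biobert").predict(texts)
```

Checking the credentials up front gives a clear error message before any Spark session is started; the same pattern applies to any other reference in the table.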

Bugfixes

Additional NLU resources

1 line Install NLU on Google Colab

!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash

1 line Install NLU on Kaggle

!wget https://setup.johnsnowlabs.com/nlu/kaggle.sh -O - | bash

Install via PIP

! pip install nlu pyspark streamlit
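A quick way to confirm which versions pip actually installed, for example that NLU is at 3.4.1 or later and which PySpark release is present (Spark 3.2.x support is new in this release). This sketch uses only the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
# Sketch: report the installed versions of the packages from the pip command.
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for pkg in ("nlu", "pyspark", "streamlit"):
    print(pkg, installed_version(pkg) or "not installed")
```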