1 line for thousands of State of The Art NLP models in hundreds of languages The fastest and most accurate way to solve text problems.
Apache License 2.0
854
stars
130
forks
source link
48 new Transformer based models in 9 new languages, including NER for Finance, Industry, Politcal Policies, COVID and Chemical Trials, various clinical and medical domains in Spanish and English and much more in NLU 3.3.1 #88
We are incredibly excited to announce NLU 3.3.1 has been released with 48 new models in 9 languages!
It comes with 2 new types of state-of-the-art models,distilBERT and BERT for sequence classification with various pretrained weights,
state-of-the-art bert based classifiers for problems in the domains of Finance, Sentiment Classification, Industry, News and much more.
One the healthcare side, NLU features 22 new models in for English and Spanish with
withEntity Resolver Models for LOINC, MeSH, NDC and SNOMED and UMLS Diseases,
NER models for Biomarkers, NIHSS-Guidelines, COVID Trials , Chemical Trials,
Bert based Token Classifier models for biological, genetical,cancer, cellular terms,
Bert for Sequence Classification models for clinical question vs statement classification
and finally Spanich Clinical NER and Resolver Models
Once again, we would like to thank our community for making another amazing release possible!
New state-of-the-art fine-tuned BERT models for Sequence Classification in English, French, German, Spanish, Japanese, Turkish, Russian, and multilingual languages.
DistilBertForSequenceClassification models in English, French and Urdu
Word2Vec models.
classify.distilbert_sequence.banking77 : Banking NER model trained on BANKING77 dataset, which provides a very fine-grained set of intents in a banking domain. It comprises 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection. Can extract entities like activate_my_card, age_limit, apple_pay_or_google_pay, atm_support, automatic_top_up, balance_not_updated_after_bank_transfer, balance_not_updated_after_cheque_or_cash_deposit, beneficiary_not_allowed, cancel_transfer, card_about_to_expire, card_acceptance, card_arrival, card_delivery_estimate, card_linking, card_not_working, card_payment_fee_charged, card_payment_not_recognised, card_payment_wrong_exchange_rate, card_swallowed, cash_withdrawal_charge, cash_withdrawal_not_recognised, change_pin, compromised_card, contactless_not_working, country_support, declined_card_payment, declined_cash_withdrawal, declined_transfer, direct_debit_payment_not_recognised, disposable_card_limits, edit_personal_details, exchange_charge, exchange_rate, exchange_via_app, extra_charge_on_statement, failed_transfer, fiat_currency_support, get_disposable_virtual_card, get_physical_card, getting_spare_card, getting_virtual_card, lost_or_stolen_card, lost_or_stolen_phone, order_physical_card, passcode_forgotten, pending_card_payment, pending_cash_withdrawal, pending_top_up, pending_transfer, pin_blocked, receiving_money,
classify.distilbert_sequence.industry : Industry NER model which can extract entities like Advertising, Aerospace & Defense, Apparel Retail, Apparel, Accessories & Luxury Goods, Application Software, Asset Management & Custody Banks, Auto Parts & Equipment, Biotechnology, Building Products, Casinos & Gaming, Commodity Chemicals, Communications Equipment, Construction & Engineering, Construction Machinery & Heavy Trucks, Consumer Finance, Data Processing & Outsourced Services, Diversified Metals & Mining, Diversified Support Services, Electric Utilities, Electrical Components & Equipment, Electronic Equipment & Instruments, Environmental & Facilities Services, Gold, Health Care Equipment, Health Care Facilities, Health Care Services.
xx.classify.bert_sequence.sentiment : Multi-Lingual Sentiment Classifier This a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5). This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above, or for further finetuning on related sentiment analysis tasks.
distilbert_sequence.policy : Policy Classifier This model was trained on 129.669 manually annotated sentences to classify text into one of seven political categories: ‘Economy’, ‘External Relations’, ‘Fabric of Society’, ‘Freedom and Democracy’, ‘Political System’, ‘Welfare and Quality of Life’ or ‘Social Groups’.
classify.bert_sequence.dehatebert_mono : Hate Speech Classifier This model was trained on 129.669 manually annotated sentences to classify text into one of seven political categories: ‘Economy’, ‘External Relations’, ‘Fabric of Society’, ‘Freedom and Democracy’, ‘Political System’, ‘Welfare and Quality of Life’ or ‘Social Groups’.
ner_nihss : NER model that can identify entities according to NIHSS guidelines for clinical stroke assessment to evaluate neurological status in acute stroke patients
redl_nihss_biobert : relation extraction model that can relate scale items and their measurements according to NIHSS guidelines.
es.med_ner.roberta_ner_diag_proc : New Spanish Clinical NER Models for extracting the entities DIAGNOSTICO, PROCEDIMIENTO
es.resolve.snomed: New Spanish SNOMED Entity Resolvers
bert_sequence_classifier_question_statement_clinical:New Clinical Question vs Statement for BertForSequenceClassification model
med_ner.covid_trials : This model is trained to extract covid-specific medical entities in clinical trials. It supports the following entities ranging from virus type to trial design: Stage, Severity, Virus, Trial_Design, Trial_Phase, N_Patients, Institution, Statistical_Indicator, Section_Header, Cell_Type, Cellular_component, Viral_components, Physiological_reaction, Biological_molecules, Admission_Discharge, Age, BMI, Cerebrovascular_Disease, Date, Death_Entity, Diabetes, Disease_Syndrome_Disorder, Dosage, Drug_Ingredient, Employment, Frequency, Gender, Heart_Disease, Hypertension, Obesity, Pulse, Race_Ethnicity, Respiration, Route, Smoking, Time, Total_Cholesterol, Treatment, VS_Finding, Vaccine .
med_ner.chemd : This model extract the names of chemical compounds and drugs in medical texts. The entities that can be detected are as follows : SYSTEMATIC, IDENTIFIERS, FORMULA, TRIVIAL, ABBREVIATION, FAMILY, MULTIPLE . For reference click here . https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331685/
bert_token_classifier_ner_bionlp : This model is BERT-based version of ner_bionlp model and can detect biological and genetics terms in cancer-related texts. (Amino_acid, Anatomical_system, Cancer, Cell, Cellular_component, Developing_anatomical_Structure, Gene_or_gene_product, Immaterial_anatomical_entity, Multi-tissue_structure, Organ, Organism, Organism_subdivision, Simple_chemical, Tissue
bert_token_classifier_ner_cellular : This model is BERT-based version of ner_cellular model and can detect molecular biology-related terms (DNA, Cell_type, Cell_line, RNA, Protein) in medical texts.
We have updated med_ner.jsl.enriched model by enriching the training data using clinical trials data to make it more robust. This model is capable of predicting up to 87 different entities and is based on ner_jsl model. Here are the entities this model can detect; Social_History_Header, Oncology_Therapy, Blood_Pressure, Respiration, Performance_Status, Family_History_Header, Dosage, Clinical_Dept, Diet, Procedure, HDL, Weight, Admission_Discharge, LDL, Kidney_Disease, Oncological, Route, Imaging_Technique, Puerperium, Overweight, Temperature, Diabetes, Vaccine, Age, Test_Result, Employment, Time, Obesity, EKG_Findings, Pregnancy, Communicable_Disease, BMI, Strength, Tumor_Finding, Section_Header, RelativeDate, ImagingFindings, Death_Entity, Date, Cerebrovascular_Disease, Treatment, Labour_Delivery, Pregnancy_Delivery_Puerperium, Direction, Internal_organ_or_component, Psychological_Condition, Form, Medical_Device, Test, Symptom, Disease_Syndrome_Disorder, Staging, Birth_Entity, Hyperlipidemia, O2_Saturation, Frequency, External_body_part_or_region, Drug_Ingredient, Vital_Signs_Header, Substance_Quantity, Race_Ethnicity, VS_Finding, Injury_or_Poisoning, Medical_History_Header, Alcohol, Triglycerides, Total_Cholesterol, Sexually_Active_or_Sexual_Orientation, Female_Reproductive_Status, Relationship_Status, Drug_BrandName, RelativeTime, Duration, Hypertension, Metastasis, Gender, Oxygen_Therapy, Pulse, Heart_Disease, Modifier, Allergen, Smoking, Substance, Cancer_Modifier, Fetus_NewBorn, Height
classify.bert_sequence.question_statement_clinical : This model classifies sentences into one of these two classes: question (interrogative sentence) or statement (declarative sentence) and trained with BertForSequenceClassification. This model is at first trained on SQuAD and SPAADIA dataset and then fine tuned on the clinical visit documents and MIMIC-III dataset annotated in-house. Using this model, you can find the question statements and exclude & utilize in the downstream tasks such as NER and relation extraction models.
classify.token_bert.ner_chemical : This model is BERT-based version of ner_chemicals model and can detect chemical compounds (CHEM) in the medical texts.
resolve.umls_disease_syndrome : This model is trained on the Disease or Syndrome category using sbiobert_base_cased_mli embeddings.
We are incredibly excited to announce NLU 3.3.1 has been released with 48 new models in 9 languages!
It comes with 2 new types of state-of-the-art models,
distilBERT
andBERT for sequence classification
with various pretrained weights, state-of-the-art bert based classifiers for problems in the domains ofFinance
,Sentiment Classification
,Industry
,News
and much more.One the healthcare side, NLU features 22 new models in for English and Spanish with with
Entity Resolver Models
for LOINC, MeSH, NDC and SNOMED and UMLS Diseases,NER models
forBiomarkers
,NIHSS-Guidelines
,COVID Trials
,Chemical Trials
,Bert based Token Classifier models
forbiological
,genetical
,cancer
,cellular
terms,Bert for Sequence Classification models
forclinical question vs statement classification
and finallySpanich Clinical NER
andResolver Models
Once again, we would like to thank our community for making another amazing release possible!
New Open Sourcen Models and Features
Integrates the amazing Spark NLP 3.3.3 and 3.3.2 releases, featuring:
BERT models for Sequence Classification
inEnglish
,French
,German
,Spanish
,Japanese
,Turkish
,Russian
, and multilingual languages.DistilBertForSequenceClassification
models inEnglish
,French
andUrdu
Word2Vec
models.classify.distilbert_sequence.banking77
:Banking NER model
trained on BANKING77 dataset, which provides a very fine-grained set of intents in a banking domain. It comprises 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection. Can extract entities like activate_my_card, age_limit, apple_pay_or_google_pay, atm_support, automatic_top_up, balance_not_updated_after_bank_transfer, balance_not_updated_after_cheque_or_cash_deposit, beneficiary_not_allowed, cancel_transfer, card_about_to_expire, card_acceptance, card_arrival, card_delivery_estimate, card_linking, card_not_working, card_payment_fee_charged, card_payment_not_recognised, card_payment_wrong_exchange_rate, card_swallowed, cash_withdrawal_charge, cash_withdrawal_not_recognised, change_pin, compromised_card, contactless_not_working, country_support, declined_card_payment, declined_cash_withdrawal, declined_transfer, direct_debit_payment_not_recognised, disposable_card_limits, edit_personal_details, exchange_charge, exchange_rate, exchange_via_app, extra_charge_on_statement, failed_transfer, fiat_currency_support, get_disposable_virtual_card, get_physical_card, getting_spare_card, getting_virtual_card, lost_or_stolen_card, lost_or_stolen_phone, order_physical_card, passcode_forgotten, pending_card_payment, pending_cash_withdrawal, pending_top_up, pending_transfer, pin_blocked, receiving_money,classify.distilbert_sequence.industry
:Industry NER model
which can extract entities like Advertising, Aerospace & Defense, Apparel Retail, Apparel, Accessories & Luxury Goods, Application Software, Asset Management & Custody Banks, Auto Parts & Equipment, Biotechnology, Building Products, Casinos & Gaming, Commodity Chemicals, Communications Equipment, Construction & Engineering, Construction Machinery & Heavy Trucks, Consumer Finance, Data Processing & Outsourced Services, Diversified Metals & Mining, Diversified Support Services, Electric Utilities, Electrical Components & Equipment, Electronic Equipment & Instruments, Environmental & Facilities Services, Gold, Health Care Equipment, Health Care Facilities, Health Care Services.xx.classify.bert_sequence.sentiment
:Multi-Lingual Sentiment Classifier
This a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5). This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above, or for further finetuning on related sentiment analysis tasks.distilbert_sequence.policy
:Policy Classifier
This model was trained on 129.669 manually annotated sentences to classify text into one of seven political categories: ‘Economy’, ‘External Relations’, ‘Fabric of Society’, ‘Freedom and Democracy’, ‘Political System’, ‘Welfare and Quality of Life’ or ‘Social Groups’.classify.bert_sequence.dehatebert_mono
:Hate Speech Classifier
This model was trained on 129.669 manually annotated sentences to classify text into one of seven political categories: ‘Economy’, ‘External Relations’, ‘Fabric of Society’, ‘Freedom and Democracy’, ‘Political System’, ‘Welfare and Quality of Life’ or ‘Social Groups’.Complete List of Open Source Models :
New Healthcare models and Features
Integrates the incredible Spark NLP for Healthcare releases 3.3.4, 3.3.2 and 3.3.1, featuring:
ner_biomarker
for extracting extract biomarkers, therapies, oncological, and other general conceptsner_nihss
: NER model that can identify entities according to NIHSS guidelines for clinical stroke assessment to evaluate neurological status in acute stroke patientsredl_nihss_biobert
: relation extraction model that can relate scale items and their measurements according to NIHSS guidelines.es.med_ner.roberta_ner_diag_proc
: New Spanish Clinical NER Models for extracting the entities DIAGNOSTICO, PROCEDIMIENTOes.resolve.snomed
: New Spanish SNOMED Entity Resolversbert_sequence_classifier_question_statement_clinical
:New Clinical Question vs Statement for BertForSequenceClassification modelmed_ner.covid_trials
: This model is trained to extract covid-specific medical entities in clinical trials. It supports the following entities ranging from virus type to trial design: Stage, Severity, Virus, Trial_Design, Trial_Phase, N_Patients, Institution, Statistical_Indicator, Section_Header, Cell_Type, Cellular_component, Viral_components, Physiological_reaction, Biological_molecules, Admission_Discharge, Age, BMI, Cerebrovascular_Disease, Date, Death_Entity, Diabetes, Disease_Syndrome_Disorder, Dosage, Drug_Ingredient, Employment, Frequency, Gender, Heart_Disease, Hypertension, Obesity, Pulse, Race_Ethnicity, Respiration, Route, Smoking, Time, Total_Cholesterol, Treatment, VS_Finding, Vaccine .med_ner.chemd
: This model extract the names of chemical compounds and drugs in medical texts. The entities that can be detected are as follows : SYSTEMATIC, IDENTIFIERS, FORMULA, TRIVIAL, ABBREVIATION, FAMILY, MULTIPLE . For reference click here . https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331685/bert_token_classifier_ner_bionlp
: This model is BERT-based version of ner_bionlp model and can detect biological and genetics terms in cancer-related texts. (Amino_acid, Anatomical_system, Cancer, Cell, Cellular_component, Developing_anatomical_Structure, Gene_or_gene_product, Immaterial_anatomical_entity, Multi-tissue_structure, Organ, Organism, Organism_subdivision, Simple_chemical, Tissuebert_token_classifier_ner_cellular
: This model is BERT-based version of ner_cellular model and can detect molecular biology-related terms (DNA, Cell_type, Cell_line, RNA, Protein) in medical texts.med_ner.jsl.enriched
model by enriching the training data using clinical trials data to make it more robust. This model is capable of predicting up to 87 different entities and is based on ner_jsl model. Here are the entities this model can detect; Social_History_Header, Oncology_Therapy, Blood_Pressure, Respiration, Performance_Status, Family_History_Header, Dosage, Clinical_Dept, Diet, Procedure, HDL, Weight, Admission_Discharge, LDL, Kidney_Disease, Oncological, Route, Imaging_Technique, Puerperium, Overweight, Temperature, Diabetes, Vaccine, Age, Test_Result, Employment, Time, Obesity, EKG_Findings, Pregnancy, Communicable_Disease, BMI, Strength, Tumor_Finding, Section_Header, RelativeDate, ImagingFindings, Death_Entity, Date, Cerebrovascular_Disease, Treatment, Labour_Delivery, Pregnancy_Delivery_Puerperium, Direction, Internal_organ_or_component, Psychological_Condition, Form, Medical_Device, Test, Symptom, Disease_Syndrome_Disorder, Staging, Birth_Entity, Hyperlipidemia, O2_Saturation, Frequency, External_body_part_or_region, Drug_Ingredient, Vital_Signs_Header, Substance_Quantity, Race_Ethnicity, VS_Finding, Injury_or_Poisoning, Medical_History_Header, Alcohol, Triglycerides, Total_Cholesterol, Sexually_Active_or_Sexual_Orientation, Female_Reproductive_Status, Relationship_Status, Drug_BrandName, RelativeTime, Duration, Hypertension, Metastasis, Gender, Oxygen_Therapy, Pulse, Heart_Disease, Modifier, Allergen, Smoking, Substance, Cancer_Modifier, Fetus_NewBorn, Heightclassify.bert_sequence.question_statement_clinical
: This model classifies sentences into one of these two classes: question (interrogative sentence) or statement (declarative sentence) and trained with BertForSequenceClassification. This model is at first trained on SQuAD and SPAADIA dataset and then fine tuned on the clinical visit documents and MIMIC-III dataset annotated in-house. Using this model, you can find the question statements and exclude & utilize in the downstream tasks such as NER and relation extraction models.classify.token_bert.ner_chemical
: This model is BERT-based version of ner_chemicals model and can detect chemical compounds (CHEM) in the medical texts.resolve.umls_disease_syndrome
: This model is trained on the Disease or Syndrome category using sbiobert_base_cased_mli embeddings.Complete List of Healthcare Models :