JohnSnowLabs / nlu

1 line for thousands of State of The Art NLP models in hundreds of languages The fastest and most accurate way to solve text problems.
Apache License 2.0

4.0.0 #130

Closed C-K-Loan closed 2 years ago

C-K-Loan commented 2 years ago

NLU 4.0.0 brings OCR of visual tables into Pandas DataFrames from PDF/DOC(X)/PPT files, 1000+ new state-of-the-art transformer models for Question Answering (QA) in over 30 languages, up to 700% speedup on GPU, 20 Biomedical models for over 8 languages, 50+ Terminology Code Mappers between RXNORM, NDC, UMLS, ICD10, ICDO, SNOMED and MESH, Deidentification in Romanian, various Spark NLP helper methods and much more, all in 1 line of code with John Snow Labs NLU 4.0.0.


NLU 4.0 for OCR Overview

On the OCR side, we now support extracting tables from PDF/DOC(X)/PPT files into structured Pandas DataFrames, making it easier than ever before to analyze large batches of visual documents.

Check out the OCR Tutorial for extracting Tables from Image/PDF/DOC(X) files (available in Colab) to see this in action.

These models extract all table data from the files they are given and return a list of Pandas DataFrames,
containing one DataFrame for every table detected.

| NLU Spell | Transformer Class |
|---|---|
| nlu.load('pdf2table') | PdfToTextTable |
| nlu.load('ppt2table') | PptToTextTable |
| nlu.load('doc2table') | DocToTextTable |

This is powered by the John Snow Labs Spark OCR Annotators PdfToTextTable, DocToTextTable and PptToTextTable.


NLU 4.0 Core Overview


NLU 4.0 for Healthcare Overview


Extract Tables from PDF files as Pandas DataFrames

Sample PDF: Sample PDF

nlu.load('pdf2table').predict('/path/to/sample.pdf')  

Output of PDF Table OCR :

mpg cyl disp hp drat wt qsec vs am gear
21 6 160 110 3.9 2.62 16.46 0 1 4
21 6 160 110 3.9 2.875 17.02 0 1 4
22.8 4 108 93 3.85 2.32 18.61 1 1 4
21.4 6 258 110 3.08 3.215 19.44 1 0 3
18.7 8 360 175 3.15 3.44 17.02 0 0 3
13.3 8 350 245 3.73 3.84 15.41 0 0 3
19.2 8 400 175 3.08 3.845 17.05 0 0 3
27.3 4 79 66 4.08 1.935 18.9 1 1 4
26 4 120.3 91 4.43 2.14 16.7 0 1 5
30.4 4 95.1 113 3.77 1.513 16.9 1 1 5
15.8 8 351 264 4.22 3.17 14.5 0 1 5
19.7 6 145 175 3.62 2.77 15.5 0 1 5
15 8 301 335 3.54 3.57 14.6 0 1 5
21.4 4 121 109 4.11 2.78 18.6 1 1 4

Extract Tables from DOC/DOCX files as Pandas DataFrames

Sample DOCX: Sample DOCX

nlu.load('doc2table').predict('/path/to/sample.docx')  

Output of DOCX Table OCR :

Screen Reader Responses Share
JAWS 853 49%
NVDA 238 14%
Window-Eyes 214 12%
System Access 181 10%
VoiceOver 159 9%

Extract Tables from PPT files as Pandas DataFrames

Sample PPT with two tables: Sample PPT with two tables

nlu.load('ppt2table').predict('/path/to/sample.ppt')

Output of PPT Table OCR :

Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

and

Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6.7 3.3 5.7 2.5 virginica
6.7 3 5.2 2.3 virginica
6.3 2.5 5 1.9 virginica
6.5 3 5.2 2 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3 5.1 1.8 virginica

Span Classifiers for question answering

Albert, Bert, DeBerta, DistilBert, LongFormer, RoBerta and XlmRoBerta based Transformer architectures are now available for question answering, with almost 1000 models available for 35 unique languages. They are powered by their corresponding Spark NLP XXXForQuestionAnswering Annotator Classes and come in various tuning and dataset flavours.

The NLU references for these models follow this scheme:

<lang>.answer_question.<domain>.<datasets>.<annotator_class><tune info>.by_<username>

If multiple datasets or tune parameters are defined, they are connected with a _ .

These substrings define the <domain> part of the NLU reference

These substrings define the <dataset> part of the NLU reference

These substrings define the <annotator_class> substring, if it does not map to a sparknlp annotator

These substrings define the <tune_info> substring, if it does not map to a sparknlp annotator
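The naming scheme above can be sketched as a small helper that assembles a reference from its parts. This is only an illustration and not part of NLU: the function name `build_qa_reference` and its parameters are hypothetical, and joining `<annotator_class>` and `<tune info>` with a dot is an assumption based on references such as `es.answer_question.squadv2_sqac.bert.base_cased.by_MMG` in the model list.

```python
from typing import Optional, Sequence


def build_qa_reference(
    lang: str,
    datasets: Sequence[str] = (),
    annotator_class: Optional[str] = None,
    tune_info: Sequence[str] = (),
    domain: Optional[str] = None,
    username: Optional[str] = None,
) -> str:
    """Assemble an NLU-style QA reference string (hypothetical helper)."""
    parts = [lang, "answer_question"]
    if domain:
        parts.append(domain)
    if datasets:
        # Multiple datasets are connected with "_".
        parts.append("_".join(datasets))
    if annotator_class:
        parts.append(annotator_class)
    if tune_info:
        # Multiple tune parameters are connected with "_" as well.
        parts.append("_".join(tune_info))
    if username:
        parts.append(f"by_{username}")
    return ".".join(parts)


reference = build_qa_reference(
    "es",
    datasets=["squadv2", "sqac"],
    annotator_class="bert",
    tune_info=["base", "cased"],
    username="MMG",
)
print(reference)  # es.answer_question.squadv2_sqac.bert.base_cased.by_MMG
```

Parts that are not set are simply left out, which matches short references like `en.answer_question.squadv2.deberta`.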

QA DataFormat

You need to use one of the Data formats below to pass context and question correctly to the model.


import nlu
import pandas as pd

# use ||| to separate question|||context
data = 'What is my name?|||My name is Clara and I live in Berkeley'

# pass a tuple (question,context)
data = ('What is my name?','My name is Clara and I live in Berkeley')

# use pandas Dataframe, one column = question, one column=context
data = pd.DataFrame({
                     'question': ['What is my name?'],
                     'context': ["My name is Clara and I live in Berkely"]
                     })

# Get your answers with any of above formats 
nlu.load("en.answer_question.squadv2.deberta").predict(data)

returns :

answer answer_confidence context question
Clara 0.994931 My name is Clara and I live in Berkely What is my name?
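To make the string and tuple formats above concrete, here is a minimal sketch of how such inputs could be normalized into a (question, context) pair. It is not NLU's internal parser; the helper name `to_question_context` is hypothetical and only illustrates the `|||` convention.

```python
from typing import Tuple, Union

QA_SEPARATOR = "|||"


def to_question_context(item: Union[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Normalize one QA input into a (question, context) tuple (hypothetical helper)."""
    if isinstance(item, tuple):
        if len(item) != 2:
            raise ValueError("expected a (question, context) tuple")
        question, context = item
    elif isinstance(item, str):
        # Split on the first "|||" only, so the context may contain it again.
        question, separator, context = item.partition(QA_SEPARATOR)
        if not separator:
            raise ValueError("expected 'question|||context'")
    else:
        raise TypeError(f"unsupported QA input type: {type(item).__name__}")
    return question.strip(), context.strip()


as_string = to_question_context("What is my name?|||My name is Clara and I live in Berkeley")
as_tuple = to_question_context(("What is my name?", "My name is Clara and I live in Berkeley"))
print(as_string == as_tuple)  # True
```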

New NLU helper Methods

You can see all features showcased in the Colab notebook or on the new docs page for Spark NLP utils.

nlu.viz(pipe,data)

Visualize input data with an already configured Spark NLP pipeline,
for algorithms of type Ner, Assertion, Relation, Resolution or Dependency,
using Spark NLP Display.
It automatically infers the applicable viz type and the output columns to use for the visualization.
Example:

import nlu
from sparknlp.pretrained import PretrainedPipeline

# works with Pipeline, LightPipeline, PipelineModel, PretrainedPipeline, List[Annotator]
ade_pipeline = PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')

text = """I have an allergic reaction to vancomycin.
My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

nlu.viz(ade_pipeline, text)

returns:

If a pipeline has multiple candidate models that can be used for a viz,
the first Annotator that can be visualized will be used to create the viz.
You can specify which type of viz to create with the viz_type parameter.

Output columns to use for the viz are automatically deduced from the pipeline, by using the first annotator that provides the correct output type for a specific viz.
You can specify which columns to use for a viz by using the
corresponding ner_col, pos_col, dep_untyped_col, dep_typed_col, resolution_col, relation_col and assertion_col parameters.
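The "first matching annotator wins" rule described above can be sketched in a few lines. This is a simplified illustration, not NLU's implementation: the mapping `VIZ_BY_OUTPUT_TYPE`, the function `infer_viz` and the (output type, output column) stage representation are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from an annotator's output type to a viz type.
VIZ_BY_OUTPUT_TYPE = {
    "named_entity": "ner",
    "assertion": "assert",
    "relation": "relation",
    "resolution": "resolution",
    "dependency": "dep",
}


def infer_viz(
    stages: List[Tuple[str, str]], viz_type: Optional[str] = None
) -> Tuple[str, str]:
    """Return (viz type, output column) of the first stage that can be visualized.

    `stages` lists (output type, output column) pairs in pipeline order.
    If `viz_type` is given, only stages producing that viz type are considered.
    """
    for output_type, output_col in stages:
        candidate = VIZ_BY_OUTPUT_TYPE.get(output_type)
        if candidate is None:
            continue  # this stage cannot be visualized
        if viz_type is None or candidate == viz_type:
            return candidate, output_col
    raise ValueError("no stage in the pipeline supports the requested viz")


stages = [
    ("document", "document"),
    ("token", "token"),
    ("named_entity", "ner"),
    ("assertion", "assertion"),
]
print(infer_viz(stages))                     # ('ner', 'ner')
print(infer_viz(stages, viz_type="assert"))  # ('assert', 'assertion')
```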

nlu.autocomplete_pipeline(pipe)

Auto-completes a pipeline or single annotator into a runnable pipeline by harnessing NLU's DAG Autocompletion algorithm and returns it as an NLU pipeline. The standard Spark pipeline is available on the .vanilla_transformer_pipe attribute of the returned NLU pipe.

Every Annotator and Pipeline of Annotators defines a DAG of tasks, with various dependencies that must be satisfied in topological order. NLU completes an incomplete DAG by finding or creating a path between the very first input node, which is almost always DocumentAssembler/MultiDocumentAssembler, and the very last node(s), which are given by topologically sorting the iterable annotators parameter. Paths are created by resolving the input features of annotators to the corresponding providers with matching storage references.

Example:

# Let's autocomplete the pipeline for a RelationExtractionModel, which has many input columns and sub-dependencies.
from sparknlp_jsl.annotator import RelationExtractionModel
re_model = RelationExtractionModel().pretrained("re_ade_clinical", "en", 'clinical/models').setOutputCol('relation')

text = """I have an allergic reaction to vancomycin.
My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

nlu_pipe = nlu.autocomplete_pipeline(re_model)
nlu_pipe.predict(text)

returns :

relation relation_confidence relation_entity1 relation_entity2 relation_entity2_class
1 1 allergic reaction vancomycin Drug_Ingredient
1 1 skin itchy Symptom
1 0.99998 skin sore throat/burning/itchy Symptom
1 0.956225 skin numbness Symptom
1 0.999092 skin tongue External_body_part_or_region
0 0.942927 skin gums External_body_part_or_region
1 0.806327 itchy sore throat/burning/itchy Symptom
1 0.526163 itchy numbness Symptom
1 0.999947 itchy tongue External_body_part_or_region
0 0.994618 itchy gums External_body_part_or_region
0 0.994162 sore throat/burning/itchy numbness Symptom
1 0.989304 sore throat/burning/itchy tongue External_body_part_or_region
0 0.999969 sore throat/burning/itchy gums External_body_part_or_region
1 1 numbness tongue External_body_part_or_region
1 1 numbness gums External_body_part_or_region
1 1 tongue gums External_body_part_or_region
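The DAG completion idea described for nlu.autocomplete_pipeline can be illustrated with a toy resolver. This is a sketch under stated assumptions and not NLU's algorithm: the `PROVIDERS` registry, the feature names and the `autocomplete` function are hypothetical, and storage references are ignored entirely. Each required input feature is resolved to a provider, the provider's own inputs are resolved recursively, and the resulting graph is sorted topologically with Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical registry: feature name -> (provider name, features the provider needs).
PROVIDERS: Dict[str, Tuple[str, List[str]]] = {
    "document": ("DocumentAssembler", []),
    "sentence": ("SentenceDetector", ["document"]),
    "token": ("Tokenizer", ["sentence"]),
    "embeddings": ("WordEmbeddings", ["sentence", "token"]),
    "pos": ("PosTagger", ["sentence", "token"]),
    "ner": ("NerModel", ["sentence", "token", "embeddings"]),
    "chunk": ("NerConverter", ["sentence", "token", "ner"]),
    "dependency": ("DependencyParser", ["sentence", "token", "pos"]),
}


def autocomplete(target: str, required_features: Sequence[str]) -> List[str]:
    """Return all stages needed to run `target`, in a runnable order."""
    # graph maps each stage to the set of stages that must run before it.
    graph: Dict[str, Set[str]] = {target: set()}
    stack = [(target, list(required_features))]
    while stack:
        node, features = stack.pop()
        for feature in features:
            # An unknown feature raises KeyError: no provider is registered for it.
            provider, provider_inputs = PROVIDERS[feature]
            graph[node].add(provider)
            if provider not in graph:
                graph[provider] = set()
                stack.append((provider, provider_inputs))
    return list(TopologicalSorter(graph).static_order())


order = autocomplete(
    "RelationExtractionModel", ["embeddings", "pos", "chunk", "dependency"]
)
print(order)
```

The exact order of independent stages may vary, but every provider appears before the stages that consume its output, with DocumentAssembler first and the target annotator last.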

nlu.to_pretty_df(pipe,data)

Annotates a Pandas DataFrame, Pandas Series, Numpy Array, Spark DataFrame, Python List of strings or Python String
with a given Spark NLP pipeline, which is assumed to be complete and runnable, and returns the result as a pythonic Pandas DataFrame.

Example:

# works with Pipeline, LightPipeline, PipelineModel,PretrainedPipeline List[Annotator]
ade_pipeline = PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')

text = """I have an allergic reaction to vancomycin.
My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

# output is same as nlu.to_nlu_pipe(ade_pipeline).predict(text)
nlu.to_pretty_df(ade_pipeline,text)

returns :

assertion asserted_entitiy entitiy_class assertion_confidence
present allergic reaction ADE 0.998
present itchy ADE 0.8414
present sore throat/burning/itchy ADE 0.9019
present numbness in tongue and gums ADE 0.9991

Annotators are grouped internally by NLU into the output levels token, sentence, document, chunk and relation. The output columns of same-level annotators are zipped and exploded together to create the final output df.

Additionally, most keys from the metadata dictionary in the result annotations are collected and expanded into their own columns in the resulting DataFrame, with special handling for Annotators that encode multiple metadata fields inside of one, separated by strings like ||| or :::. Some columns are omitted from the metadata to reduce the total number of output columns; these can be re-enabled by setting metadata=True.

For a given pipeline, the output level is automatically set to the last annotator's output level by default. This can be changed by calling to_pretty_df(pipe, text, output_level='my_level') with one of the levels token, sentence, document, chunk or relation.
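The "zipped and exploded" step can be pictured with plain Python. This is a minimal sketch of the idea, not NLU's code: `zip_and_explode` and `split_metadata` are hypothetical helpers, and the sample values are taken from the assertion output shown above.

```python
from typing import Dict, List, Sequence


def zip_and_explode(columns: Dict[str, List[object]]) -> List[Dict[str, object]]:
    """Turn same-level columns of equal length into one row per annotation."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns of one output level must have the same length")
    names = list(columns)
    # zip(*...) pairs up the i-th value of every column; each pairing becomes a row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def split_metadata(value: str, separators: Sequence[str] = ("|||", ":::")) -> List[str]:
    """Split a metadata string that packs several fields into one value."""
    for separator in separators:
        if separator in value:
            return value.split(separator)
    return [value]


chunk_level = {
    "assertion": ["present", "present"],
    "asserted_entitiy": ["allergic reaction", "itchy"],
    "assertion_confidence": [0.998, 0.8414],
}
for row in zip_and_explode(chunk_level):
    print(row)

print(split_metadata("first|||second"))  # ['first', 'second']
```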

nlu.to_nlu_pipe(pipe)

Converts a pipeline or list of annotators into an NLU pipeline, making .predict() and .viz() available for every Spark NLP pipeline. Assumes the pipeline is already runnable.

# works with Pipeline, LightPipeline, PipelineModel,PretrainedPipeline List[Annotator]
ade_pipeline = PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')

text = """I have an allergic reaction to vancomycin.
My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

nlu_pipe = nlu.to_nlu_pipe(ade_pipeline)

# Same output as nlu.to_pretty_df(pipe,text) 
nlu_pipe.predict(text)

# same output as nlu.viz(pipe,text)
nlu_pipe.viz(text)

# Access the auto-completed Spark NLP big data pipeline (spark_df is an existing Spark DataFrame)
nlu_pipe.vanilla_transformer_pipe.transform(spark_df)

returns :

assertion asserted_entitiy entitiy_class assertion_confidence
present allergic reaction ADE 0.998
present itchy ADE 0.8414
present sore throat/burning/itchy ADE 0.9019
present numbness in tongue and gums ADE 0.9991


4 new Demo Notebooks

These notebooks showcase some of the latest classifier models for Banking Queries, Intents in Text, Question and News classification.


NLU captures every Annotator of Spark NLP and Spark NLP for healthcare

The entire universe of Annotators in Spark NLP and Spark NLP for Healthcare is now covered by NLU Components, which internally use generalizable annotation extractor methods and configs to enable the new NLU util methods. The following annotator classes are newly captured:


All NLU 4.0 for Healthcare Models

Some examples:

en.rxnorm.umls.mapping

Code:

nlu.load('en.rxnorm.umls.mapping').predict('1161611 315677')

Results:

mapped_entity_umls_code_origin_entity mapped_entity_umls_code
1161611 C3215948
315677 C0984912

en.ner.clinical_trials_abstracts

Code:

nlu.load('en.ner.clinical_trials_abstracts').predict('A one-year, randomised, multicentre trial comparing insulin glargine with NPH insulin in combination with oral agents in patients with type 2 diabetes.')

Results:

entities_clinical_trials_abstracts entities_clinical_trials_abstracts_class entities_clinical_trials_abstracts_confidence
0 randomised CTDesign 0.9996
0 multicentre CTDesign 0.9998
0 insulin glargine Drug 0.99135
0 NPH insulin Drug 0.96875
0 type 2 diabetes DisorderOrSyndrome 0.999933

Code:

nlu.load('en.ner.clinical_trials_abstracts').viz('A one-year, randomised, multicentre trial comparing insulin glargine with NPH insulin in combination with oral agents in patients with type 2 diabetes.')

Results:

en.med_ner.pathogen

Code:

nlu.load('en.med_ner.pathogen').predict('Racecadotril is an antisecretory medication and it has better tolerability than loperamide. Diarrhea is the condition of having loose, liquid or watery bowel movements each day. Signs of dehydration often begin with loss of the normal stretchiness of the skin. While it has been speculated that rabies virus, Lyssavirus and Ephemerovirus could be transmitted through aerosols, studies have concluded that this is only feasible in limited conditions.')

Results:

entities_pathogen entities_pathogen_class entities_pathogen_confidence
0 Racecadotril Medicine 0.9468
0 loperamide Medicine 0.9987
0 Diarrhea MedicalCondition 0.9848
0 dehydration MedicalCondition 0.6307
0 rabies virus Pathogen 0.95685
0 Lyssavirus Pathogen 0.9694
0 Ephemerovirus Pathogen 0.6917

Code:

nlu.load('en.med_ner.pathogen').viz('Racecadotril is an antisecretory medication and it has better tolerability than loperamide. Diarrhea is the condition of having loose, liquid or watery bowel movements each day. Signs of dehydration often begin with loss of the normal stretchiness of the skin. While it has been speculated that rabies virus, Lyssavirus and Ephemerovirus could be transmitted through aerosols, studies have concluded that this is only feasible in limited conditions.')

Results:

es.med_ner.living_species.roberta

Code:

nlu.load('es.med_ner.living_species.roberta').predict('Lactante varón de dos años. Antecedentes familiares sin interés. Antecedentes personales: Embarazo, parto y periodo neonatal normal. En seguimiento por alergia a legumbres, diagnosticado con diez meses por reacción urticarial generalizada con lentejas y garbanzos, con dieta de exclusión a legumbres desde entonces. En ésta visita la madre describe episodios de eritema en zona maxilar derecha con afectación ocular ipsilateral que se resuelve en horas tras la administración de corticoides. Le ha ocurrido en 5-6 ocasiones, en relación con la ingesta de alimentos previamente tolerados. Exploración complementaria: Cacahuete, ac(ige)19.2 Ku.arb/l. Resultados: Ante la sospecha clínica de Síndrome de Frey, se tranquiliza a los padres, explicándoles la naturaleza del cuadro y se cita para revisión anual.')

Results:

entities_living_species entities_living_species_class entities_living_species_confidence
0 Lactante varón HUMAN 0.93175
0 familiares HUMAN 1
0 personales HUMAN 1
0 neonatal HUMAN 0.9997
0 legumbres SPECIES 0.9962
0 lentejas SPECIES 0.9988
0 garbanzos SPECIES 0.9901
0 legumbres SPECIES 0.9976
0 madre HUMAN 1
0 Cacahuete SPECIES 0.998
0 padres HUMAN 1

Code:

nlu.load('es.med_ner.living_species.roberta').viz('Lactante varón de dos años. Antecedentes familiares sin interés. Antecedentes personales: Embarazo, parto y periodo neonatal normal. En seguimiento por alergia a legumbres, diagnosticado con diez meses por reacción urticarial generalizada con lentejas y garbanzos, con dieta de exclusión a legumbres desde entonces. En ésta visita la madre describe episodios de eritema en zona maxilar derecha con afectación ocular ipsilateral que se resuelve en horas tras la administración de corticoides. Le ha ocurrido en 5-6 ocasiones, en relación con la ingesta de alimentos previamente tolerados. Exploración complementaria: Cacahuete, ac(ige)19.2 Ku.arb/l. Resultados: Ante la sospecha clínica de Síndrome de Frey, se tranquiliza a los padres, explicándoles la naturaleza del cuadro y se cita para revisión anual.')

Results:

All healthcare models added in NLU 4.0 :

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|---|---|---|---|---|
| en | en.map_entity.abbreviation_to_definition | abbreviation_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.drug_to_action_treatment | drug_action_treatment_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.drug_brand_to_ndc | drug_brandname_ndc_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.icd10cm_to_snomed | icd10cm_snomed_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.icd10cm_to_umls | icd10cm_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.icdo_to_snomed | icdo_snomed_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.mesh_to_umls | mesh_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.rxnorm_to_action_treatment | rxnorm_action_treatment_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.rxnorm_resolver | rxnorm_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.rxnorm_to_ndc | rxnorm_ndc_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.rxnorm_to_umls | rxnorm_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.snomed_to_icd10cm | snomed_icd10cm_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.snomed_to_icdo | snomed_icdo_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.snomed_to_umls | snomed_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.section_headers_normalized | normalized_section_header_mapper | Chunk Mapping | PretrainedPipeline |
| en | en.icd10cm_to_snomed | icd10cm_snomed_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.icd10cm_to_umls | icd10cm_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.icdo_to_snomed | icdo_snomed_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.mesh_to_umls | mesh_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.rxnorm_to_umls | rxnorm_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.snomed_to_icd10cm | snomed_icd10cm_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.snomed_to_icdo | snomed_icdo_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.snomed_to_umls | snomed_umls_mapper | Chunk Mapping | ChunkMapperModel |
| en | en.map_entity.icd10cm_to_snomed.pipe | icd10cm_snomed_mapping | Pipeline Healthcare | PretrainedPipeline |
| en | en.map_entity.snomed_to_icd10cm.pipe | snomed_icd10cm_mapping | Pipeline Healthcare | PretrainedPipeline |
| en | en.map_entity.icdo_to_snomed.pipe | icdo_snomed_mapping | Pipeline Healthcare | PretrainedPipeline |
| en | en.map_entity.snomed_to_icdo.pipe | snomed_icdo_mapping | Pipeline Healthcare | PretrainedPipeline |
| en | en.map_entity.rxnorm_to_ndc.pipe | rxnorm_ndc_mapping | Pipeline Healthcare | PretrainedPipeline |
| en | en.med_ner.pathogen.pipeline | ner_pathogen_pipeline | Pipeline Healthcare | PretrainedPipeline |
| en | en.med_ner.biomedical_bc2gm.pipeline | ner_biomedical_bc2gm_pipeline | Pipeline Healthcare | PretrainedPipeline |
| ro | ro.deid.clinical | clinical_deidentification | Pipeline Healthcare | MedicalNerModel |
| en | en.med_ner.clinical_trials_abstracts.pipe | ner_clinical_trials_abstracts_pipeline | Pipeline Healthcare | PretrainedPipeline |
| en | en.ner.clinical_trials_abstracts | ner_clinical_trials_abstracts | Named Entity Recognition | MedicalNerModel |
| en | en.med_ner.clinical_trials_abstracts | bert_token_classifier_ner_clinical_trials_abstracts | Named Entity Recognition | MedicalBertForTokenClassifier |
| en | en.med_ner.pathogen | ner_pathogen | Named Entity Recognition | MedicalNerModel |
| en | en.med_ner.living_species.token_bert | bert_token_classifier_ner_living_species | Named Entity Recognition | MedicalBertForTokenClassifier |
| en | en.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| en | en.med_ner.living_species.biobert | ner_living_species_biobert | Named Entity Recognition | MedicalNerModel |
| en | en.classify.stress | bert_sequence_classifier_stress | Text Classification | MedicalBertForSequenceClassification |
| es | es.embed.scielo300d | embeddings_scielo_300d | Embeddings | WordEmbeddingsModel |
| es | es.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| es | es.med_ner.living_species.bert | ner_living_species_bert | Named Entity Recognition | MedicalNerModel |
| es | es.med_ner.living_species.roberta | ner_living_species_roberta | Named Entity Recognition | MedicalNerModel |
| es | es.med_ner.living_species.300 | ner_living_species_300 | Named Entity Recognition | MedicalNerModel |
| fr | fr.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| fr | fr.med_ner.living_species.bert | ner_living_species_bert | Named Entity Recognition | MedicalNerModel |
| pt | pt.med_ner.living_species.token_bert | bert_token_classifier_ner_living_species | Named Entity Recognition | MedicalBertForTokenClassifier |
| pt | pt.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| pt | pt.med_ner.living_species.roberta | ner_living_species_roberta | Named Entity Recognition | MedicalNerModel |
| pt | pt.med_ner.living_species.bert | ner_living_species_bert | Named Entity Recognition | MedicalNerModel |
| it | it.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| it | it.med_ner.living_species.bert | ner_living_species_bert | Named Entity Recognition | MedicalNerModel |
| ca | ca.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| gl | gl.med_ner.living_species | ner_living_species | Named Entity Recognition | MedicalNerModel |
| ro | ro.med_ner.living_species.bert | ner_living_species_bert | Named Entity Recognition | MedicalNerModel |
| ro | ro.med_ner.clinical | ner_clinical | Named Entity Recognition | MedicalNerModel |
| ro | ro.embed.clinical.bert.base_cased | ner_clinical_bert | Named Entity Recognition | MedicalNerModel |
| ro | ro.med_ner.deid.subentity | ner_deid_subentity | Named Entity Recognition | MedicalNerModel |
| ro | ro.med_ner.deid.subentity.bert | ner_deid_subentity_bert | Named Entity Recognition | MedicalNerModel |

All NLU 4.0 Core Models

The full list of core models added in NLU 4.0 can be found on the NLU website because of GitHub limitations.

NLU Reference Spark NLP Reference Task Language Name(s) Annotator Class
bn.answer_question.tydiqa.multi_lingual_bert bert_qa_mbert_bengali_tydiqa_qa Question Answering Bengali BertForQuestionAnswering
es.answer_question.squadv2.electra.small electra_qa_biomedtra_small_es_squad2 Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squad_sqac.bert.base_cased bert_qa_bert_base_spanish_wwm_cased_finetuned_sqac_finetuned_squad Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squadv2.bert.base_cased.by_MMG bert_qa_bert_base_spanish_wwm_cased_finetuned_squad2_es_MMG Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squadv2.bert.base_cased.by_mrm8488 bert_qa_bert_base_spanish_wwm_cased_finetuned_spa_squad2_es_mrm8488 Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squadv2.bert.distilled_base_cased bert_qa_distill_bert_base_spanish_wwm_cased_finetuned_spa_squad2_es_mrm8488 Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squad.ruperta.base.by_mrm8488 roberta_qa_RuPERTa_base_finetuned_squadv1 Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squadv2.roberta.base roberta_qa_roberta_base_bne_squad2_hackathon_pln Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squadv2_sqac.bert.base_cased_spa.by_MMG bert_qa_bert_base_spanish_wwm_cased_finetuned_spa_squad2_es_finetuned_sqac Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squadv2_bio_medical.roberta.base roberta_qa_roberta_base_biomedical_es_squad2_hackathon_pln Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squadv2_clinical_bio_medical.roberta.base roberta_qa_roberta_base_biomedical_clinical_es_squad2_hackathon_pln Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squadv2_sqac.bert.base_cased.by_MMG bert_qa_bert_base_spanish_wwm_cased_finetuned_sqac_finetuned_squad2_es_MMG Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squadv2_sqac.bert.base_cased_v2.by_MMG bert_qa_bert_base_spanish_wwm_cased_finetuned_squad2_es_finetuned_sqac Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.xlm_roberta.base xlm_roberta_qa_xlm_roberta_base_spanish Question Answering Castilian, Spanish XlmRoBertaForQuestionAnswering
es.answer_question.xlm_roberta.multilingual_large xlm_roberta_qa_xlm_roberta_large_qa_multilingual_finedtuned_ru_ru_AlexKay Question Answering Castilian, Spanish XlmRoBertaForQuestionAnswering
es.answer_question.squad.roberta.large.by_stevemobs roberta_qa_roberta_large_fine_tuned_squad_es_stevemobs Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squadv2.roberta.base_v2 roberta_qa_RuPERTa_base_finetuned_squadv2 Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squad.roberta.large.by_jamarju roberta_qa_roberta_large_bne_squad_2.0_es_jamarju Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.roberta.large.by_BSC-TeMU roberta_qa_BSC_TeMU_roberta_large_bne_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squad.roberta.base.by_jamarju roberta_qa_roberta_base_bne_squad_2.0_es_jamarju Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squad.roberta.base_4096.by_mrm8488 roberta_qa_longformer_base_4096_spanish_finetuned_squad Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.distil_bert.base_uncased distilbert_qa_distillbert_base_spanish_uncased_finetuned_qa_tar Question Answering Castilian, Spanish DistilBertForQuestionAnswering
es.answer_question.mlqa.distil_bert.base_uncased distilbert_qa_distillbert_base_spanish_uncased_finetuned_qa_mlqa Question Answering Castilian, Spanish DistilBertForQuestionAnswering
es.answer_question.sqac.bert.base bert_qa_beto_base_spanish_sqac Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.sqac.distil_bert.base_uncased distilbert_qa_distillbert_base_spanish_uncased_finetuned_qa_sqac Question Answering Castilian, Spanish DistilBertForQuestionAnswering
es.answer_question.sqac.roberta.base.by_BSC-TeMU roberta_qa_BSC_TeMU_roberta_base_bne_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.roberta.base.by_IIC roberta_qa_roberta_base_spanish_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.bert.base_cased bert_qa_bert_base_spanish_wwm_cased_finetuned_sqac Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.sqac.roberta.base.by_mrm8488 roberta_qa_mrm8488_roberta_base_bne_finetuned_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.roberta.base.by_nlp-en-es roberta_qa_nlp_en_es_roberta_base_bne_finetuned_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.roberta.large.by_PlanTL-GOB-ES roberta_qa_PlanTL_GOB_ES_roberta_large_bne_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.roberta.large.by_nlp-en-es roberta_qa_bertin_large_finetuned_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.squad.electra.small electra_qa_electricidad_small_finetuned_squadv1 Question Answering Castilian, Spanish BertForQuestionAnswering
es.answer_question.squad.roberta.base.by_IIC roberta_qa_roberta_base_spanish_squades Question Answering Castilian, Spanish RoBertaForQuestionAnswering
es.answer_question.sqac.roberta.base.by_PlanTL-GOB-ES roberta_qa_PlanTL_GOB_ES_roberta_base_bne_sqac Question Answering Castilian, Spanish RoBertaForQuestionAnswering
ch.answer_question.xlm_roberta xlm_roberta_qa_ADDI_CH_XLM_R Question Answering Chamorro XlmRoBertaForQuestionAnswering
da.answer_question.squad.bert bert_qa_danish_bert_botxo_qa_squad Question Answering Danish BertForQuestionAnswering
da.answer_question.squad.xlmr_roberta.base xlm_roberta_qa_xlmr_base_texas_squad_da_da_saattrupdan Question Answering Danish XlmRoBertaForQuestionAnswering
nl.answer_question.squadv2.bert.multilingual_base_cased bert_qa_bert_base_multilingual_cased_finetuned_dutch_squad2 Question Answering Dutch, Flemish BertForQuestionAnswering
en.answer_question.squad.roberta.large.by_csarron roberta_qa_roberta_large_squad_v1 Question Answering English RoBertaForQuestionAnswering
en.answer_question.squad.roberta.large.by_rahulchakwate roberta_qa_roberta_large_finetuned_squad Question Answering English RoBertaForQuestionAnswering
en.answer_question.squad.scibert.by_amoux bert_qa_scibert_nli_squad Question Answering English BertForQuestionAnswering
en.answer_question.squad.scibert.by_ixa-ehu bert_qa_SciBERT_SQuAD_QuAC Question Answering English BertForQuestionAnswering
en.answer_question.squad.scibert.uncased bert_qa_scibert_scivocab_uncased_squad Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert bert_qa_spanbert_finetuned_squadv1 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_1024d_seed_0 bert_qa_spanbert_base_cased_few_shot_k_1024_finetuned_squad_seed_0 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_1024d_seed_10 bert_qa_spanbert_base_cased_few_shot_k_1024_finetuned_squad_seed_10 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_1024d_seed_2 bert_qa_spanbert_base_cased_few_shot_k_1024_finetuned_squad_seed_2 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_1024d_seed_4 bert_qa_spanbert_base_cased_few_shot_k_1024_finetuned_squad_seed_4 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_1024d_seed_8 bert_qa_spanbert_base_cased_few_shot_k_1024_finetuned_squad_seed_8 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_1024d_seed_6 bert_qa_spanbert_base_cased_few_shot_k_1024_finetuned_squad_seed_6 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_128d_seed_10 bert_qa_spanbert_base_cased_few_shot_k_128_finetuned_squad_seed_10 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_128d_seed_4 bert_qa_spanbert_base_cased_few_shot_k_128_finetuned_squad_seed_4 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_128d_seed_6 bert_qa_spanbert_base_cased_few_shot_k_128_finetuned_squad_seed_6 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_128d_seed_8 bert_qa_spanbert_base_cased_few_shot_k_128_finetuned_squad_seed_8 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_256d_seed_10 bert_qa_spanbert_base_cased_few_shot_k_256_finetuned_squad_seed_10 Question Answering English BertForQuestionAnswering
en.answer_question.squad.span_bert.base_cased_32d_seed_0 [bert_qa_spanbert_base_cased_few_shot_k_32_finetuned_squad_seed_0](https://nlp.johnsnowlabs.com/2022/06/02/bert_qa_spanbert_base_cased_few_shot_k_32_finetuned_squad_seed_0_en_3_0.ht