2600+ New Models for 200+ Languages and 10+ Dimension Reduction Algorithms for Streamlit Word-Embedding visualizations in 3-D
We are extremely excited to announce the release of NLU 3.1!
This is our biggest release so far. It comes with 2600+ new models in 200+ languages, including DistilBERT, RoBERTa, XLM-RoBERTa, and Huggingface-based Embeddings from the incredible Spark-NLP 3.1.0 release; new Streamlit visualizations for Word Embeddings in 3-D, 2-D, and 1-D; new Healthcare pipelines for healthcare code mappings; and, finally, confidence extraction for open source NER models.
Additionally, the NLU Namespace has been renamed to the NLU Spellbook, to reflect the magic of the 1-liners it contains!
Streamlit Word Embedding visualization via Manifold and Matrix Decomposition algorithms
function pipe.viz_streamlit_word_embed_manifold
Visualize Word Embeddings in 1-D, 2-D, or 3-D by reducing their dimensionality with any of 11 supported Manifold and Matrix Decomposition algorithms.
Additionally, you can color the lower-dimensional points with a label previously assigned to the text by specifying a list of NLU references in the additional_classifiers_for_coloring parameter.
The function reduces high-dimensional Word Embeddings to 1-D, 2-D, or 3-D and plots the resulting data in an interactive Plotly plot.
function parameters pipe.viz_streamlit_word_embed_manifold
| Argument | Type | Default | Description |
|---|---|---|---|
| default_texts | List[str] | ("Donald Trump likes to party!", "Angela Merkel likes to party!", 'Peter HATES TO PARTTY!!!! :(') | List of strings to apply classifiers, embeddings, and manifolds to. |
| text | Optional[str] | 'Billy likes to swim' | Text to predict classes for. |
| sub_title | Optional[str] | "Apply any of the 11 Manifold or Matrix Decomposition algorithms to reduce the dimensionality of Word Embeddings to 1-D, 2-D and 3-D " | Subtitle of the Streamlit app. |
| default_algos_to_apply | List[str] | ["TSNE", "PCA"] | A list of Manifold and Matrix Decomposition algorithms to apply. Can be any of 'TSNE', 'ISOMAP', 'LLE', 'Spectral Embedding', 'MDS', 'PCA', 'SVD aka LSA', 'DictionaryLearning', 'FactorAnalysis', 'FastICA' or 'KernelPCA'. |
| target_dimensions | List[int] | (1,2,3) | Defines the target dimensions the embeddings will be reduced to. |
| show_algo_select | bool | True | Show selector for Manifold and Matrix Decomposition algorithms. |
| show_embed_select | bool | True | Show selector for embedding selection. |
| show_color_select | bool | True | Show selector for coloring plots. |
| MAX_DISPLAY_NUM | int | 100 | Cap on the maximum number of tokens displayed. |
| display_embed_information | bool | True | Show additional embedding information like dimension, nlu_reference, spark_nlp_reference, storage_reference, modelhub link and more. |
| set_wide_layout_CSS | bool | True | Whether to inject custom CSS or not. |
| num_cols | int | 2 | Number of columns to use for the Streamlit layout when rendering the similarity matrices. |
| key | str | "NLU_streamlit" | Key for the Streamlit elements drawn. |
| additional_classifiers_for_coloring | List[str] | ['pos', 'sentiment.imdb'] | List of additional NLU references to load for generating hue colors. |
| show_model_select | bool | True | Show a model selection dropdown that makes any of the 1000+ models available in 1 click. |
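To make the parameters listed above more concrete, here is a minimal sketch of how a call could be assembled. `build_viz_kwargs` and `run_app` are hypothetical helpers written for this illustration and are not part of NLU; `'bert'` is assumed to be a valid NLU reference for a Word Embedding model. Only the keyword names come from the parameter list.

```python
from typing import Any, Dict, Sequence

# Algorithm names as given in the description of default_algos_to_apply.
SUPPORTED_ALGOS = (
    "TSNE", "ISOMAP", "LLE", "Spectral Embedding", "MDS",
    "PCA", "SVD aka LSA", "DictionaryLearning", "FactorAnalysis",
    "FastICA", "KernelPCA",
)


def build_viz_kwargs(
    texts: Sequence[str],
    algos: Sequence[str] = ("TSNE", "PCA"),
    target_dimensions: Sequence[int] = (1, 2, 3),
    max_display_num: int = 100,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments before handing them to NLU."""
    unknown = [a for a in algos if a not in SUPPORTED_ALGOS]
    if unknown:
        raise ValueError(f"Unsupported algorithm(s): {unknown}")
    bad_dims = [d for d in target_dimensions if d not in (1, 2, 3)]
    if bad_dims:
        raise ValueError(f"Target dimensions must be 1, 2 or 3, got: {bad_dims}")
    return {
        "default_texts": list(texts),
        "default_algos_to_apply": list(algos),
        "target_dimensions": list(target_dimensions),
        "MAX_DISPLAY_NUM": max_display_num,
    }


def run_app() -> None:
    """Call this from a script started with `streamlit run`; needs nlu and streamlit."""
    import nlu  # imported lazily so the helper above works without NLU installed

    kwargs = build_viz_kwargs(
        ["Donald Trump likes to party!", "Angela Merkel likes to party!"],
        algos=["TSNE", "PCA"],
        target_dimensions=[2, 3],
    )
    # 'bert' is an assumed NLU reference for a Word Embedding model.
    nlu.load("bert").viz_streamlit_word_embed_manifold(**kwargs)
```

Validating the algorithm names up front is a design choice of this sketch; it simply surfaces typos before any model is downloaded.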
In total, pipe.viz_streamlit_word_embed_manifold generates NUM-DIMENSIONS * NUM-EMBEDDINGS * NUM-DIMENSION-REDUCTION-ALGOS plots.
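As a rough illustration of how NUM-DIMENSIONS, NUM-EMBEDDINGS and NUM-DIMENSION-REDUCTION-ALGOS combine, the sketch below enumerates one plot per combination, assuming the three quantities multiply. `enumerate_plots` is a hypothetical helper, and the embedding names are placeholders chosen for the example.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_plots(
    target_dimensions: Sequence[int],
    embeddings: Sequence[str],
    algos: Sequence[str],
) -> List[Tuple[int, str, str]]:
    """Return one (dimension, embedding, algorithm) triple per plot."""
    return list(product(target_dimensions, embeddings, algos))


# 3 target dimensions, 2 embeddings, 2 algorithms
plots = enumerate_plots([1, 2, 3], ["bert", "glove"], ["TSNE", "PCA"])
print(len(plots))  # 3 * 2 * 2 = 12
```

Under this assumption the plot count grows quickly, so trimming target_dimensions or default_algos_to_apply is the simplest way to keep the page manageable.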
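pipe.viz_streamlit_word_embed_manifold also accepts the following parameters:

| Argument | Type | Default | Description |
|---|---|---|---|
| model_select_position | str | 'side' | See pipe.predict(positions=True) for more info. |
| show_logo | bool | True | |
| display_infos | bool | False | |
| n_jobs | Optional[int] | 3 | |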
Larger Example showcasing more dimension reduction techniques on a larger corpus :
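Supported Manifold Algorithms
TSNE, ISOMAP, LLE, Spectral Embedding, MDS
Supported Matrix Decomposition Algorithms
PCA, SVD aka LSA, DictionaryLearning, FactorAnalysis, FastICA, KernelPCA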
New Healthcare Pipelines
Five new healthcare code mapping pipelines:

nlu.load('en.resolve.icd10cm.umls'): This pretrained pipeline maps ICD10CM codes to UMLS codes without using any text data. You just feed it whitespace-delimited ICD10CM codes and it returns the corresponding UMLS codes as a list. If there is no mapping, the original code is returned with no mapping.
{'icd10cm': ['M89.50', 'R82.2', 'R09.01'],'umls': ['C4721411', 'C0159076', 'C0004044']}

nlu.load('en.resolve.mesh.umls'): This pretrained pipeline maps MeSH codes to UMLS codes without using any text data. You just feed it whitespace-delimited MeSH codes and it returns the corresponding UMLS codes as a list. If there is no mapping, the original code is returned with no mapping.
{'mesh': ['C028491', 'D019326', 'C579867'],'umls': ['C0970275', 'C0886627', 'C3696376']}

nlu.load('en.resolve.rxnorm.umls'): This pretrained pipeline maps RxNorm codes to UMLS codes without using any text data. You just feed it whitespace-delimited RxNorm codes and it returns the corresponding UMLS codes as a list. If there is no mapping, the original code is returned with no mapping.
{'rxnorm': ['1161611', '315677', '343663'],'umls': ['C3215948', 'C0984912', 'C1146501']}

nlu.load('en.resolve.rxnorm.mesh'): This pretrained pipeline maps RxNorm codes to MeSH codes without using any text data. You just feed it whitespace-delimited RxNorm codes and it returns the corresponding MeSH codes as a list. If there is no mapping, the original code is returned with no mapping.
{'rxnorm': ['1191', '6809', '47613'],'mesh': ['D001241', 'D008687', 'D019355']}

nlu.load('en.resolve.snomed.umls'): This pretrained pipeline maps SNOMED codes to UMLS codes without using any text data. You just feed it whitespace-delimited SNOMED codes and it returns the corresponding UMLS codes as a list. If there is no mapping, the original code is returned with no mapping.
{'snomed': ['733187009', '449433008', '51264003'],'umls': ['C4546029', 'C3164619', 'C0271267']}
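The input and output shape of these mapping pipelines can be mimicked with a toy lookup. The sketch below is not the pretrained pipeline: `map_codes` and the hand-written table are hypothetical, the table assumes the SNOMED and UMLS example lists above are aligned by position, and it assumes an unmapped code is passed through unchanged.

```python
from typing import Dict, List

# Toy lookup built from the SNOMED -> UMLS example values listed above.
# Assumption: the two example lists are aligned by position.
SNOMED_TO_UMLS: Dict[str, str] = {
    "733187009": "C4546029",
    "449433008": "C3164619",
    "51264003": "C0271267",
}


def map_codes(codes: str, table: Dict[str, str]) -> Dict[str, List[str]]:
    """Map whitespace-delimited source codes to target codes."""
    source = codes.split()  # split on any run of whitespace
    # Assumption: a code without a mapping is returned unchanged.
    target = [table.get(code, code) for code in source]
    return {"snomed": source, "umls": target}


print(map_codes("733187009 449433008 51264003", SNOMED_TO_UMLS))
```

With NLU and the Healthcare library installed, the equivalent call would presumably be nlu.load('en.resolve.snomed.umls').predict('733187009 449433008 51264003'); the exact output columns are not shown in these notes.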
New Healthcare Pipelines
New Open Source Models and Pipelines
Bugfixes
Fixed bugs that occurred when loading a model from disk.
140+ NLU Tutorials
Streamlit visualizations docs
The complete list of all 1100+ models & pipelines in 192+ languages is available on Models Hub.
Spark NLP publications
NLU in Action
NLU documentation
Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP and NLU!
1 line Install NLU on Google Colab
!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
1 line Install NLU on Kaggle
!wget https://setup.johnsnowlabs.com/nlu/kaggle.sh -O - | bash
Install via PIP
! pip install nlu pyspark==3.0.3