John Snow Labs NLU 1.1.1 : New multilingual models, Spark 2.3 support, new tutorials and more!

NLU 1.1.1 Release Notes

We are very excited to release NLU 1.1.1! This release features 3 new tutorial notebooks for open/closed-book question answering with Google's T5, intent classification, and aspect-based NER. In addition, NLU 1.1.1 comes with 25+ pre-trained models and pipelines in Amharic, Bengali, Bhojpuri, Japanese, and Korean from the amazing Spark NLP 2.7.2 release. Finally, NLU now supports running on Spark 2.3 clusters.

NLU 1.1.1 New Non-English Models

Language	nlu.load() reference	Spark NLP Model reference	Type
Arabic	ar.ner	arabic_w2v_cc_300d	Named Entity Recognizer
Arabic	ar.embed.aner	aner_cc_300d	Word Embedding
Arabic	ar.embed.aner.300d	aner_cc_300d	Word Embedding (Alias)
Bengali	bn.stopwords	stopwords_bn	Stopwords Cleaner
Bengali	bn.pos	pos_msri	Part of Speech
Thai	th.segment_words	wordseg_best	Word Segmenter
Thai	th.pos	pos_lst20	Part of Speech
Thai	th.sentiment	sentiment_jager_use	Sentiment Classifier
Thai	th.classify.sentiment	sentiment_jager_use	Sentiment Classifier (Alias)
Chinese	zh.pos.ud_gsd_trad	pos_ud_gsd_trad	Part of Speech
Chinese	zh.segment_words.gsd	wordseg_gsd_ud_trad	Word Segmenter
Bihari	bh.pos	pos_ud_bhtb	Part of Speech
Amharic	am.pos	pos_ud_att	Part of Speech

NLU 1.1.1 New English Models and Pipelines

Language	nlu.load() reference	Spark NLP Model reference	Type
English	en.sentiment.glove	analyze_sentimentdl_glove_imdb	Sentiment Classifier
English	en.sentiment.glove.imdb	analyze_sentimentdl_glove_imdb	Sentiment Classifier (Alias)
English	en.classify.sentiment.glove.imdb	analyze_sentimentdl_glove_imdb	Sentiment Classifier (Alias)
English	en.classify.sentiment.glove	analyze_sentimentdl_glove_imdb	Sentiment Classifier (Alias)
English	en.classify.trec50.pipe	classifierdl_use_trec50_pipeline	Question Classifier
English	en.ner.onto.large	onto_recognize_entities_electra_large	Named Entity Recognizer
English	en.classify.questions.atis	classifierdl_use_atis	Intent Classifier
English	en.classify.questions.airline	classifierdl_use_atis	Intent Classifier (Alias)
English	en.classify.intent.atis	classifierdl_use_atis	Intent Classifier (Alias)
English	en.classify.intent.airline	classifierdl_use_atis	Intent Classifier (Alias)
English	en.ner.atis	nerdl_atis_840b_300d	Aspect based NER
English	en.ner.airline	nerdl_atis_840b_300d	Aspect based NER (Alias)
English	en.ner.aspect.airline	nerdl_atis_840b_300d	Aspect based NER (Alias)
English	en.ner.aspect.atis	nerdl_atis_840b_300d	Aspect based NER (Alias)
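
Several rows in the table above are aliases that resolve to the same Spark NLP model. As a minimal sketch, the snippet below copies the reference/model pairs from the table into a plain Python list and groups them by model. `ENGLISH_REFERENCES` and `group_by_model` are illustrative names, not part of NLU.

```python
from collections import defaultdict

# (nlu.load() reference, Spark NLP model reference) pairs copied from the table above.
ENGLISH_REFERENCES = [
    ("en.sentiment.glove", "analyze_sentimentdl_glove_imdb"),
    ("en.sentiment.glove.imdb", "analyze_sentimentdl_glove_imdb"),
    ("en.classify.sentiment.glove.imdb", "analyze_sentimentdl_glove_imdb"),
    ("en.classify.sentiment.glove", "analyze_sentimentdl_glove_imdb"),
    ("en.classify.trec50.pipe", "classifierdl_use_trec50_pipeline"),
    ("en.ner.onto.large", "onto_recognize_entities_electra_large"),
    ("en.classify.questions.atis", "classifierdl_use_atis"),
    ("en.classify.questions.airline", "classifierdl_use_atis"),
    ("en.classify.intent.atis", "classifierdl_use_atis"),
    ("en.classify.intent.airline", "classifierdl_use_atis"),
    ("en.ner.atis", "nerdl_atis_840b_300d"),
    ("en.ner.airline", "nerdl_atis_840b_300d"),
    ("en.ner.aspect.airline", "nerdl_atis_840b_300d"),
    ("en.ner.aspect.atis", "nerdl_atis_840b_300d"),
]


def group_by_model(pairs):
    """Map each Spark NLP model name to the list of nlu.load() references for it."""
    groups = defaultdict(list)
    for reference, model in pairs:
        groups[model].append(reference)
    return dict(groups)


if __name__ == "__main__":
    for model, references in group_by_model(ENGLISH_REFERENCES).items():
        print(model, "<-", ", ".join(references))
```

Any reference in a group should load the same underlying model, so the choice between them is a matter of naming preference.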

New Easy NLU 1-liner Examples:

Extract aspects and entities from airline questions (ATIS dataset)


nlu.load("en.ner.atis").predict("i want to fly from baltimore to dallas round trip")
output:  ["baltimore", "dallas", "round trip"]

Intent Classification for Airline Traffic Information System queries (ATIS dataset)


nlu.load("en.classify.questions.atis").predict("what is the price of flight from newyork to washington")
output:  "atis_airfare"
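
Each one-liner loads a pipeline and predicts in a single expression. When several queries need to be classified, loading the pipeline once and reusing it is likely to be cheaper. The sketch below assumes only that the loader returns an object with a `predict` method, as `nlu.load` does in the examples here. `predict_many` is a hypothetical helper, and the loader is passed in as an argument so the function can be tried with a stand-in object before running it on a Spark-backed pipeline.

```python
def predict_many(load, reference, texts):
    """Load a pipeline once and run predict() on every text.

    `load` is any callable that maps a model reference to an object with a
    predict(text) method, for example nlu.load.
    """
    pipe = load(reference)  # the potentially expensive step happens only once
    return [pipe.predict(text) for text in texts]


# Intended usage with NLU (commented out because it needs Spark and downloads the model):
# import nlu
# results = predict_many(
#     nlu.load,
#     "en.classify.questions.atis",
#     ["what is the price of flight from newyork to washington",
#      "i want to fly from baltimore to dallas round trip"],
# )
```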

Recognize Entities OntoNotes - ELECTRA Large


nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London.")  
output:  ["Johnson", "first", "2001", "eight years", "London"]

Question classification of open-domain and fact-based questions Pipeline - TREC50

nlu.load("en.classify.trec50.pipe").predict("When did the construction of stone circles begin in the UK? ")
output:  LOC_other

Traditional Chinese Word Segmentation

# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.segment_words.gsd").predict("然而，這樣的處理也衍生了一些問題。")
output:  ["然而",",","這樣","的","處理","也","衍生","了","一些","問題","。"]

Part of Speech for Traditional Chinese

# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.pos.ud_gsd_trad").predict("然而，這樣的處理也衍生了一些問題。")

Output:

Token	POS
然而	ADV
，	PUNCT
這樣	PRON
的	PART
處理	NOUN
也	ADV
衍生	VERB
了	PART
一些	ADJ
問題	NOUN
。	PUNCT

Thai Word Segment Recognition

# 'Mona Lisa is a 16th-century oil painting created by Leonardo held at the Louvre in Paris' in Thai
nlu.load("th.segment_words").predict("Mona Lisa เป็นภาพวาดสีน้ำมันในศตวรรษที่ 16 ที่สร้างโดย Leonardo จัดขึ้นที่พิพิธภัณฑ์ลูฟร์ในปารีส")

Output:

token
M
o
n
a
Lisa
เป็น
ภาพ
ว
า
ด
สีน้ำ
มัน
ใน
ศตวรรษ
ที่
16
ที่
สร้าง
โ
ด
ย
L
e
o
n
a
r
d
o
จัด
ขึ้น
ที่
พิพิธภัณฑ์
ลูฟร์
ใน
ปารีส

Part of Speech for Bengali (POS)

# 'The village is also called 'Mod' in Tora language' in Bengali 
nlu.load("bn.pos").predict("বাসস্থান-ঘরগৃহস্থালি তোড়া ভাষায় গ্রামকেও বলে ` মোদ ' ৷")

Output:

token	pos
বাসস্থান-ঘরগৃহস্থালি	NN
তোড়া	NNP
ভাষায়	NN
গ্রামকেও	NN
বলে	VM
`	SYM
মোদ	NN
'	SYM
৷	SYM

Stop Words Cleaner for Bengali

# 'This language is not enough' in Bengali 
df = nlu.load("bn.stopwords").predict("এই ভাষা যথেষ্ট নয়")

Output:

cleanTokens	token
ভাষা	এই
যথেষ্ট	ভাষা
নয়	যথেষ্ট
None	নয়
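
In the output above, cleanTokens holds one entry fewer than token, which is consistent with the first word being dropped as a stop word and the shorter column being padded with None. The underlying operation can be sketched in plain Python. The one-word stop word set here is an assumption inferred from that output, not the contents of the stopwords_bn model, and `remove_stopwords` is an illustrative name.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stop word set, preserving order."""
    return [token for token in tokens if token not in stopwords]


sentence = "এই ভাষা যথেষ্ট নয়"
tokens = sentence.split()  # whitespace tokenization is enough for this sentence

# Assumption: only the first token is treated as a stop word, as the output suggests.
ASSUMED_STOPWORDS = {tokens[0]}

clean_tokens = remove_stopwords(tokens, ASSUMED_STOPWORDS)
print(clean_tokens)
```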

Part of Speech for Bhojpuri

# 'The people of Ohu know that the foundation of Bhojpuri was shaken' in Bhojpuri
nlu.load('bh.pos').predict("ओहु लोग के मालूम बा कि श्लील होखते भोजपुरी के नींव हिल जाई").to_markdown()

Output:

pos	token
DET	ओहु
NOUN	लोग
ADP	के
NOUN	मालूम
VERB	बा
SCONJ	कि
ADJ	श्लील
VERB	होखते
PROPN	भोजपुरी
ADP	के
NOUN	नींव
VERB	हिल
AUX	जाई

Amharic Part of Speech (POS)

# ' "Son, finish the job," he said.' in Amharic
nlu.load('am.pos').predict('ልጅ ኡ ን ሥራ ው ን አስጨርስ ኧው ኣል ኧሁ ።"').to_markdown()

Output:

pos	token
NOUN	ልጅ
DET	ኡ
PART	ን
NOUN	ሥራ
DET	ው
PART	ን
VERB	አስጨርስ
PRON	ኧው
AUX	ኣል
PRON	ኧሁ
PUNCT	።
NOUN	"

Thai Sentiment Classification

# 'I love peanut butter and jelly!' in Thai
nlu.load('th.classify.sentiment').predict('ฉันชอบเนยถั่วและเยลลี่!')[['sentiment','sentiment_confidence']].to_markdown()

Output:

sentiment	sentiment_confidence
positive	0.999998

Arabic Named Entity Recognition (NER)

# 'In 1918, the forces of the Arab Revolt liberated Damascus with the help of the British' in Arabic
nlu.load('ar.ner').predict('في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز',output_level='chunk')[['entities_confidence','ner_confidence','entities']].to_markdown()

Output:

entity_class	ner_confidence	entities
ORG	[1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669]	قوات الثورة العربية
LOC	[1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669]	دمشق
PER	[1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669]	الإنكليز
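
Every row above carries the same ner_confidence list, which has 11 values, one for each whitespace-separated token of the input sentence. Assuming the values are in token order (an assumption based only on the matching lengths, not on documented NLU behavior), a per-entity score can be sketched by averaging the confidences of the tokens each entity spans. `chunk_confidence` is a hypothetical helper.

```python
def chunk_confidence(tokens, confidences, chunk):
    """Average the confidences of the first token span that matches `chunk`.

    Returns None when the chunk is empty or does not occur in the token list.
    """
    chunk_tokens = chunk.split()
    width = len(chunk_tokens)
    if width == 0:
        return None
    for start in range(len(tokens) - width + 1):
        if tokens[start:start + width] == chunk_tokens:
            span = confidences[start:start + width]
            return sum(span) / width
    return None


sentence = "في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز"
tokens = sentence.split()

# Values copied from the ner_confidence column of the output above.
ner_confidence = [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063,
                  0.9987999796867371, 0.9990000128746033, 0.9998999834060669,
                  0.9998999834060669, 0.9993000030517578, 0.9998999834060669]

# Entity strings copied from the entities column of the output above.
entities = ["قوات الثورة العربية", "دمشق", "الإنكليز"]

for entity in entities:
    print(entity, chunk_confidence(tokens, ner_confidence, entity))
```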

NLU 1.1.1 Enhancements:

Spark 2.3 compatibility

New NLU Notebooks and Tutorials

Installation

# PyPI
!pip install nlu pyspark==2.4.7
# Conda
# Install NLU from Anaconda/Conda
conda install -c johnsnowlabs nlu
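
Since this release mentions Spark 2.3 support while the pip command above pins pyspark 2.4.7, it can help to confirm which PySpark version is actually importable before loading models. The following is a minimal sketch using only the standard library. `installed_version`, `major_minor`, and the `EXPECTED_SPARK_LINES` set are illustrative names and assumptions drawn from this page, not an official compatibility matrix.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__ if the module can be imported, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def major_minor(version):
    """Turn a version string such as '2.4.7' into the tuple (2, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Assumption: the Spark lines mentioned in these notes (2.3 support, 2.4.7 in the pip command).
EXPECTED_SPARK_LINES = {(2, 3), (2, 4)}

if __name__ == "__main__":
    version = installed_version("pyspark")
    if version is None:
        print("pyspark is not installed")
    elif major_minor(version) in EXPECTED_SPARK_LINES:
        print("pyspark", version, "matches a Spark line mentioned in these notes")
    else:
        print("pyspark", version, "is outside the Spark lines mentioned in these notes")
```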

JohnSnowLabs / nlu

1.1.1rc1 #31
