John Snow Labs NLU 1.1.1 : New multilingual models, Spark 2.3 support, new tutorials and more!
NLU 1.1.1 Release Notes
We are very excited to release NLU 1.1.1!
This release features 3 new tutorial notebooks for Open/Closed book question answering with Google's T5, Intent classification, and Aspect Based NER.
In addition, NLU 1.1.1 comes with 25+ pre-trained models and pipelines in Amharic, Bengali, Bhojpuri, Japanese, and Korean from the amazing Spark NLP 2.7.2 release.
Finally, NLU now supports running on Spark 2.3 clusters.
Extract aspects and entities from airline questions (ATIS dataset)
nlu.load("en.ner.atis").predict("i want to fly from baltimore to dallas round trip")
output: ["baltimore", "dallas", "round trip"]
Intent Classification for Airline Traffic Information System queries (ATIS dataset)
nlu.load("en.classify.questions.atis").predict("what is the price of flight from newyork to washington")
output: "atis_airfare"
Recognize Entities OntoNotes - ELECTRA Large
nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London.")
output: ["Johnson", "first", "2001", "eight years", "London"]
Question classification of open-domain and fact-based questions Pipeline - TREC50
nlu.load("en.classify.trec50.pipe").predict("When did the construction of stone circles begin in the UK? ")
output: LOC_other
Traditional Chinese Word Segmentation
# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.segment_words.gsd").predict("然而,這樣的處理也衍生了一些問題。")
output: ["然而",",","這樣","的","處理","也","衍生","了","一些","問題","。"]
Part of Speech for Traditional Chinese
# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.pos.ud_gsd_trad").predict("然而,這樣的處理也衍生了一些問題。")
Output:
| Token | POS |
|---|---|
| 然而 | ADV |
| , | PUNCT |
| 這樣 | PRON |
| 的 | PART |
| 處理 | NOUN |
| 也 | ADV |
| 衍生 | VERB |
| 了 | PART |
| 一些 | ADJ |
| 問題 | NOUN |
| 。 | PUNCT |
Thai Word Segment Recognition
# 'Mona Lisa is a 16th-century oil painting created by Leonardo held at the Louvre in Paris' in Thai
nlu.load("th.segment_words").predict("Mona Lisa เป็นภาพวาดสีน้ำมันในศตวรรษที่ 16 ที่สร้างโดย Leonardo จัดขึ้นที่พิพิธภัณฑ์ลูฟร์ในปารีส")
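Output:
| token |
|---|
| M |
| o |
| n |
| a |
| Lisa |
| เป็น |
| ภาพ |
| ว |
| า |
| ด |
| สีน้ำ |
| มัน |
| ใน |
| ศตวรรษ |
| ที่ |
| 16 |
| ที่ |
| สร้าง |
| โ |
| ด |
| ย |
| L |
| e |
| o |
| n |
| a |
| r |
| d |
| o |
| จัด |
| ขึ้น |
| ที่ |
| พิพิธภัณฑ์ |
| ลูฟร์ |
| ใน |
| ปารีส |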
Part of Speech for Bengali (POS)
# 'The village is also called 'Mod' in Tora language' in Bengali
nlu.load("bn.pos").predict("বাসস্থান-ঘরগৃহস্থালি তোড়া ভাষায় গ্রামকেও বলে ` মোদ ' ৷")
Output:
| token | pos |
|---|---|
| বাসস্থান-ঘরগৃহস্থালি | NN |
| তোড়া | NNP |
| ভাষায় | NN |
| গ্রামকেও | NN |
| বলে | VM |
| ` | SYM |
| মোদ | NN |
| ' | SYM |
| ৷ | SYM |
Stop Words Cleaner for Bengali
# 'This language is not enough' in Bengali
df = nlu.load("bn.stopwords").predict("এই ভাষা যথেষ্ট নয়")
Output:
| cleanTokens | token |
|---|---|
| ভাষা | এই |
| যথেষ্ট | ভাষা |
| নয় | যথেষ্ট |
| None | নয় |
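The trailing `None` in the `cleanTokens` column is most likely an alignment artifact rather than a token: the cleaned list is one item shorter than the token list, and the two columns appear to be laid out row by row independently. The sketch below reproduces that layout in plain Python under this assumption. It does not call NLU; `align_columns` is a hypothetical helper, and the stop-word set is made up for illustration and contains only the single word removed in the output above.

```python
from itertools import zip_longest

# Hypothetical stop-word set for illustration only; the real model ships its own list.
STOP_WORDS = {"এই"}

def align_columns(tokens, stop_words):
    """Pair cleaned tokens with the original tokens row by row, padding with None."""
    clean_tokens = [t for t in tokens if t not in stop_words]
    # zip_longest pads the shorter (cleaned) list with None once it runs out.
    return list(zip_longest(clean_tokens, tokens))

rows = align_columns(["এই", "ভাষা", "যথেষ্ট", "নয়"], STOP_WORDS)
for clean_token, token in rows:
    print(clean_token, token)
```

Read this way, each column is its own list and a row does not pair a token with its cleaned form.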
Part of Speech for Bhojpuri
# 'The people of Ohu know that the foundation of Bhojpuri was shaken' in Bhojpuri
nlu.load('bh.pos').predict("ओहु लोग के मालूम बा कि श्लील होखते भोजपुरी के नींव हिल जाई").to_markdown()
Output:
| pos | token |
|---|---|
| DET | ओहु |
| NOUN | लोग |
| ADP | के |
| NOUN | मालूम |
| VERB | बा |
| SCONJ | कि |
| ADJ | श्लील |
| VERB | होखते |
| PROPN | भोजपुरी |
| ADP | के |
| NOUN | नींव |
| VERB | हिल |
| AUX | जाई |
Amharic Part of Speech (POS)
# ' "Son, finish the job," he said.' in Amharic
nlu.load('am.pos').predict('ልጅ ኡ ን ሥራ ው ን አስጨርስ ኧው ኣል ኧሁ ።"').to_markdown()
Output:
| pos | token |
|---|---|
| NOUN | ልጅ |
| DET | ኡ |
| PART | ን |
| NOUN | ሥራ |
| DET | ው |
| PART | ን |
| VERB | አስጨርስ |
| PRON | ኧው |
| AUX | ኣል |
| PRON | ኧሁ |
| PUNCT | ። |
| NOUN | " |
Thai Sentiment Classification
# 'I love peanut butter and jelly!' in Thai
nlu.load('th.classify.sentiment').predict('ฉันชอบเนยถั่วและเยลลี่!')[['sentiment','sentiment_confidence']].to_markdown()
Output:
| sentiment | sentiment_confidence |
|---|---|
| positive | 0.999998 |
Arabic Named Entity Recognition (NER)
# 'In 1918, the forces of the Arab Revolt liberated Damascus with the help of the British' in Arabic
nlu.load('ar.ner').predict('في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز',output_level='chunk')[['entities_confidence','ner_confidence','entities']].to_markdown()
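All of the one-liners above share the same shape: load a model by its reference string, then call `predict` on some text. The following is a minimal sketch of running several references in one loop, assuming only the `nlu.load(reference).predict(text)` pattern shown in these notes. The helper name `run_examples` and its `load` parameter are hypothetical and not part of NLU; `load` exists so the loop can be tried with a stand-in loader when NLU is not installed.

```python
def run_examples(examples, load=None):
    """Run each (model reference, text) pair and collect the predictions.

    `load` defaults to `nlu.load`; pass a stand-in to try the loop without NLU.
    """
    if load is None:
        import nlu  # imported lazily so the helper itself has no hard dependency
        load = nlu.load
    results = {}
    for reference, text in examples:
        # Same pattern as the one-liners: load a model by reference, then predict.
        results[reference] = load(reference).predict(text)
    return results

# Model references and inputs copied from the examples above.
EXAMPLES = [
    ("en.ner.atis", "i want to fly from baltimore to dallas round trip"),
    ("en.classify.questions.atis", "what is the price of flight from newyork to washington"),
]

# Uncomment to run against NLU itself (needs NLU installed and the models available):
# results = run_examples(EXAMPLES)
```

Keeping the references and inputs in a plain list makes it easy to add the non-English examples without repeating the call.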