JohnSnowLabs / nlu

1 line for thousands of State of the Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

1.1.1rc1 #31

Closed C-K-Loan closed 3 years ago

C-K-Loan commented 3 years ago

John Snow Labs NLU 1.1.1 : New multilingual models, Spark 2.3 support, new tutorials and more!

NLU 1.1.1 Release Notes

We are very excited to release NLU 1.1.1! This release features 3 new tutorial notebooks covering open- and closed-book question answering with Google's T5, intent classification, and aspect-based NER. In addition, NLU 1.1.1 comes with 25+ pretrained models and pipelines in Amharic, Bengali, Bhojpuri, Japanese, and Korean from the amazing Spark NLP 2.7.2 release. Finally, NLU now supports running on Spark 2.3 clusters.

NLU 1.1.1 New Non-English Models

| Language | nlu.load() reference | Spark NLP Model reference | Type |
|---|---|---|---|
| Arabic | ar.ner | arabic_w2v_cc_300d | Named Entity Recognizer |
| Arabic | ar.embed.aner | aner_cc_300d | Word Embedding |
| Arabic | ar.embed.aner.300d | aner_cc_300d | Word Embedding (Alias) |
| Bengali | bn.stopwords | stopwords_bn | Stopwords Cleaner |
| Bengali | bn.pos | pos_msri | Part of Speech |
| Thai | th.segment_words | wordseg_best | Word Segmenter |
| Thai | th.pos | pos_lst20 | Part of Speech |
| Thai | th.sentiment | sentiment_jager_use | Sentiment Classifier |
| Thai | th.classify.sentiment | sentiment_jager_use | Sentiment Classifier (Alias) |
| Chinese | zh.pos.ud_gsd_trad | pos_ud_gsd_trad | Part of Speech |
| Chinese | zh.segment_words.gsd | wordseg_gsd_ud_trad | Word Segmenter |
| Bihari | bh.pos | pos_ud_bhtb | Part of Speech |
| Amharic | am.pos | pos_ud_att | Part of Speech |

NLU 1.1.1 New English Models and Pipelines

| Language | nlu.load() reference | Spark NLP Model reference | Type |
|---|---|---|---|
| English | en.sentiment.glove | analyze_sentimentdl_glove_imdb | Sentiment Classifier |
| English | en.sentiment.glove.imdb | analyze_sentimentdl_glove_imdb | Sentiment Classifier (Alias) |
| English | en.classify.sentiment.glove.imdb | analyze_sentimentdl_glove_imdb | Sentiment Classifier (Alias) |
| English | en.classify.sentiment.glove | analyze_sentimentdl_glove_imdb | Sentiment Classifier (Alias) |
| English | en.classify.trec50.pipe | classifierdl_use_trec50_pipeline | Language Classifier |
| English | en.ner.onto.large | onto_recognize_entities_electra_large | Named Entity Recognizer |
| English | en.classify.questions.atis | classifierdl_use_atis | Intent Classifier |
| English | en.classify.questions.airline | classifierdl_use_atis | Intent Classifier (Alias) |
| English | en.classify.intent.atis | classifierdl_use_atis | Intent Classifier (Alias) |
| English | en.classify.intent.airline | classifierdl_use_atis | Intent Classifier (Alias) |
| English | en.ner.atis | nerdl_atis_840b_300d | Aspect based NER |
| English | en.ner.airline | nerdl_atis_840b_300d | Aspect based NER (Alias) |
| English | en.ner.aspect.airline | nerdl_atis_840b_300d | Aspect based NER (Alias) |
| English | en.ner.aspect.atis | nerdl_atis_840b_300d | Aspect based NER (Alias) |
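
Several rows in the English table are aliases that resolve to the same Spark NLP model. As a quick way to see which references are interchangeable, the sketch below copies the table into a plain dictionary and groups the references by model. It does not call NLU at all; `REFERENCES` and `group_by_model` are illustrative names, not part of the library.

```python
from collections import defaultdict

# nlu.load() reference -> Spark NLP model, copied from the English table.
REFERENCES = {
    "en.sentiment.glove": "analyze_sentimentdl_glove_imdb",
    "en.sentiment.glove.imdb": "analyze_sentimentdl_glove_imdb",
    "en.classify.sentiment.glove.imdb": "analyze_sentimentdl_glove_imdb",
    "en.classify.sentiment.glove": "analyze_sentimentdl_glove_imdb",
    "en.classify.trec50.pipe": "classifierdl_use_trec50_pipeline",
    "en.ner.onto.large": "onto_recognize_entities_electra_large",
    "en.classify.questions.atis": "classifierdl_use_atis",
    "en.classify.questions.airline": "classifierdl_use_atis",
    "en.classify.intent.atis": "classifierdl_use_atis",
    "en.classify.intent.airline": "classifierdl_use_atis",
    "en.ner.atis": "nerdl_atis_840b_300d",
    "en.ner.airline": "nerdl_atis_840b_300d",
    "en.ner.aspect.airline": "nerdl_atis_840b_300d",
    "en.ner.aspect.atis": "nerdl_atis_840b_300d",
}


def group_by_model(references):
    """Return {model name: [every nlu.load() reference that points at it]}."""
    groups = defaultdict(list)
    for reference, model in references.items():
        groups[model].append(reference)
    return dict(groups)


if __name__ == "__main__":
    for model, refs in group_by_model(REFERENCES).items():
        print(model, "<-", ", ".join(refs))
```

Any reference in the same group should load the same underlying model, so the choice between them is a matter of readability.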

New Easy NLU 1-liner Examples:

Extract aspects and entities from airline questions (ATIS dataset)


nlu.load("en.ner.atis").predict("i want to fly from baltimore to dallas round trip")
output:  ["baltimore"," dallas", "round trip"]

Intent Classification for Airline Traffic Information System queries (ATIS dataset)


nlu.load("en.classify.questions.atis").predict("what is the price of flight from newyork to washington")
output:  "atis_airfare" 
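
When classifying many questions, it usually makes sense to load the pipeline once and reuse it instead of calling `nlu.load` per question. The sketch below wraps that pattern in a hypothetical helper, `classify_intents`. It assumes `predict` accepts a list of strings as well as a single string; the `load` parameter is not an NLU feature and exists only so the helper can be exercised without NLU installed.

```python
from typing import Callable, Iterable, Optional


def classify_intents(
    questions: Iterable[str],
    load: Optional[Callable[[str], object]] = None,
    reference: str = "en.classify.questions.atis",
):
    """Load the pipeline once, then classify every question in a single call.

    Assumption: the loaded pipeline's predict() accepts a list of strings.
    """
    if load is None:
        # Imported lazily so this module can be imported without NLU installed.
        import nlu

        load = nlu.load
    pipeline = load(reference)  # loaded once, reused for all questions
    return pipeline.predict(list(questions))
```

With NLU installed, `classify_intents(["what is the price of flight from newyork to washington"])` would run the same model as the one-liner above.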

Recognize Entities OntoNotes - ELECTRA Large


nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London.")  
output:  ["Johnson", "first", "2001", "eight years", "London"]  

Question classification of open-domain and fact-based questions Pipeline - TREC50

nlu.load("en.classify.trec50.pipe").predict("When did the construction of stone circles begin in the UK? ")
output:  LOC_other

Traditional Chinese Word Segmentation

# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.segment_words.gsd").predict("然而,這樣的處理也衍生了一些問題。")
output:  ["然而",",","這樣","的","處理","也","衍生","了","一些","問題","。"]

Part of Speech for Traditional Chinese

# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.pos.ud_gsd_trad").predict("然而,這樣的處理也衍生了一些問題。")

Output:

| Token | POS |
|---|---|
| 然而 | ADV |
| , | PUNCT |
| 這樣 | PRON |
| 的 | PART |
| 處理 | NOUN |
| 也 | ADV |
| 衍生 | VERB |
| 了 | PART |
| 一些 | ADJ |
| 問題 | NOUN |
| 。 | PUNCT |

Thai Word Segment Recognition

# 'Mona Lisa is a 16th-century oil painting created by Leonardo held at the Louvre in Paris' in Thai
nlu.load("th.segment_words").predict("Mona Lisa เป็นภาพวาดสีน้ำมันในศตวรรษที่ 16 ที่สร้างโดย Leonardo จัดขึ้นที่พิพิธภัณฑ์ลูฟร์ในปารีส")

Output:

token
M
o
n
a
Lisa
เป็น
ภาพ
สีน้ำ
มัน
ใน
ศตวรรษ
ที่
16
ที่
สร้าง
L
e
o
n
a
r
d
o
จัด
ขึ้น
ที่
พิพิธภัณฑ์
ลูฟร์
ใน
ปารีส

Part of Speech for Bengali (POS)

# 'The village is also called 'Mod' in Tora language' in Bengali 
nlu.load("bn.pos").predict("বাসস্থান-ঘরগৃহস্থালি তোড়া ভাষায় গ্রামকেও বলে ` মোদ ' ৷")

Output:

| token | pos |
|---|---|
| বাসস্থান-ঘরগৃহস্থালি | NN |
| তোড়া | NNP |
| ভাষায় | NN |
| গ্রামকেও | NN |
| বলে | VM |
| ` | SYM |
| মোদ | NN |
| ' | SYM |
| ৷ | SYM |

Stop Words Cleaner for Bengali

# 'This language is not enough' in Bengali 
df = nlu.load("bn.stopwords").predict("এই ভাষা যথেষ্ট নয়")

Output:

| cleanTokens | token |
|---|---|
| ভাষা | এই |
| যথেষ্ট | ভাষা |
| নয় | যথেষ্ট |
| None | নয় |
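
In the output above the two columns are not aligned row by row: `cleanTokens` is simply shorter than `token`, so its last cell is `None`. To see which words the cleaner actually dropped, the two columns can be compared as lists. The sketch below does this in plain Python on values copied from the table; `removed_tokens` is an illustrative helper, not an NLU function.

```python
from collections import Counter

# Values copied from the bn.stopwords output (the trailing None is dropped).
tokens = ["এই", "ভাষা", "যথেষ্ট", "নয়"]
clean_tokens = ["ভাষা", "যথেষ্ট", "নয়"]


def removed_tokens(tokens, clean_tokens):
    """Return the tokens that do not survive in clean_tokens, in original order.

    A Counter is used so that repeated words are matched one occurrence at a time.
    """
    remaining = Counter(t for t in clean_tokens if t is not None)
    removed = []
    for token in tokens:
        if remaining[token] > 0:
            remaining[token] -= 1  # this occurrence was kept
        else:
            removed.append(token)  # no kept occurrence left: it was filtered
    return removed


if __name__ == "__main__":
    print(removed_tokens(tokens, clean_tokens))
```

For this sentence the only removed token is the first one, `এই`.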

Part of Speech for Bihari (Bhojpuri)


# 'The people of Ohu know that the foundation of Bhojpuri was shaken' in Bhojpuri
nlu.load('bh.pos').predict("ओहु लोग के मालूम बा कि श्लील होखते भोजपुरी के नींव हिल जाई").to_markdown()

Output:

| pos | token |
|---|---|
| DET | ओहु |
| NOUN | लोग |
| ADP | के |
| NOUN | मालूम |
| VERB | बा |
| SCONJ | कि |
| ADJ | श्लील |
| VERB | होखते |
| PROPN | भोजपुरी |
| ADP | के |
| NOUN | नींव |
| VERB | हिल |
| AUX | जाई |
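
A common next step after tagging is to summarize the tag distribution. The sketch below counts the tags from the `bh.pos` output above using only the standard library; the rows are copied by hand, and `pos_counts` is an illustrative helper rather than anything NLU provides.

```python
from collections import Counter

# (pos, token) rows copied from the bh.pos output.
ROWS = [
    ("DET", "ओहु"),
    ("NOUN", "लोग"),
    ("ADP", "के"),
    ("NOUN", "मालूम"),
    ("VERB", "बा"),
    ("SCONJ", "कि"),
    ("ADJ", "श्लील"),
    ("VERB", "होखते"),
    ("PROPN", "भोजपुरी"),
    ("ADP", "के"),
    ("NOUN", "नींव"),
    ("VERB", "हिल"),
    ("AUX", "जाई"),
]


def pos_counts(rows):
    """Count how often each part-of-speech tag occurs."""
    return Counter(pos for pos, _token in rows)


if __name__ == "__main__":
    for tag, count in pos_counts(ROWS).most_common():
        print(tag, count)
```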

Amharic Part of Speech (POS)

# ' "Son, finish the job," he said.' in Amharic
nlu.load('am.pos').predict('ልጅ ኡ ን ሥራ ው ን አስጨርስ ኧው ኣል ኧሁ ።"').to_markdown()

Output:

| pos | token |
|---|---|
| NOUN | ልጅ |
| DET | ኡ |
| PART | ን |
| NOUN | ሥራ |
| DET | ው |
| PART | ን |
| VERB | አስጨርስ |
| PRON | ኧው |
| AUX | ኣል |
| PRON | ኧሁ |
| PUNCT | ። |
| NOUN | " |

Thai Sentiment Classification

# 'I love peanut butter and jelly!' in Thai
nlu.load('th.classify.sentiment').predict('ฉันชอบเนยถั่วและเยลลี่!')[['sentiment','sentiment_confidence']].to_markdown()

Output:

| sentiment | sentiment_confidence |
|---|---|
| positive | 0.999998 |

Arabic Named Entity Recognition (NER)

# 'In 1918, the forces of the Arab Revolt liberated Damascus with the help of the British' in Arabic
nlu.load('ar.ner').predict('في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز',output_level='chunk')[['entities_confidence','ner_confidence','entities']].to_markdown()

Output:

| entity_class | ner_confidence | entities |
|---|---|---|
| ORG | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | قوات الثورة العربية |
| LOC | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | دمشق |
| PER | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | الإنكليز |
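
Every row in the output above carries the same 11-element `ner_confidence` list, and the input sentence has 11 whitespace-separated words. That suggests the list holds one confidence per token for the whole sentence rather than per chunk, although the release notes do not say so. Under that assumption, and assuming the tokenizer split this sentence on whitespace, a per-chunk confidence can be read off by aligning each chunk with its position in the sentence. `chunk_confidence` below is a hypothetical helper, not an NLU function.

```python
from typing import List, Optional

SENTENCE = "في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز"

# Copied from the ner_confidence column; assumed to be one value per token.
NER_CONFIDENCE = [
    1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063,
    0.9987999796867371, 0.9990000128746033, 0.9998999834060669,
    0.9998999834060669, 0.9993000030517578, 0.9998999834060669,
]


def chunk_confidence(
    chunk: str,
    sentence: str = SENTENCE,
    confidences: List[float] = NER_CONFIDENCE,
) -> Optional[float]:
    """Return the lowest token confidence inside the first match of `chunk`.

    Assumes whitespace tokenization and one confidence value per token.
    Returns None if the chunk is empty or does not occur in the sentence.
    """
    tokens = sentence.split()
    if len(tokens) != len(confidences):
        raise ValueError("expected exactly one confidence value per token")
    words = chunk.split()
    if not words:
        return None
    for start in range(len(tokens) - len(words) + 1):
        if tokens[start:start + len(words)] == words:
            return min(confidences[start:start + len(words)])
    return None


if __name__ == "__main__":
    for chunk in ["قوات الثورة العربية", "دمشق", "الإنكليز"]:
        print(chunk, chunk_confidence(chunk))
```

Taking the minimum is a conservative choice; a mean over the chunk's tokens would be an equally reasonable summary.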

NLU 1.1.1 Enhancements:

New NLU Notebooks and Tutorials

Installation

# PyPI
!pip install nlu pyspark==2.4.7
# Conda
conda install -c johnsnowlabs nlu
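
After installing, it can help to confirm that both packages are visible to the interpreter before starting a Spark session. The sketch below only checks importability with the standard library and does not import or start anything; `missing_packages` is an illustrative helper name.

```python
import importlib.util


def missing_packages(names=("nlu", "pyspark")):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("nlu and pyspark are importable")
```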

Additional NLU resources