JohnSnowLabs / nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

Release/500 #191

Closed C-K-Loan closed 11 months ago

C-K-Loan commented 11 months ago

New Annotators:

Medical Text Generation, ConvNext for Image Classification, and DistilBert, Bert, and Roberta for Zero-Shot Classification in John Snow Labs NLU 5.0.0

We are very excited to announce NLU 5.0.0 has been released!

It comes with ZeroShotClassification models based on the Bert, DistilBert, and Roberta architectures. Additionally, a Medical Text Generator based on Bio-GPT, as well as a Bart-based General Text Generator, are now available in NLU. Finally, ConvNextForImageClassification is an image classifier based on ConvNeXT models.


ConvNextForImageClassification

Tutorial Notebook
ConvNextForImageClassification is an image classifier based on ConvNet models. The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.
Powered by ConvNextForImageClassification
Reference: A ConvNet for the 2020s

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.classify_image.convnext.tiny | image_classifier_convnext_tiny_224_local | Image Classification | ConvNextImageClassifier |
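A minimal usage sketch, assuming NLU's one-liner API (`nlu.load(...).predict(...)`) accepts a path to an image file or folder for image-classification pipelines; the path below is a placeholder:

```python
import nlu

# Load the ConvNeXT tiny image classifier via its NLU reference.
pipe = nlu.load('en.classify_image.convnext.tiny')

# Assumption: predict() accepts a path to an image or a folder of images.
df = pipe.predict('/path/to/images')
print(df)
```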

DistilBertForZeroShotClassification

Tutorial Notebook

DistilBertForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Any combination of sequences and labels can be passed, and each combination is posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by DistilBertForZeroShotClassification

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.distilbert.zero_shot_classifier | distilbert_base_zero_shot_classifier_uncased_mnli | Zero-Shot Classification | DistilBertForZeroShotClassification |
| tr | tr.distilbert.zero_shot_classifier.multinli | distilbert_base_zero_shot_classifier_turkish_cased_multinli | Zero-Shot Classification | DistilBertForZeroShotClassification |
| tr | tr.distilbert.zero_shot_classifier.allnli | distilbert_base_zero_shot_classifier_turkish_cased_allnli | Zero-Shot Classification | DistilBertForZeroShotClassification |
| tr | tr.distilbert.zero_shot_classifier.snli | distilbert_base_zero_shot_classifier_turkish_cased_snli | Zero-Shot Classification | DistilBertForZeroShotClassification |
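A hedged usage sketch; the same pattern should apply to the Bert and Roberta references in the sections below (`en.bert.zero_shot_classifier`, `en.roberta.zero_shot_classifier`). We assume the pretrained model ships with default candidate labels; custom labels would be set on the underlying Spark NLP annotator via `setCandidateLabels`. The example text is made up:

```python
import nlu

# Zero-shot classification: the NLI-trained model scores the input text
# against candidate labels without any task-specific fine-tuning.
pipe = nlu.load('en.distilbert.zero_shot_classifier')
df = pipe.predict('I have a problem with my iPhone that needs to be resolved asap!')
print(df)
```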

BertForZeroShotClassification

Tutorial Notebook
BertForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks.
Any combination of sequences and labels can be passed, and each combination is posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by BertForZeroShotClassification

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.bert.zero_shot_classifier | bert_base_cased_zero_shot_classifier_xnli | Zero-Shot Classification | BertForZeroShotClassification |

RoBertaForZeroShotClassification

Tutorial Notebook
RoBertaForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Any combination of sequences and labels can be passed, and each combination is posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by RoBertaForZeroShotClassification

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.roberta.zero_shot_classifier | roberta_base_zero_shot_classifier_nli | Zero-Shot Classification | RoBertaForZeroShotClassification |

BartTransformer

Tutorial Notebook

The Facebook BART (Bidirectional and Auto-Regressive Transformer) model is a state-of-the-art language generation model introduced by Facebook AI in 2019. It is based on the transformer architecture and is designed to handle a wide range of natural language processing tasks such as text generation, summarization, and machine translation.

BART pairs a bidirectional encoder with an auto-regressive decoder: the encoder reads the input in both directions, capturing contextual information from both past and future tokens, while the decoder generates text left-to-right. This combination results in more accurate and natural language generation.

The model was trained on a large corpus of text data using a combination of unsupervised and supervised learning techniques. It incorporates pretraining and fine-tuning phases, where the model is first trained on a large unlabeled corpus of text and then fine-tuned on specific downstream tasks. BART has achieved state-of-the-art performance on a wide range of NLP tasks, including summarization, question answering, and language translation. Its ability to handle multiple tasks and its high performance on each of them make it a versatile and valuable tool for natural language processing applications.
Powered by BartTransformer
Reference: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.seq2seq.distilbart_xsum_12_6 | distilbart_xsum_12_6 | Summarization | BartTransformer |
| en | en.seq2seq.distilbart_xsum_6_6 | distilbart_xsum_6_6 | Summarization | BartTransformer |
| en | en.seq2seq.distilbart_cnn_12_6 | distilbart_cnn_12_6 | Summarization | BartTransformer |
| en | en.seq2seq.distilbart_cnn_6_6 | distilbart_cnn_6_6 | Summarization | BartTransformer |
| en | en.seq2seq.bart_large_cnn | bart_large_cnn | Summarization | BartTransformer |
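A short summarization sketch using one of the references above; the input passage is an illustrative snippet, and we assume the generated summary comes back as a column of the returned DataFrame:

```python
import nlu

# Summarize a passage with a distilled BART model (CNN/DailyMail variant).
pipe = nlu.load('en.seq2seq.distilbart_cnn_12_6')
text = ('BART is trained by corrupting text with an arbitrary noising function '
        'and learning a model to reconstruct the original text. It uses a '
        'standard Transformer architecture and generalizes BERT and GPT.')
df = pipe.predict(text)
print(df)
```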

MedicalTextGenerator

Tutorial Notebook

MedicalTextGenerator uses the basic BioGPT model to perform various tasks related to medical text abstraction.
A user can provide a prompt and context and instruct the system to perform a specific task, such as explaining why a patient may have a particular disease or paraphrasing the context more directly.
In addition, this annotator can create a clinical note for a cancer patient using the given keywords or write medical texts based on introductory sentences.
The BioGPT model is trained on large volumes of medical data, allowing it to identify and extract the most relevant information from the text provided.
Powered by TextGenerator

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.generate.biomedical_biogpt_base | text_generator_biomedical_biogpt_base | Text Generation | MedicalTextGenerator |
| en | en.generate.generic_flan_base | text_generator_generic_flan_base | Text Generation | MedicalTextGenerator |
| en | en.generate.generic_jsl_base | text_generator_generic_jsl_base | Text Generation | MedicalTextGenerator |
| en | en.generate.generic_flan_t5_large | text_generator_generic_flan_t5_large | Text Generation | MedicalTextGenerator |
| en | en.generate.biogpt_chat_jsl | biogpt_chat_jsl | Text Generation | MedicalTextGenerator |
| en | en.generate.biogpt_chat_jsl_conversational | biogpt_chat_jsl_conversational | Text Generation | MedicalTextGenerator |
| en | en.generate.biogpt_chat_jsl_conditions | biogpt_chat_jsl_conditions | Text Generation | MedicalTextGenerator |
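These generators are licensed healthcare models, so a John Snow Labs license is required. A sketch assuming NLU's `nlu.auth(...)` entry point for licensed models; the credential values and the example prompt are placeholders:

```python
import nlu

# Placeholders: supply your own John Snow Labs credentials (assumption:
# licensed models are loaded through nlu.auth(...).load(...)).
SPARK_NLP_LICENSE = '...'
AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'
JSL_SECRET = '...'

pipe = nlu.auth(SPARK_NLP_LICENSE, AWS_ACCESS_KEY_ID,
                AWS_SECRET_ACCESS_KEY, JSL_SECRET) \
          .load('en.generate.biogpt_chat_jsl')

# Prompt the BioGPT-based generator with a medical question (example prompt).
df = pipe.predict('Please explain the most common risk factors for type 2 diabetes.')
print(df)
```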

Install NLU

```bash
pip install nlu pyspark
```
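As a quick check that the install works, a minimal sketch using NLU's one-liner API; 'sentiment' is one of the standard NLU references, and the example sentence is made up:

```python
import nlu

# Downloads a default sentiment pipeline on first use and runs it once.
print(nlu.load('sentiment').predict('NLU 5.0.0 ships zero-shot classifiers and text generators!'))
```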

Additional NLU resources