JohnSnowLabs / nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

Release/500 #191

Closed C-K-Loan closed 11 months ago

C-K-Loan commented 11 months ago

New Annotators:

Medical Text Generation, ConvNext for Image Classification, and DistilBert, Bert, and Roberta for Zero-Shot Classification in John Snow Labs NLU 5.0.0

We are very excited to announce NLU 5.0.0 has been released!

It comes with ZeroShotClassification models based on the Bert, DistilBert, and Roberta architectures. Additionally, a Medical Text Generator based on Bio-GPT, as well as a Bart-based General Text Generator, are now available in NLU. Finally, ConvNextForImageClassification is an image classifier based on ConvNeXT models.


ConvNextForImageClassification

Tutorial Notebook
ConvNextForImageClassification is an image classifier based on ConvNet models. The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.
Powered by ConvNextForImageClassification
Reference: A ConvNet for the 2020s

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.classify_image.convnext.tiny | image_classifier_convnext_tiny_224_local | Image Classification | ConvNextImageClassifier |
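A minimal usage sketch, assuming NLU's one-liner API (`nlu.load(...).predict(...)`) accepts a path to an image file or folder for image-classification pipelines; the path below is a placeholder:

```python
import nlu

# Load the ConvNeXT tiny image classifier via its NLU reference.
pipe = nlu.load('en.classify_image.convnext.tiny')

# Assumption: predict() accepts a path to an image or a folder of images.
df = pipe.predict('/path/to/images')
print(df)
```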

DistilBertForZeroShotClassification

Tutorial Notebook

DistilBertForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Any combination of sequences and labels can be passed, and each combination is posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by DistilBertForZeroShotClassification

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.distilbert.zero_shot_classifier | distilbert_base_zero_shot_classifier_uncased_mnli | Zero-Shot Classification | DistilBertForZeroShotClassification |
| tr | tr.distilbert.zero_shot_classifier.multinli | distilbert_base_zero_shot_classifier_turkish_cased_multinli | Zero-Shot Classification | DistilBertForZeroShotClassification |
| tr | tr.distilbert.zero_shot_classifier.allnli | distilbert_base_zero_shot_classifier_turkish_cased_allnli | Zero-Shot Classification | DistilBertForZeroShotClassification |
| tr | tr.distilbert.zero_shot_classifier.snli | distilbert_base_zero_shot_classifier_turkish_cased_snli | Zero-Shot Classification | DistilBertForZeroShotClassification |
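A hedged usage sketch; the same pattern should apply to the Bert and Roberta references in the sections below (`en.bert.zero_shot_classifier`, `en.roberta.zero_shot_classifier`). We assume the pretrained model ships with default candidate labels; custom labels would be set on the underlying Spark NLP annotator via `setCandidateLabels`. The example text is made up:

```python
import nlu

# Zero-shot classification: the NLI-trained model scores the input text
# against candidate labels without any task-specific fine-tuning.
pipe = nlu.load('en.distilbert.zero_shot_classifier')
df = pipe.predict('I have a problem with my iPhone that needs to be resolved asap!')
print(df)
```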

BertForZeroShotClassification

Tutorial Notebook
BertForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks.
Any combination of sequences and labels can be passed, and each combination is posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by BertForZeroShotClassification

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.bert.zero_shot_classifier | bert_base_cased_zero_shot_classifier_xnli | Zero-Shot Classification | BertForZeroShotClassification |

RoBertaForZeroShotClassification

Tutorial Notebook
RoBertaForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Any combination of sequences and labels can be passed, and each combination is posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by RoBertaForZeroShotClassification

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.roberta.zero_shot_classifier | roberta_base_zero_shot_classifier_nli | Zero-Shot Classification | RoBertaForZeroShotClassification |

BartTransformer

Tutorial Notebook

The Facebook BART (Bidirectional and Auto-Regressive Transformer) model is a state-of-the-art language generation model introduced by Facebook AI in 2019. It is based on the transformer architecture and is designed to handle a wide range of natural language processing tasks such as text generation, summarization, and machine translation.

BART pairs a bidirectional encoder with an auto-regressive decoder: the encoder reads the input in both directions, capturing contextual information from both past and future tokens, while the decoder generates text left-to-right. This combination results in more accurate and natural language generation.

The model was trained on a large corpus of text data using a combination of unsupervised and supervised learning techniques. It incorporates pretraining and fine-tuning phases, where the model is first trained on a large unlabeled corpus of text and then fine-tuned on specific downstream tasks. BART has achieved state-of-the-art performance on a wide range of NLP tasks, including summarization, question answering, and language translation. Its ability to handle multiple tasks and its high performance on each of them make it a versatile and valuable tool for natural language processing applications.
Powered by BartTransformer
Reference: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.seq2seq.distilbart_xsum_12_6 | distilbart_xsum_12_6 | Summarization | BartTransformer |
| en | en.seq2seq.distilbart_xsum_6_6 | distilbart_xsum_6_6 | Summarization | BartTransformer |
| en | en.seq2seq.distilbart_cnn_12_6 | distilbart_cnn_12_6 | Summarization | BartTransformer |
| en | en.seq2seq.distilbart_cnn_6_6 | distilbart_cnn_6_6 | Summarization | BartTransformer |
| en | en.seq2seq.bart_large_cnn | bart_large_cnn | Summarization | BartTransformer |
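A short summarization sketch using one of the references above; the input passage is an illustrative snippet, and we assume the generated summary comes back as a column of the returned DataFrame:

```python
import nlu

# Summarize a passage with a distilled BART model (CNN/DailyMail variant).
pipe = nlu.load('en.seq2seq.distilbart_cnn_12_6')
text = ('BART is trained by corrupting text with an arbitrary noising function '
        'and learning a model to reconstruct the original text. It uses a '
        'standard Transformer architecture and generalizes BERT and GPT.')
df = pipe.predict(text)
print(df)
```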

MedicalTextGenerator

Tutorial Notebook

MedicalTextGenerator uses the basic BioGPT model to perform various tasks related to medical text abstraction.
A user can provide a prompt and context and instruct the system to perform a specific task, such as explaining why a patient may have a particular disease or paraphrasing the context more directly.
In addition, this annotator can create a clinical note for a cancer patient using the given keywords or write medical texts based on introductory sentences.
The BioGPT model is trained on large volumes of medical data, allowing it to identify and extract the most relevant information from the text provided.
Powered by TextGenerator

New NLU Models:

| Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
|----------|---------------|---------------------|------|-----------------|
| en | en.generate.biomedical_biogpt_base | text_generator_biomedical_biogpt_base | Text Generation | MedicalTextGenerator |
| en | en.generate.generic_flan_base | text_generator_generic_flan_base | Text Generation | MedicalTextGenerator |
| en | en.generate.generic_jsl_base | text_generator_generic_jsl_base | Text Generation | MedicalTextGenerator |
| en | en.generate.generic_flan_t5_large | text_generator_generic_flan_t5_large | Text Generation | MedicalTextGenerator |
| en | en.generate.biogpt_chat_jsl | biogpt_chat_jsl | Text Generation | MedicalTextGenerator |
| en | en.generate.biogpt_chat_jsl_conversational | biogpt_chat_jsl_conversational | Text Generation | MedicalTextGenerator |
| en | en.generate.biogpt_chat_jsl_conditions | biogpt_chat_jsl_conditions | Text Generation | MedicalTextGenerator |
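These generators are licensed healthcare models, so a John Snow Labs license is required. A sketch assuming NLU's `nlu.auth(...)` entry point for licensed models; the credential values and the example prompt are placeholders:

```python
import nlu

# Placeholders: supply your own John Snow Labs credentials (assumption:
# licensed models are loaded through nlu.auth(...).load(...)).
SPARK_NLP_LICENSE = '...'
AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'
JSL_SECRET = '...'

pipe = nlu.auth(SPARK_NLP_LICENSE, AWS_ACCESS_KEY_ID,
                AWS_SECRET_ACCESS_KEY, JSL_SECRET) \
          .load('en.generate.biogpt_chat_jsl')

# Prompt the BioGPT-based generator with a medical question (example prompt).
df = pipe.predict('Please explain the most common risk factors for type 2 diabetes.')
print(df)
```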

Install NLU

```bash
pip install nlu pyspark
```
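As a quick check that the install works, a minimal sketch using NLU's one-liner API; 'sentiment' is one of the standard NLU references, and the example sentence is made up:

```python
import nlu

# Downloads a default sentiment pipeline on first use and runs it once.
print(nlu.load('sentiment').predict('NLU 5.0.0 ships zero-shot classifiers and text generators!'))
```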

Additional NLU resources