JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

John Snow Labs Spark-NLP 3.3.2: New BERT for Sequence Classification, Comet.ml logging integration, new state-of-the-art BERT topic and sentiment detection models, and bug fixes! #6411

Closed lchernicharo closed 2 years ago

lchernicharo commented 2 years ago

Hello, there! Trying to download the Fat Jars but getting Permission Denied. Can you help?

Discussed in https://github.com/JohnSnowLabs/spark-nlp/discussions/6383

Originally posted by **maziyarpanahi** November 3, 2021

---------------
Overview
---------------

We are pleased to release Spark NLP 🚀 3.3.2! This release comes with a new BertForSequenceClassification annotator for existing or fine-tuned models on HuggingFace, a new logging feature during training with Comet.ml, new state-of-the-art fine-tuned BERT models for Sequence Classification, and bug fixes!

As always, we would like to thank our community for their feedback, questions, and feature requests.

----------------
New Features
----------------

* Introducing the BertForSequenceClassification annotator. BertForSequenceClassification can load BERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks. This annotator is compatible with all the models trained/fine-tuned by using BertForSequenceClassification (PyTorch) or TFBertForSequenceClassification (TensorFlow) in HuggingFace 🤗
* New support for Comet.ml in Spark NLP to build better models faster.

> Comet enables data scientists and teams to track, compare, explain and optimize experiments and models across the model’s entire lifecycle. From training to production. With just two lines of code, you can start building better models today.

[Comet SparkNLP Integration Notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/logging/Comet_SparkNLP_Intergration.ipynb)

----------------
Bug Fixes and Enhancements
----------------

* Fix a missing batchSize param in NerDLModel that degraded GPU performance by not allowing users to change the default batchSize
* Fix NerDLApproach logs format on Databricks
* Fix EntityRulerApproach name from import
* Fix missing EntityRulerModel in ResourceDownloader
* Faster Colab setup script for pyspark 3.0.x and 3.1.x on Java 11

--------------------
Models
--------------------

New state-of-the-art fine-tuned BERT models for Sequence Classification in English, French, German, Spanish, Japanese, Turkish, and Russian, as well as multilingual models.

### Featured Pretrained Models

| Model | Name | Build | Lang |
|:---------------------|:-------------------|:-----------------|:-----|
| BertForSequenceClassification | [bert_multilingual_sequence_classifier_allocine](https://nlp.johnsnowlabs.com/2021/11/01/bert_multilingual_sequence_classifier_allocine_fr.html)| `3.3.2` | `fr`|
| BertForSequenceClassification | [bert_large_sequence_classifier_imdb](https://nlp.johnsnowlabs.com/2021/11/01/bert_large_sequence_classifier_imdb_en.html)| `3.3.2` | `en`|
| BertForSequenceClassification | [bert_base_sequence_classifier_imdb](https://nlp.johnsnowlabs.com/2021/11/01/bert_base_sequence_classifier_imdb_en.html)| `3.3.2` | `en`|
| BertForSequenceClassification | [bert_base_sequence_classifier_ag_news](https://nlp.johnsnowlabs.com/2021/11/02/bert_base_sequence_classifier_ag_news_en.html)| `3.3.2` | `en`|
| BertForSequenceClassification | [bert_base_sequence_classifier_dbpedia_14](https://nlp.johnsnowlabs.com/2021/11/01/bert_base_sequence_classifier_dbpedia_14_en.html)| `3.3.2` | `en`|
| BertForSequenceClassification | [bert_sequence_classifier_turkish_sentiment](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_turkish_sentiment_tr.html)| `3.3.2` | `tr`|
| BertForSequenceClassification | [bert_sequence_classifier_sentiment](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_sentiment_de.html)| `3.3.2` | `de`|
| BertForSequenceClassification | [bert_sequence_classifier_rubert_sentiment](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_rubert_sentiment_ru.html)| `3.3.2` | `ru`|
| BertForSequenceClassification | [bert_sequence_classifier_multilingual_sentiment](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_multilingual_sentiment_xx.html)| `3.3.2` | `xx`|
| BertForSequenceClassification | [bert_sequence_classifier_japanese_sentiment](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_japanese_sentiment_ja.html)| `3.3.2` | `ja`|
| BertForSequenceClassification | [bert_sequence_classifier_finbert](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_finbert_en.html)| `3.3.2` | `en`|
| BertForSequenceClassification | [bert_sequence_classifier_dehatebert_mono](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_dehatebert_mono_en.html)| `3.3.2` | `en`|
| BertForSequenceClassification | [bert_sequence_classifier_beto_sentiment_analysis](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_beto_sentiment_analysis_es.html)| `3.3.2` | `es`|
| BertForSequenceClassification | [bert_sequence_classifier_beto_emotion_analysis](https://nlp.johnsnowlabs.com/2021/11/03/bert_sequence_classifier_beto_emotion_analysis_es.html)| `3.3.2` | `es`|

The complete list of all 4000+ models & pipelines in 200+ languages is available on [Models Hub](https://nlp.johnsnowlabs.com/models?edition=Spark+NLP).
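
As a rough illustration of how one of the featured models might be used from Python, the sketch below wires `bert_base_sequence_classifier_imdb` into a minimal pipeline. It assumes `spark-nlp==3.3.2` and a compatible `pyspark` are installed and that the model can be downloaded at run time. The column names (`text`, `document`, `token`, `class`), the sample sentence, and the helper names `build_pipeline` and `run_demo` are illustrative choices, not part of the release notes.

```python
# Model name and language taken from the Featured Pretrained Models table.
MODEL_NAME = "bert_base_sequence_classifier_imdb"
MODEL_LANG = "en"


def build_pipeline():
    """Return an unfitted Spark ML Pipeline that classifies raw text.

    Assumes a Spark session started with Spark NLP is already active,
    because pretrained() downloads the model through that session.
    """
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, BertForSequenceClassification

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    classifier = (
        BertForSequenceClassification.pretrained(MODEL_NAME, MODEL_LANG)
        .setInputCols(["document", "token"])
        .setOutputCol("class")
    )
    return Pipeline(stages=[document, token, classifier])


def run_demo():
    """Hypothetical end-to-end run on a single example sentence."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([["I really liked that movie!"]]).toDF("text")
    model = build_pipeline().fit(data)
    model.transform(data).select("class.result").show(truncate=False)
```

Calling `run_demo()` starts a local session and prints the predicted label; swapping `MODEL_NAME` and `MODEL_LANG` for another row of the table should work the same way, provided that model is published under that name.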
### New Notebooks

Spark NLP | Notebooks | Colab
:------------ | :-------------| :----------
BertForSequenceClassification | [HuggingFace in Spark NLP - BertForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/transformers/HuggingFace%20in%20Spark%20NLP%20-%20BertForSequenceClassification.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/transformers/HuggingFace%20in%20Spark%20NLP%20-%20BertForSequenceClassification.ipynb)
Comet.ml | [Comet SparkNLP Integration Notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/logging/Comet_SparkNLP_Intergration.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/logging/Comet_SparkNLP_Intergration.ipynb)

----------------
Documentation
----------------

* [TF Hub & HuggingFace to Spark NLP](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669)
* [Models Hub](https://nlp.johnsnowlabs.com/models) with new models
* [Spark NLP documentation](https://nlp.johnsnowlabs.com/docs/en/quickstart)
* [Spark NLP Scala APIs](https://nlp.johnsnowlabs.com/api)
* [Spark NLP Python APIs](https://nlp.johnsnowlabs.com/api/python)
* [Spark NLP Workshop](https://github.com/JohnSnowLabs/spark-nlp-workshop) notebooks
* [Spark NLP publications](https://medium.com/spark-nlp)
* [Spark NLP in Action](https://nlp.johnsnowlabs.com/demo)
* [Spark NLP training certification notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Public) for Google Colab and Databricks
* [Spark NLP Display](https://github.com/JohnSnowLabs/spark-nlp-display) for visualization of different types of annotations
* [Discussions](https://github.com/JohnSnowLabs/spark-nlp/discussions) Engage with other community members, share ideas, and show off how you use Spark NLP!

---------------
Installation
---------------

**Python**

```shell
# PyPI
pip install spark-nlp==3.3.2
```

**Spark Packages**

**spark-nlp** on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):

```shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2
```

**GPU**

```shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.2
```

**spark-nlp** on Apache Spark 2.4.x (Scala 2.11 only):

```shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.2
```

**GPU**

```shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2
```

**spark-nlp** on Apache Spark 2.3.x (Scala 2.11 only):

```shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.2
```

**GPU**

```shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.2
```

**Maven**

**spark-nlp** on Apache Spark 3.0.x and 3.1.x:

```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>3.3.2</version>
</dependency>
```

**spark-nlp-gpu:**

```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>3.3.2</version>
</dependency>
```

**spark-nlp** on Apache Spark 2.4.x:

```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark24_2.11</artifactId>
    <version>3.3.2</version>
</dependency>
```

**spark-nlp-gpu:**

```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
    <version>3.3.2</version>
</dependency>
```

**spark-nlp** on Apache Spark 2.3.x:

```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>3.3.2</version>
</dependency>
```

**spark-nlp-gpu:**

```xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>3.3.2</version>
</dependency>
```

**FAT JARs**

* CPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.3.2.jar
* GPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.3.2.jar
* CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.3.2.jar
* GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.3.2.jar
* CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.3.2.jar
* GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.3.2.jar

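
The package coordinates and FAT JAR URLs in the Installation section follow a regular naming pattern, which the sketch below reconstructs. It is inferred only from the 3.3.2 entries listed in this release; the function names, the `SPARK_BUILDS` table and its keys (`"3.x"`, `"2.4.x"`, `"2.3.x"`) are hypothetical helpers, and other releases may not follow the same pattern.

```python
VERSION = "3.3.2"
GROUP = "com.johnsnowlabs.nlp"
JAR_BASE = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"

# Spark line -> (artifact suffix, Scala binary version), as listed in the
# Installation section of this release.
SPARK_BUILDS = {
    "3.x": ("", "2.12"),
    "2.4.x": ("-spark24", "2.11"),
    "2.3.x": ("-spark23", "2.11"),
}


def artifact_stem(spark: str, gpu: bool = False) -> str:
    """Artifact name without the Scala suffix, e.g. 'spark-nlp-gpu-spark24'."""
    suffix, _scala = SPARK_BUILDS[spark]
    return "spark-nlp" + ("-gpu" if gpu else "") + suffix


def maven_coordinate(spark: str, gpu: bool = False, version: str = VERSION) -> str:
    """Coordinate in the group:artifact:version form used with --packages."""
    _suffix, scala = SPARK_BUILDS[spark]
    return f"{GROUP}:{artifact_stem(spark, gpu)}_{scala}:{version}"


def fat_jar_url(spark: str, gpu: bool = False, version: str = VERSION) -> str:
    """URL of the assembly JAR under the public S3 prefix."""
    return f"{JAR_BASE}/{artifact_stem(spark, gpu)}-assembly-{version}.jar"
```

For example, `maven_coordinate("2.4.x", gpu=True)` yields the value passed to `--packages` for the GPU build on Spark 2.4.x, and `fat_jar_url("3.x")` yields the first URL in the FAT JARs list.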
This discussion was created from the release John Snow Labs Spark-NLP 3.3.2: New BERT for Sequence Classification, Comet.ml logging integration, new state-of-the-art BERT topic and sentiment detection models, and bug fixes!.
maziyarpanahi commented 2 years ago

Hi @lchernicharo

Good find! I totally forgot to upload them! I’ll do it within an hour.

maziyarpanahi commented 2 years ago

They are now available on our S3 for download.

Thanks again @lchernicharo for reporting this 👍🏼
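
For anyone hitting the same error: S3 generally answers a request for an object that does not exist with 403 rather than 404 when the caller is not allowed to list the bucket, which may be why the not-yet-uploaded JARs showed up as Permission Denied. The sketch below is one way to check whether a JAR URL is reachable without downloading it. It uses only the Python standard library; `head_status` and `describe` are hypothetical helper names, and the wording of the status labels is an assumption.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request (no body is downloaded)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still useful here.
        return err.code


def describe(status: int) -> str:
    """Map a status code to a short, human-readable verdict."""
    if status == 200:
        return "available"
    if status in (403, 404):
        return "missing or not public"
    return f"unexpected status {status}"
```

A call such as `describe(head_status(url))` with one of the FAT JAR URLs from the release notes reports whether that file can currently be downloaded.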