Closed AntoineF3006 closed 1 month ago
Hello @maziyarpanahi, is my issue complete enough, or do I need to add more context or data so we can discuss it? Kind regards,
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 5 days
Is there an existing issue for this?
Who can help?
No response
What are you working on?
I am currently working on a multi-output classification task, classifying customer comments into several categories. I am using MultiClassifierDLApproach for this task, with already labeled data for training. I followed this tutorial: https://www.johnsnowlabs.com/mastering-text-classification-with-spark-nlp.
Current Behavior
After fitting my pipeline (described below) on my training set, I transform both the training and test sets with it. The results are pretty good overall, but for some rows the category column is empty and no probabilities are reported for any category.
Expected Behavior
I was expecting every row to get the probabilities for every category: not necessarily any selected categories, since I set the threshold to 0.5, but at least the probability values for each category.
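To make the expectation concrete, here is a minimal plain-Python sketch of the behavior I had in mind. It is not Spark NLP code and says nothing about how MultiClassifierDLApproach works internally; the function name `apply_threshold`, the label names, and the scores are all made up for illustration, and treating a score equal to the threshold as selected is my own assumption.

```python
# Illustrative only: plain Python, not Spark NLP internals.
# Label names and scores are hypothetical.

THRESHOLD = 0.5  # same value as the threshold set on the classifier


def apply_threshold(scores, threshold=THRESHOLD):
    """Return (selected_labels, all_scores) for one row.

    `scores` maps each category to its predicted probability.
    Assumption: a score equal to the threshold counts as selected.
    """
    selected = [label for label, p in scores.items() if p >= threshold]
    # The full score map is returned regardless of what was selected.
    return selected, dict(scores)


# Hypothetical row where no category reaches the threshold.
row_scores = {"delivery": 0.31, "pricing": 0.12, "support": 0.44}
selected, all_scores = apply_threshold(row_scores)

print(selected)    # no label is selected for this row
print(all_scores)  # but every probability is still reported
```

In other words, an empty list of selected categories would be fine for such a row, as long as the per-category probabilities are still available.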
Steps To Reproduce
https://drive.google.com/file/d/1tmJYwZKBVZoHtLcuyWtWhsu6nbonKG-S/view?usp=sharing
In this zip you will find an .ipynb notebook recreating the steps I used to build my pipeline, some sample data with their results, and the fitted pipeline. The input column is texte_sw, the label column is niveau_2_MC, and the output column is category. The issue seems to occur uniformly across my data: the date and time, the text length, and the number of words don't seem to be the cause.
Spark NLP version and Apache Spark
sparknlp.version(): 5.2.3; spark.version: 3.2.0.3.2.7170.1008-2
Type of Spark Application
Python Application
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response