Current Behavior
Setting setReadMonthFirst(False) has no effect: the matcher still reads the month first from the input.
See the example "I was born at 01/03/98", which is intended to be the 1st of March 1998.
Expected Behavior
With setReadMonthFirst(False), my example 01/03/98 should be read day first, i.e. as 01/03/1998 (1 March 1998), not month first.
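To make the difference between the two readings concrete, here is a minimal sketch using only Python's standard library. It does not call Spark NLP and says nothing about how DateMatcher parses internally; it only shows what each interpretation of "01/03/98" gives when rendered as dd/MM/yyyy.

```python
from datetime import datetime

raw = "01/03/98"

# Day-first reading (what setReadMonthFirst(False) is expected to give)
day_first = datetime.strptime(raw, "%d/%m/%y")

# Month-first reading (what the pipeline appears to return instead)
month_first = datetime.strptime(raw, "%m/%d/%y")

# Render both in the same layout as setOutputFormat("dd/MM/yyyy")
print(day_first.strftime("%d/%m/%Y"))    # 01/03/1998 (1 March 1998)
print(month_first.strftime("%d/%m/%Y"))  # 03/01/1998 (3 January 1998)
```

If the output column shows 03/01/1998 for this sentence, the month-first reading was applied despite the parameter being False.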
Steps To Reproduce
import sparknlp
from sparknlp.annotator import DocumentAssembler, DateMatcher, MultiDateMatcher
from pyspark.sql.types import StringType
from pyspark.ml import Pipeline
spark = sparknlp.start()
spark
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
date = DateMatcher() \
    .setInputCols("document") \
    .setOutputCol("date") \
    .setReadMonthFirst(False) \
    .setOutputFormat("dd/MM/yyyy")
multiDate = MultiDateMatcher() \
    .setInputCols("document") \
    .setReadMonthFirst(False) \
    .setOutputCol("multi_date") \
    .setOutputFormat("dd/MM/yyyy")
pipeline = Pipeline().setStages([
    documentAssembler,
    date,
    multiDate
])
text_list = ["See you on next monday.",
             "I was born at 01/03/98",
             "She was born on 02/03/1966.",
             "The project started yesterday and will finish next year.",
             "She will graduate by July 2023.",
             "She will visit doctor tomorrow and next month again."]
spark_df = spark.createDataFrame(text_list, StringType()).toDF("text")
result = pipeline.fit(spark_df).transform(spark_df)
result.selectExpr("text","date.result as date", "multi_date.result as multi_date").show(truncate=False)
Is there an existing issue for this?
Who can help?
No response
What are you working on?
I am using the example provided by Spark NLP with customized methods, and I am trying to configure the matchers not to read the month first.
Spark NLP version and Apache Spark
spark-nlp==5.2.0
Type of Spark Application
Python Application
Java Version
openjdk version "11.0.21" 2023-10-17
Java Home Directory
/usr/lib/jvm/java-11-openjdk-amd64
Setup and installation
pip install numpy py4j pyspark spark-nlp
Operating System and Version
Ubuntu-22.04
Link to your project (if available)
No response
Additional Information
No response