JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

EntityRuler fails two basic tests #14187

Open jfernandrezj opened 5 months ago

jfernandrezj commented 5 months ago

Is there an existing issue for this?

Who can help?

@danilojsl @maziyarpanahi

What are you working on?

Matching Keyword Patterns from a list of known keywords
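For readers unfamiliar with the technique named in the branch, here is a minimal, self-contained sketch of Aho-Corasick keyword matching in plain Python. It is an illustration only, not Spark NLP's `EntityRuler` implementation, and it does not reproduce the failing tests, which are documented only in the linked branch. The function names `build_automaton` and `find_keywords` are hypothetical.

```python
from collections import deque


def build_automaton(keywords):
    """Build a trie with failure links (hypothetical helper, not a Spark NLP API)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix that is also a trie prefix
    out = [[]]    # out[node] -> keywords that end at this node

    # Phase 1: insert every keyword into the trie.
    for kw in keywords:
        if not kw:
            continue  # skip empty keywords
        node = 0
        for ch in kw:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(kw)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (shorter suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, automaton):
    """Return (start, end, keyword) tuples with inclusive character offsets."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for kw in out[node]:
            matches.append((i - len(kw) + 1, i, kw))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_keywords("ushers", automaton))
```

Note that this sketch reports every occurrence, including overlapping ones, and matches on raw characters without regard to token boundaries. A production matcher has to decide how to handle both, and the sketch takes no position on what `EntityRuler` does or should do.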

Current Behavior

Both the current and the desired behavior are documented in the branch with the failing tests: https://github.com/JohnSnowLabs/spark-nlp/tree/issues/aho-corasick-failing-tests

Expected Behavior

Both the current and the desired behavior are documented in the branch with the failing tests: https://github.com/JohnSnowLabs/spark-nlp/tree/issues/aho-corasick-failing-tests

Steps To Reproduce

Just run the added tests in the branch: https://github.com/JohnSnowLabs/spark-nlp/tree/issues/aho-corasick-failing-tests

Spark NLP version and Apache Spark

Spark 3.4, Spark NLP 5.2.2

Type of Spark Application

No response

Java Version

Java 11

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response

maziyarpanahi commented 5 months ago

@jfernandrezj do you have a PR/fix for this issue?

jfernandrezj commented 5 months ago

A branch containing a potential fix for this issue is at: https://github.com/JohnSnowLabs/spark-nlp/tree/issues/aho-corasick-failing-tests-fix-for-discussion. Please check it out, and I can create a PR once we agree on the approach.

danilojsl commented 5 months ago

Hi @jfernandrezj

LGTM. I think you can create a PR for this.