JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

[SPARKNLP-1031] Solves Dependency Parsers training issue #14225

Closed: danilojsl closed this 3 months ago

danilojsl commented 3 months ago

Description

This PR improves and optimizes the processing of the CoNLL-U format, which is used to train the Dependency Parsers. The key improvements include:

Beyond these functional changes, this PR also refactors the underlying code to make it more readable and maintainable, which should simplify future modifications and debugging.
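For context on the input these parsers are trained on, here is a minimal sketch of reading CoNLL-U in plain Python. It is not Spark NLP's reader and does not reflect the code changed in this PR. It assumes only the Universal Dependencies layout: ten tab-separated columns per token, `#` comment lines, and blank lines between sentences. The names `FIELDS` and `parse_conllu` are hypothetical. Dropping multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `8.1`) is also an assumption: those rows carry no HEAD/DEPREL of their own, so they are skipped here.

```python
from typing import Dict, List

# Column order defined by the CoNLL-U format (Universal Dependencies).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Hypothetical helper: split CoNLL-U text into sentences of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are ignored.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        # Assumption: skip multiword-token ranges and empty nodes.
        if "-" in token["id"] or "." in token["id"]:
            continue
        current.append(token)
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


sample = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Don't\n"
    "1-2\tDon't\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tDo\tdo\tAUX\tVBP\t_\t0\troot\t_\t_\n"
    "2\tn't\tnot\tPART\tRB\t_\t1\tadvmod\t_\t_\n"
)
sentences = parse_conllu(sample)
print([(t["form"], t["head"], t["deprel"]) for t in sentences[0]])
# prints [('Dogs', '2', 'nsubj'), ('bark', '0', 'root'), ('.', '2', 'punct')]
```

The (HEAD, DEPREL) pair on each syntactic word is the supervision a dependency parser learns from. A HEAD of `0` marks the sentence root.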

Motivation and Context

Solves issue #14214

How Has This Been Tested?


Types of changes

Checklist: