JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

release/533-release-candidate #14227

Closed maziyarpanahi closed 3 months ago

maziyarpanahi commented 3 months ago

Additionally, it fixes a bug in serializing ONNX models that do not have a .onnx_data file (https://github.com/JohnSnowLabs/spark-nlp/commit/b73dc0b1ecdb49af9f2fa6e47b0af23d47442a53). @prabod I think you worked on this part; could you check whether the fix looks good? I described the change in the commit message. Thanks!

Enhanced Multiword Token Handling: Lines whose id column holds a range (e.g., 2-3 no) are now processed as multiword tokens, so they are recognized and handled correctly throughout parsing.

Improved Handling of Missing uPos Values: Previously, lines with no available uPos value could disrupt parsing. The parser now handles these lines gracefully and keeps parsing even when uPos is absent.
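
To make the two behaviors above concrete, here is a minimal Python sketch. It is not the Spark NLP implementation. It assumes the input follows the tab-separated CoNLL-U layout (id in column 1, form in column 2, uPos in column 4, with `_` meaning "no value"); the names `Token`, `parse_conllu_line`, and `parse_sentence` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Token:
    id: str               # "1", "2", ... or a range such as "2-3"
    form: str
    upos: Optional[str]   # None when the column is "_" or missing
    is_multiword: bool    # True when the id is a range like "2-3"


def parse_conllu_line(line: str) -> Optional[Token]:
    """Parse one line; return None for blank lines and comments."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    token_id = cols[0]
    form = cols[1] if len(cols) > 1 else ""
    # Assumption: a short line or a "_" placeholder both mean "no uPos".
    raw_upos = cols[3] if len(cols) > 3 else "_"
    upos = None if raw_upos in ("", "_") else raw_upos
    # A hyphen in the id column marks a multiword token range.
    return Token(token_id, form, upos, is_multiword="-" in token_id)


def parse_sentence(lines: Iterable[str]) -> List[Token]:
    """Parse all lines of one sentence, dropping blanks and comments."""
    parsed = (parse_conllu_line(line) for line in lines)
    return [tok for tok in parsed if tok is not None]


# Illustrative input: "no" is a multiword token covering words 2 and 3.
SAMPLE = [
    "# text = Moro no Brasil",
    "1\tMoro\tmorar\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tno\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tem\tem\tADP\t_\t_\t4\tcase\t_\t_",
    "3\to\to\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tBrasil\tBrasil\tPROPN\t_\t_\t1\tobl\t_\t_",
]

if __name__ == "__main__":
    for tok in parse_sentence(SAMPLE):
        print(tok)
```

In this sketch the range line is kept and flagged rather than discarded, and a missing uPos becomes `None` instead of raising, so a caller can decide how to treat either case without the parse being interrupted.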

Beyond these functional changes, this PR refactors the underlying code to improve readability and maintainability, which should make future modifications and debugging easier.