Sentence embeddings using Universal AnglE Embedding (UAE).
UAE is a novel angle-optimized text embedding model, designed to improve semantic textual
similarity tasks, which are crucial for Large Language Model (LLM) applications. By
introducing angle optimization in a complex space, AnglE effectively mitigates saturation of
the cosine similarity function.
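To illustrate what saturation means here, the following is a minimal sketch. It assumes the term refers to the flat regions of the cosine function, where the slope is close to zero and similarity scores barely respond to changes in angle. The helper name `cosine_and_slope` is hypothetical, and the snippet is not AnglE's training objective.

```python
import math


def cosine_and_slope(theta_radians):
    """Return cos(theta) and the magnitude of its derivative, |sin(theta)|."""
    return math.cos(theta_radians), abs(math.sin(theta_radians))


# Angles near 0 and 180 degrees sit in the flat (saturated) regions of cosine.
rows = [(deg, *cosine_and_slope(math.radians(deg))) for deg in (1, 5, 45, 90, 175, 179)]

for deg, cos_value, slope in rows:
    print(f"{deg:>3} deg  cos={cos_value:+.4f}  slope magnitude={slope:.4f}")
```

The slope magnitude is largest at 90 degrees and shrinks toward zero at both ends of the range, which is the behavior the paragraph above calls saturation.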
https://github.com/JohnSnowLabs/spark-nlp/pull/14224
1 - Get the S3 object that includes getLastModified() (this is only a summary; the whole metadata.json file is not downloaded).
2 - Check whether the cache contains up-to-date metadata.
3 - If the cache contains up-to-date metadata, return it; otherwise, download the file, store it in the cache, and return it.
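The three steps above can be sketched as follows. This is a minimal illustration, not the code from the PR: the class `MetadataCache` and the two callables passed to it are hypothetical stand-ins for the S3 summary lookup and the metadata.json download, so the example runs without any AWS dependency.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class CachedMetadata:
    last_modified: datetime
    content: str


class MetadataCache:
    """Hypothetical cache that re-downloads only when the remote copy is newer."""

    def __init__(self, fetch_last_modified: Callable[[], datetime],
                 download: Callable[[], str]) -> None:
        self._fetch_last_modified = fetch_last_modified  # cheap summary lookup
        self._download = download                        # expensive full download
        self._cached: Optional[CachedMetadata] = None

    def get(self) -> str:
        # Step 1: ask only for the last-modified timestamp.
        remote_time = self._fetch_last_modified()
        # Step 2: check whether the cached copy is still up to date.
        cached = self._cached
        if cached is not None and cached.last_modified >= remote_time:
            # Step 3a: the cache is current, so return it.
            return cached.content
        # Step 3b: download, store in the cache, and return.
        content = self._download()
        self._cached = CachedMetadata(remote_time, content)
        return content


# Usage with fake remote functions (assumptions for the demo only).
downloads = []
remote_time = [datetime(2024, 1, 1)]


def fake_last_modified() -> datetime:
    return remote_time[0]


def fake_download() -> str:
    downloads.append(1)
    return f"metadata v{len(downloads)}"


cache = MetadataCache(fake_last_modified, fake_download)
first = cache.get()                     # nothing cached yet: downloads
second = cache.get()                    # cache is current: no download
remote_time[0] = datetime(2024, 2, 1)   # remote file changes
third = cache.get()                     # cache is stale: downloads again
print(first, second, third, len(downloads))
```

Comparing timestamps before downloading means the common case costs one lightweight request instead of a full file transfer.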
https://github.com/JohnSnowLabs/spark-nlp/pull/14225
This PR introduces critical enhancements and optimizations to the processing of the CoNLL-U format, which is instrumental in the training of Dependency Parsers. The key improvements include:
Enhanced Multiword Token Handling: This update ensures proper processing of lines whose id column marks them as multiword tokens (e.g., a line with id 2-3 and form "no"). This adjustment guarantees that multiword tokens are accurately recognized and managed throughout the parsing process.
Improved Handling of Missing uPos Values: Before this change, lines with unavailable uPos values could disrupt the parsing flow. With the current enhancements, the system gracefully handles such scenarios, ensuring uninterrupted parsing operations even in the absence of uPos values.
Beyond these functional enhancements, this PR undertakes a comprehensive refactoring of the underlying codebase. The refactoring efforts focus on enhancing code readability, cleanliness, and maintainability. These improvements pave the way for easier future modifications and debugging, aligning with best practices in software development.
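The two cases described above, multiword-token lines and missing uPos values, can be sketched with a small reader. This is an illustration under stated assumptions, not the reader implemented in the PR: the names `Word` and `parse_sentence` are hypothetical, the ten tab-separated columns follow the CoNLL-U format, and recording range lines separately from syntactic words is one possible policy that the PR description does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    index: int
    form: str
    upos: Optional[str]   # None when the column holds the "_" placeholder
    head: Optional[int]


def parse_sentence(lines: List[str]) -> Tuple[List[Word], List[Tuple[int, int, str]]]:
    """Split CoNLL-U lines into syntactic words and multiword-token ranges."""
    words: List[Word] = []
    multiword_ranges: List[Tuple[int, int, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id, form, upos, head = cols[0], cols[1], cols[3], cols[6]
        if "-" in token_id:
            # Multiword token such as "3-4": keep the surface form and range.
            start, end = token_id.split("-")
            multiword_ranges.append((int(start), int(end), form))
            continue
        if "." in token_id:
            continue  # empty nodes (e.g. "8.1") are ignored in this sketch
        words.append(Word(
            index=int(token_id),
            form=form,
            upos=None if upos == "_" else upos,   # tolerate a missing uPos
            head=None if head == "_" else int(head),
        ))
    return words, multiword_ranges


# Illustrative sentence; the uPos of word 4 is deliberately left as "_".
sample = [
    "# text = Ele mora no Brasil",
    "\t".join(["1", "Ele", "ele", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "mora", "morar", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3-4", "no", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["3", "em", "em", "ADP", "_", "_", "5", "case", "_", "_"]),
    "\t".join(["4", "o", "o", "_", "_", "_", "5", "det", "_", "_"]),
    "\t".join(["5", "Brasil", "Brasil", "PROPN", "_", "_", "2", "obl", "_", "_"]),
]

words, multiword_ranges = parse_sentence(sample)
print([w.form for w in words])
print(multiword_ranges)
```

Keeping the range line out of the word list prevents it from being mistaken for a dependency node, while mapping "_" to `None` lets downstream code decide how to treat an unknown part of speech instead of failing on it.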