Description
This PR improves how Spark NLP distributes ONNX model files across Spark executors. It uses Spark's built-in file distribution to improve the performance and scalability of LLMs in distributed cloud environments.
Motivation and Context
This update addresses the difficulty of deploying and scaling LLMs in cloud-based Spark environments. Using Spark's native support for distributing files to executors makes LLM annotators more scalable and efficient. This matters most for models such as Llama-2 and M2M100, which need access to large ONNX files to function correctly.
With this change, ONNX models are shared across all nodes in a Spark cluster, which reduces model-loading overhead and allows faster, more scalable annotation. Users should see improved performance when processing large datasets or working in resource-intensive cloud environments.
These changes are part of our ongoing effort to optimize Spark NLP for LLM processing and to provide robust, scalable NLP solutions for the cloud.
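For readers unfamiliar with the Spark mechanism referred to above, here is a minimal PySpark sketch of it. It is not the code in this PR: it only illustrates the general `SparkContext.addFile` / `SparkFiles.get` pattern, and it assumes a local `local[2]` master. The file name `decoder_model.onnx` and the helper names are hypothetical, and the file holds placeholder bytes rather than a real ONNX export.

```python
import os
import tempfile


def distribute_model_file(spark, local_path):
    """Ship one local file to every executor and return the name to look it up by."""
    spark.sparkContext.addFile(local_path)
    # SparkFiles.get() resolves files by base name, not by the original path.
    return os.path.basename(local_path)


def file_size_on_executor(file_name):
    """Runs inside a task: resolve the node-local copy and report its size."""
    from pyspark import SparkFiles  # imported lazily so it resolves on the worker

    return os.path.getsize(SparkFiles.get(file_name))


def main():
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("onnx-file-distribution-sketch")
        .getOrCreate()
    )

    # Placeholder bytes standing in for a real ONNX export (hypothetical name).
    model_path = os.path.join(tempfile.mkdtemp(), "decoder_model.onnx")
    with open(model_path, "wb") as handle:
        handle.write(b"\x00" * 1024)

    name = distribute_model_file(spark, model_path)

    # Each task reads its node-local copy instead of pulling the file from the driver.
    sizes = (
        spark.sparkContext.parallelize(range(4), 2)
        .map(lambda _: file_size_on_executor(name))
        .collect()
    )
    print(sizes)
    spark.stop()
```

Calling `main()` runs the sketch end to end. In a real annotator, the task would open an ONNX session from the path returned by `SparkFiles.get` rather than just reading the file size.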
How Has This Been Tested?
Local Tests
Google Colab notebooks
Databricks
Screenshots (if appropriate):
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[x] Code improvements with no or little impact
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.