JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

[SPARKNLP-1091] AutoGGUFModel embeddings support #14433

[Open] DevinTDHa opened this pull request 1 month ago

DevinTDHa commented 1 month ago

Description

This PR adds proper embedding support for AutoGGUFModel through a new annotator, AutoGGUFEmbeddings. Each returned annotation contains an embedding vector, as with the other sentence embedding annotators.
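A minimal usage sketch, assuming Spark NLP 5.5 or later (where AutoGGUFEmbeddings is exposed in `sparknlp.annotator`) and network access to download the default pretrained GGUF model. It shows where the vector lives in each returned annotation (the `embeddings` field) and one typical downstream use, comparing two texts. The helper names `embed_texts` and `cosine_similarity` are hypothetical and not part of this PR.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as "no similarity"
    return dot / (norm_a * norm_b)


def embed_texts(texts):
    """Embed each text with AutoGGUFEmbeddings; returns {text: vector}.

    Assumption: Spark NLP >= 5.5 and access to the default pretrained
    GGUF model. Spark imports are local so that cosine_similarity above
    can be used without Spark installed.
    """
    import sparknlp
    from pyspark.ml import Pipeline
    from sparknlp.annotator import AutoGGUFEmbeddings
    from sparknlp.base import DocumentAssembler

    spark = sparknlp.start()
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    embeddings = (
        AutoGGUFEmbeddings.pretrained()  # default GGUF embedding model
        .setInputCols(["document"])
        .setOutputCol("embeddings")
    )
    pipeline = Pipeline(stages=[document, embeddings])
    df = spark.createDataFrame([(t,) for t in texts], ["text"])
    result = pipeline.fit(df).transform(df)

    vectors = {}
    for row in result.select("text", "embeddings").collect():
        # One document per row, so the first annotation holds the
        # sentence vector in its `embeddings` field.
        vectors[row["text"]] = list(row["embeddings"][0]["embeddings"])
    return vectors
```

For example, `embed_texts(["llama.cpp in Spark NLP", "GGUF models on Spark"])` would return one vector per input text, and any two of those vectors can be passed to `cosine_similarity` to score how close the texts are.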

This PR also contains an end-to-end example notebook: https://github.com/JohnSnowLabs/spark-nlp/blob/b59a339164d2a2c37633e2c9ec12762134c5c2c6/examples/python/llama.cpp/llama.cpp_in_Spark_NLP_AutoGGUFEmbeddings.ipynb

The pretrained model is added in https://github.com/JohnSnowLabs/spark-nlp/pull/14448

How Has This Been Tested?

Existing and new tests pass on both the Scala and Python sides.


maziyarpanahi commented 4 weeks ago

@DevinTDHa let's make the changes here and ship this feature as an AutoGGUFEmbeddings annotator, instead of merging this and then reverting it.

DevinTDHa commented 4 weeks ago


Hi @maziyarpanahi, sounds good to me! I will update this PR to include the new annotator.

DevinTDHa commented 2 weeks ago

@maziyarpanahi I have updated this PR to provide the functionality as a new annotator, AutoGGUFEmbeddings.