JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

Refactor OpenAIEmbeddings #14334

Closed: mehmetbutgul closed this 4 months ago

mehmetbutgul commented 5 months ago

Refactor OpenAIEmbeddings annotator

Description

Refactor OpenAIEmbeddings:

1. Added support for escape characters that break the OpenAI JSON content.
2. Changed the output annotator type: DOCUMENT --> SENTENCE_EMBEDDINGS. NOTE: This change breaks backward compatibility.
3. Added the metadata from the document column to the output embeddings.
4. Added a Python unit test class.
5. Added a new submodule to support saving/loading the annotator. NOTE: The new submodule fixes saving/loading the annotator.
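To illustrate the escaping problem behind the first item in the description, here is a minimal Python sketch. It is not the code in this PR: `build_embeddings_payload` and `naive_payload` are hypothetical helpers, and the model name is only an example value. It assumes the request body is a JSON object with `input` and `model` fields, and shows why string concatenation fails on quotes, newlines, and backslashes while a JSON serializer handles them.

```python
import json


def build_embeddings_payload(text: str, model: str = "text-embedding-ada-002") -> str:
    """Hypothetical helper: let the JSON serializer escape the input text."""
    return json.dumps({"input": text, "model": model})


def naive_payload(text: str, model: str = "text-embedding-ada-002") -> str:
    """Hypothetical counter-example: concatenation leaves special characters unescaped."""
    return '{"input": "' + text + '", "model": "' + model + '"}'


# Text containing a double quote, a newline, a backslash, and a tab.
tricky = 'He said "hi"\nC:\\temp\ttab'

# The serialized payload parses back to exactly the original text.
payload = build_embeddings_payload(tricky)
assert json.loads(payload)["input"] == tricky

# The concatenated payload is not valid JSON.
try:
    json.loads(naive_payload(tricky))
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False
```

In this sketch `naive_ok` ends up `False`, because the unescaped quote terminates the JSON string early. The same principle applies on the Scala side: serialize the text with a JSON library rather than interpolating it into a template.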

Motivation and Context

How Has This Been Tested?

Tested via Python and Scala locally. Additionally, I added new unit tests that cover my changes.

Screenshots (if appropriate):

Types of changes

Checklist:

mehmetbutgul commented 5 months ago

I created my branch from master, so it includes some other commits too.