Closed: DevinTDHa closed this 3 months ago.
Hi @DevinTDHa
Regarding the fix in ONNX serialization: is it related to this issue? https://github.com/JohnSnowLabs/spark-nlp/issues/14194 (notebook: https://colab.research.google.com/drive/119u6hXoT1PRB9F38InuEV-bm4g1uu9UH?usp=sharing)
Hi @maziyarpanahi,
Yes, the fix should prevent the error in the notebook as well.
Description
This PR adds an annotator for UAE embeddings. To support it, new pooling operations for word embeddings have been added, namely pooling by:
- the [CLS] token (or the last token)
- [CLS] + mean of the embeddings

These can be set with setPoolingStrategy on the annotator.

Additionally, it fixes a bug when serializing ONNX models that do not have a .onnx_data file (b73dc0b1ecdb49af9f2fa6e47b0af23d47442a53). @prabod, I think you worked on this part; could you review whether the fix looks good? I provided a description in the commit message. Thanks!

How Has This Been Tested?
New and existing tests pass.
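For readers unfamiliar with the pooling options listed in the description, here is a minimal, framework-free sketch of the arithmetic. It assumes token embeddings for a single unpadded sequence given as plain Python lists with [CLS] at position 0, and it assumes "[CLS] + mean" means averaging the [CLS] vector with the element-wise mean of all token vectors. The function names and strategy keys are hypothetical and are not the values accepted by setPoolingStrategy; the actual annotator works on Spark NLP annotations.

```python
from typing import Callable, Dict, List

Vector = List[float]


def cls_pooling(tokens: List[Vector]) -> Vector:
    """Use the first token's ([CLS]) embedding as the sentence embedding."""
    return list(tokens[0])


def last_token_pooling(tokens: List[Vector]) -> Vector:
    """Use the last token's embedding (assumes the sequence has no padding)."""
    return list(tokens[-1])


def mean_pooling(tokens: List[Vector]) -> Vector:
    """Element-wise mean over all token embeddings."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def cls_plus_mean_pooling(tokens: List[Vector]) -> Vector:
    """Assumed reading of "[CLS] + mean": average the [CLS] vector with the mean vector."""
    cls_vec = cls_pooling(tokens)
    mean_vec = mean_pooling(tokens)
    return [(c + m) / 2 for c, m in zip(cls_vec, mean_vec)]


# Hypothetical strategy keys, for illustration only.
STRATEGIES: Dict[str, Callable[[List[Vector]], Vector]] = {
    "cls_token": cls_pooling,
    "last_token": last_token_pooling,
    "cls_plus_mean": cls_plus_mean_pooling,
}


def pool(tokens: List[Vector], strategy: str = "cls_token") -> Vector:
    """Reduce per-token embeddings to one sentence embedding."""
    if not tokens:
        raise ValueError("need at least one token embedding")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown pooling strategy: {strategy}")
    return STRATEGIES[strategy](tokens)


if __name__ == "__main__":
    # Three tokens with 2-dimensional embeddings; the first one stands in for [CLS].
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for name in STRATEGIES:
        print(name, pool(tokens, name))
```

With the toy input above, [CLS] pooling returns [1.0, 2.0], last-token pooling returns [5.0, 6.0], and the assumed [CLS] + mean variant returns [2.0, 3.0], since the mean vector is [3.0, 4.0].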