Closed by maziyarpanahi 2 months ago
thanks @SidWeng, we will look into this
@maziyarpanahi I found the root cause, but I'm guessing it is not a bug. Please take a look: https://github.com/JohnSnowLabs/spark-nlp/discussions/14362#discussioncomment-10344195
Hi @SidWeng
Yes, that's exactly the root cause. We are working on adding a parameter to DocumentSimilarityRankerApproach to choose the aggregation method when a document has multiple sentences. I hope we can include it in the next release.
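For readers following along, here is a minimal sketch of what "aggregating" several sentence embeddings into one document-level vector could mean, assuming simple element-wise averaging. This is plain Python and not the Spark NLP API: the function name `average_embeddings` and the toy vectors are hypothetical, and the parameter mentioned above may end up working differently.

```python
from typing import List, Sequence


def average_embeddings(sentence_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equally sized sentence vectors.

    Hypothetical helper for illustration only; not part of Spark NLP.
    """
    if not sentence_vectors:
        raise ValueError("need at least one sentence vector")
    dim = len(sentence_vectors[0])
    if any(len(v) != dim for v in sentence_vectors):
        raise ValueError("all sentence vectors must have the same dimension")
    n = len(sentence_vectors)
    # Average each dimension across all sentences of the document.
    return [sum(v[i] for v in sentence_vectors) / n for i in range(dim)]


# Toy example: a document split into two sentences, each with a 2-d embedding.
doc_vector = average_embeddings([[1.0, 2.0], [3.0, 4.0]])
print(doc_vector)  # [2.0, 3.0]
```

Averaging gives every sentence equal weight; other aggregation choices (for example, taking only the first sentence) would give different rankings.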
Hi @SidWeng @danilojsl
I totally missed that you are using SentenceDetector. The DocumentSimilarityRankerApproach annotator is designed to deal only with document-level embeddings.
Until we implement a simple averaging to put everything together, here are a few options:
Discussed in https://github.com/JohnSnowLabs/spark-nlp/discussions/14362