JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

[SPARKNLP-1059] Adding aggressiveMatching parameter to DocumentSimilarityRanker #14370

Closed danilojsl closed 2 months ago

danilojsl commented 3 months ago

Description

This pull request introduces the aggregationMethod parameter to the DocumentSimilarityRanker annotator. The new parameter allows users to specify the method used to aggregate multiple sentence embeddings into a single vector representation.

Motivation and Context

The parameter lets users tailor the aggregation method to their use case: AVERAGE for a general overview of the document, FIRST to focus on the initial context, or MAX to emphasize the strongest signals.
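
To make the three options concrete, here is a minimal standalone sketch in plain Python of how sentence embeddings could be collapsed into one vector. It is not the annotator's code: the function name `aggregate_embeddings` is hypothetical, and it assumes AVERAGE is an element-wise mean, FIRST takes the first sentence's embedding, and MAX is an element-wise maximum.

```python
from typing import List, Sequence


def aggregate_embeddings(
    embeddings: Sequence[Sequence[float]], method: str = "AVERAGE"
) -> List[float]:
    """Collapse several sentence embeddings into a single vector.

    Hypothetical helper for illustration only; the semantics of each
    method are assumptions, not taken from the Spark NLP source.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")

    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same dimension")

    key = method.upper()
    if key == "AVERAGE":
        # Element-wise mean across all sentence embeddings.
        return [sum(column) / len(embeddings) for column in zip(*embeddings)]
    if key == "FIRST":
        # Keep only the embedding of the first sentence.
        return list(embeddings[0])
    if key == "MAX":
        # Element-wise maximum across all sentence embeddings.
        return [max(column) for column in zip(*embeddings)]
    raise ValueError(f"unsupported aggregation method: {method}")


sentence_embeddings = [[1.0, 4.0], [3.0, 2.0]]
for name in ("AVERAGE", "FIRST", "MAX"):
    print(name, aggregate_embeddings(sentence_embeddings, name))
```

With two sentence vectors `[1.0, 4.0]` and `[3.0, 2.0]`, AVERAGE smooths both sentences into one vector, FIRST ignores everything after the first sentence, and MAX keeps the largest value seen in each dimension.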
