ekzhu / SetSimilaritySearch

All-pair set similarity search on millions of sets in Python and on a laptop
Apache License 2.0

Add new similarity function: containment_max #2

Closed: ardate closed this pull request 5 years ago

ardate commented 5 years ago

Based on our e-mail exchange, I am adding a new similarity function, "containment_max". It is similar to "containment", except that the size of the intersection of the two sets is normalised by the larger of the two set sizes, i.e. |set1 \cap set2| / max(|set1|, |set2|), instead of by the size of set1 as in the "containment" function.
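A minimal sketch contrasting the two normalisations in plain Python. The function names below are hypothetical illustrations, not the library's API, and returning 0.0 for empty sets is an assumption made only to avoid division by zero.

```python
def containment(x: set, y: set) -> float:
    """Containment of x in y: |x ∩ y| / |x| (normalised by the first set)."""
    if not x:
        return 0.0  # assumption: define the score as 0 for an empty first set
    return len(x & y) / len(x)


def containment_normalised_by_max(x: set, y: set) -> float:
    """Proposed variant (hypothetical name): |x ∩ y| / max(|x|, |y|)."""
    denom = max(len(x), len(y))
    if denom == 0:
        return 0.0  # assumption: both sets empty
    return len(x & y) / denom


a = {"a", "b", "c"}
b = {"a", "b", "c", "d", "e", "f"}
print(containment(a, b))                    # 1.0: every element of a is in b
print(containment(b, a))                    # 0.5: only half of b is in a
print(containment_normalised_by_max(a, b))  # 0.5, same value for (b, a)
```

Unlike plain containment, the max-normalised score is symmetric in its two arguments.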

ekzhu commented 5 years ago

Thanks! Sorry for the delay. This seems better named "containment min": dividing by max(|x|, |y|) gives the minimum of the two containment scores |x \cap y| / |x| and |x \cap y| / |y|. A "containment max" would instead divide by min(|x|, |y|).
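The naming argument can be checked numerically. This sketch, using hypothetical helper names that mirror the terms in the comment, verifies on random non-empty sets that dividing by the larger size yields the smaller of the two directional containment scores, and dividing by the smaller size yields the larger one.

```python
import random


def containment_min(x: set, y: set) -> float:
    """Hypothetical helper: |x ∩ y| / max(|x|, |y|)."""
    return len(x & y) / max(len(x), len(y))


def containment_max(x: set, y: set) -> float:
    """Hypothetical helper: |x ∩ y| / min(|x|, |y|)."""
    return len(x & y) / min(len(x), len(y))


rng = random.Random(0)
for _ in range(1000):
    # Non-empty random sets drawn from a small universe so they overlap often.
    x = set(rng.sample(range(50), rng.randint(1, 20)))
    y = set(rng.sample(range(50), rng.randint(1, 20)))
    inter = len(x & y)
    c_xy, c_yx = inter / len(x), inter / len(y)  # the two directional scores
    assert containment_min(x, y) == min(c_xy, c_yx)
    assert containment_max(x, y) == max(c_xy, c_yx)
print("identity holds on 1000 random pairs")
```

Because a larger denominator can only shrink the ratio, |x \cap y| / max(|x|, |y|) is never above either directional score, which is why "min" describes it.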

ardate commented 5 years ago

Hi Eric,

Thanks for your message! Indeed you are right.

Best, Arda

(Sent by e-mail in reply to Eric Zhu's comment of 21 Mar 2019, 20:33.)