br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb

Jaccard similarity fails to return correct answers for some queries #628

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
Run the following query:

let $v1 := "Query Processing in Multidatabase Systems."
let $v2 := "Query Processing in Object-Oriented Database Systems."
let $sim := similarity-jaccard-check(word-tokens($v1), word-tokens($v2), 0.5f)
return {"check": $sim[0], "similarity": $sim[1]}

The expected output is:
{ "check": true, "similarity": 0.5f }

It returns:
{ "check": false, "similarity": 0.0f }
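
For reference, the expected value of 0.5 can be reproduced by hand. The sketch below is plain Python, not AsterixDB code. It assumes that word-tokens lowercases its input and splits on every non-alphanumeric character, so that "Object-Oriented" becomes two tokens and the trailing period is dropped. The helpers `word_tokens` and `jaccard` are hypothetical stand-ins written for this illustration. Under that assumption the two titles share 4 tokens out of 8 distinct tokens, which gives 4/8 = 0.5.

```python
import re


def word_tokens(text):
    # Assumption: lowercase, then split on any run of non-alphanumeric
    # characters. This only approximates AsterixDB's word-tokens().
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def jaccard(tokens_a, tokens_b):
    # Set-based Jaccard similarity: |A ∩ B| / |A ∪ B|.
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # arbitrary choice for two empty inputs
    return len(set_a & set_b) / len(union)


v1 = "Query Processing in Multidatabase Systems."
v2 = "Query Processing in Object-Oriented Database Systems."

t1 = word_tokens(v1)  # 5 tokens
t2 = word_tokens(v2)  # 7 tokens
sim = jaccard(t1, t2)

print(sorted(set(t1) & set(t2)))  # ['in', 'processing', 'query', 'systems']
print(sim, sim >= 0.5)            # 0.5 True
```

The similarity sits exactly on the 0.5 threshold, so any check that uses a slightly wrong bound can flip the result to false.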

Original issue reported on code.google.com by icetin...@gmail.com on 13 Sep 2013 at 6:02

GoogleCodeExporter commented 8 years ago
The problem was in the computation of the minimum union size and the maximum
intersection size, which are needed to decide on early termination. The bug
therefore usually shows up when the actual similarity is close to the threshold.
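
To illustrate why these two bounds matter, here is a minimal Python sketch of a Jaccard check with early termination. It is not the AsterixDB implementation. The function name `jaccard_check` and the convention of returning 0.0 on a failed check (mirroring the output shown in the report) are assumptions made for this example. While probing the smaller set, the best case is that every remaining token matches; that gives the maximum possible intersection, and from it the minimum possible union. The scan may stop only when even this best case falls strictly below the threshold.

```python
def jaccard_check(tokens_a, tokens_b, threshold):
    # Hypothetical sketch: returns (check, similarity), with similarity
    # reported as 0.0 whenever the check fails.
    small, large = set(tokens_a), set(tokens_b)
    if len(small) > len(large):
        small, large = large, small
    if not large:
        return (False, 0.0)  # arbitrary choice for two empty inputs

    total = len(small) + len(large)
    matched = 0
    remaining = len(small)
    for token in small:
        remaining -= 1
        if token in large:
            matched += 1
        # Best case: every token not yet probed is also a match.
        max_intersection = matched + remaining
        min_union = total - max_intersection
        # Stop only if the best case is strictly below the threshold.
        if max_intersection < threshold * min_union:
            return (False, 0.0)

    similarity = matched / (total - matched)
    if similarity >= threshold:
        return (True, similarity)
    return (False, 0.0)


a = ["query", "processing", "in", "multidatabase", "systems"]
b = ["query", "processing", "in", "object", "oriented", "database", "systems"]
print(jaccard_check(a, b, 0.5))  # (True, 0.5)
print(jaccard_check(a, b, 0.6))  # (False, 0.0)
```

With the token sets from the report, the best-case ratio drops to exactly 4/8 once "multidatabase" fails to match. A bound that is off by one, or a comparison that is not strict, would terminate early here and report false, which is consistent with the symptom described above.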

Original comment by icetin...@gmail.com on 16 Sep 2013 at 11:00