Closed nchepanov closed 11 years ago
one more example: 630
720
also: beni is not even being activated in the 37 query. It is probably too short, but it's real.
We have penalty code commented out in the repo, and I think that preferring shorter results to longer ones given the same coverage is exactly what's needed in this issue.
@pltr, according to the issue, results with the same cov must be sorted from short to long. Should I create an issue about that?
what is the status of this?
well translator currently sorts beni results by (1 - cover, len(name)) on my side
well, this doesn't solve the problem in some cases (http://eu.barzer.net/~yanis/evtest/#9220 and http://eu.barzer.net/~yanis/evtest/#karcher%20sc), but in most cases it does
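For reference, a minimal sketch of the sort described above, ordering by `(1 - cover, len(name))`. The `BeniResult` type, the `sort_results` helper, and the sample names and coverage values are hypothetical and only illustrate the key; this is not the actual translator code.

```python
from typing import List, NamedTuple


class BeniResult(NamedTuple):
    """Hypothetical stand-in for a beni match: entity name plus coverage."""
    name: str
    cover: float  # assumed to be in the range 0..1, where 1 is full coverage


def sort_results(results: List[BeniResult]) -> List[BeniResult]:
    # Primary key: higher coverage first (1 - cover ascending).
    # Secondary key: shorter names first among results with equal coverage.
    return sorted(results, key=lambda r: (1 - r.cover, len(r.name)))


# Made-up sample data, not taken from the real index.
results = [
    BeniResult("rowenta cf9220 hair dryer", 1.0),
    BeniResult("karcher sc", 0.75),
    BeniResult("rowenta cf9220", 1.0),
]
ranked = sort_results(results)
```

With this key, length only breaks ties between equal coverage values, so a short name with lower coverage never outranks a longer one with full coverage.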
this problem has nothing to do with sorting by length. the penalty should be on INCOMPLETE WORDS not on length
I re-enabled side ngrams generation that was disabled looooooong ago, and...
Penalty on INCOMPLETE WORDS would kill perfectly legitimate queries. Thanks to query compaction, фен ровента from the first link turns into фен rowenta cf9220, where 9220 is already part of a word rather than a whole word and would be penalized.
I think it's a dilemma: we either normalize cf 9220 to cf9220 and lose the whole-word penalty, or we don't normalize and thus can know whether words are truly complete.
Otherwise this fucking rowenta thing will get a 0.75 coverage instead of 1 for query 9220.
Is this still an issue? Please triage.
talk to yanis for details
and many more short queries with cov=1 results