Closed bodritto closed 10 years ago
What exactly is odd?
Радиоприемник SUPRA PAS-6277 красный/черный
query = "красный чайник"
cov = 0.715624
Around 11 of 14 ngrams match, giving the score pretty close to the observed.
Ный gram matches twice :(
Why doesn't this one, "Радиоприемник HYUNDAI H-1549 черный/красный", have the same coverage?
Good question. I'm already on it.
The following 14 features are generated (never mind the numbers, they are feature IDs; quotes are added so that grams containing a space stay visible):
"кра" :37
"рас" :38
"асн" :39
"сны" :40
"ный" :33
"ый " :34
"й ч" :59
" ча" :60
"чай" :61
"айн" :62
"йни" :63
"ник" :11
" кр" :36
"ик " :12
The seventh feature, "й ч", is only generated for the SUPRA shit, because красный/черный → красный черный, which contains the "й ч" gram. черный/красный, on the other hand, doesn't generate it.
In other words it's expected behavior.
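A minimal sketch of the feature generation described above, assuming the phrase is lowercased, `/` is replaced by a space, and the whole phrase is padded with one space on each side before character trigrams are taken. The helper names `normalize` and `trigrams` are hypothetical and are not Barzer's API; feature IDs are left out.

```python
def normalize(text: str) -> str:
    """Lowercase, treat '/' as a word separator, pad with one space per side (assumed)."""
    words = text.lower().replace("/", " ").split()
    return " " + " ".join(words) + " "


def trigrams(text: str) -> list[str]:
    """Character trigrams of the normalized phrase, in order of appearance."""
    s = normalize(text)
    return [s[i:i + 3] for i in range(len(s) - 2)]


query_grams = trigrams("красный чайник")
print(len(query_grams))  # 14

# "й ч" needs a word ending in "й" directly followed by a word starting with "ч".
print("й ч" in trigrams("красный/черный"))  # True
print("й ч" in trigrams("черный/красный"))  # False
```

Under these assumptions the two color strings differ in exactly the one gram discussed here.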
This can't be the expected behavior: красный чайник != красный/черный, and the ngram overlap is nowhere near 71%.
How did you compute it as nowhere near 71%?
Also, there is a single ngram matching there; you can't avoid it without losing / as a separator.
красный чайник != красный черный either. You obviously shouldn't be double-counting "ный".
It isn't double-counted.
Also, I don't argue they're equal.
красный чайник: кра, рас, асн, ный, "ый ", "й ч", чай, айн, йни, ник, "ик " (11 3-grams)
черный/красный: чер, ерн, рны, ный, "ый ", "й к", кра, рас, асн, ный, "ый " (11 3-grams)
overlapping, 5: кра, рас, асн, ный, "ый "
So 5 ngrams out of 11 overlap and the score is 71%. How is that expected behavior? The expectations must be wrong.
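The hand count above can be checked with a short sketch. It assumes space-padded character trigrams, `/` treated as a space, lowercasing, and coverage counted as distinct query trigrams found in the item name; `trigram_set` and `coverage` are hypothetical helpers, and the real scorer may weight or count features differently.

```python
def trigram_set(text: str) -> set[str]:
    """Distinct padded character trigrams; '/' is treated as a space (assumed)."""
    words = text.lower().replace("/", " ").split()
    s = " " + " ".join(words) + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def coverage(query: str, name: str) -> tuple[int, int]:
    """Return (matched, total) over the distinct trigrams of the query."""
    q = trigram_set(query)
    return len(q & trigram_set(name)), len(q)


query = "красный чайник"

# Only the color fragment used in the hand count.
print(coverage(query, "черный/красный"))  # (7, 14)

# The full item name from the search result.
print(coverage(query, "Радиоприемник SUPRA PAS-6277 красный/черный"))  # (10, 14)
```

In this sketch the full name adds "ник" and "ик " from Радиоприемник plus "й ч", and 10/14 ≈ 0.714 lands near the reported cov = 0.715624.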
Please reread the source result. Also, you've missed a lot of ngrams.
we have an obviously shitty result in this specific case. even if it checks with the current algorithm it only means that the current algorithm is shitty in this specific case and needs to be changed so that this shitty case isn't shitty anymore. it is a real problem which needs to be solved
Then we'll probably break a lot of other cases, and we don't have a proper way of testing this (and we can hardly have one, since it's a purely expert-based thing).
I'm strongly against fitting our algorithm to such corner cases.
@pltr please look ASAP at the translator
Closing in favor of #648.
A search for "красный чайник" does not find "Чайник Lamark LK-1006 красный" but does find "Радиоприемник SUPRA PAS-6277 красный/черный"
красный чайник -> радиоприемник http://eu.barzer.net/bjson?&key=BHjFDiC0QdoyDF7DBVn1rLWu0LaKRi8QeKiVSSSW&query=%D0%BA%D1%80%D0%B0%D1%81%D0%BD%D1%8B%D0%B9%20%D1%87%D0%B0%D0%B9%D0%BD%D0%B8%D0%BA
красТный чайник -> красные чайники http://eu.barzer.net/translate?&key=BHjFDiC0QdoyDF7DBVn1rLWu0LaKRi8QeKiVSSSW&query=%D0%BA%D1%80%D0%B0%D1%81%D1%82%D0%BD%D1%8B%D0%B9%20%D1%87%D0%B0%D0%B9%D0%BD%D0%B8%D0%BA
Your list of ngrams has an obvious problem: for красный чайник you have кра, рас, ..., " ча", чай, so you treat a space differently from the start of the phrase. @0xd34df00d
After talking with @inggris I finally got the issue. I agree some shit is going on.
Problem case
For the query красный чайник
NOT FOUND: id 44404
name Чайник Lamark LK-1006 красный
but found: id 63595
name Радиоприемник SUPRA PAS-6277 красный/черный
a) Two questions need answering: 1) what is the coverage of "Чайник Lamark LK-1006 красный"? 2) why does it not make it into the results at all, and why is its coverage so much lower than that of "Радиоприемник SUPRA PAS-6277 красный/черный"? b) Fix it.
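As a rough way to explore question 1, the sketch below ranks the three item names from this thread by trigram coverage. It is a simplified model under stated assumptions (lowercasing, `/` treated as a space, one space of padding, distinct query trigrams counted once) and not the production scorer; `trigram_set` and `coverage_ratio` are hypothetical names.

```python
def trigram_set(text: str) -> set[str]:
    """Distinct padded character trigrams; '/' is treated as a space (assumed)."""
    words = text.lower().replace("/", " ").split()
    s = " " + " ".join(words) + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def coverage_ratio(query: str, name: str) -> float:
    """Share of distinct query trigrams that also occur in the item name."""
    q = trigram_set(query)
    return len(q & trigram_set(name)) / len(q)


query = "красный чайник"
names = [
    "Чайник Lamark LK-1006 красный",
    "Радиоприемник SUPRA PAS-6277 красный/черный",
    "Радиоприемник HYUNDAI H-1549 черный/красный",
]

# Highest coverage first.
ranked = sorted(names, key=lambda n: coverage_ratio(query, n), reverse=True)
for name in ranked:
    print(f"{coverage_ratio(query, name):.3f}  {name}")
```

In this simplified model the Lamark kettle matches 13 of the 14 query trigrams, the SUPRA radio 10 and the HYUNDAI radio 9, so a kettle that is missing from the results points at something other than plain trigram overlap.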
fix deployed to venik and eu machines. fixed
http://eu.barzer.net/translate?&key=BHjFDiC0QdoyDF7DBVn1rLWu0LaKRi8QeKiVSSSW&query=%D0%BA%D1%80%D0%B0%D1%81%D0%BD%D1%8B%D0%B9%20%D1%87%D0%B0%D0%B9%D0%BD%D0%B8%D0%BA
data: eu.barzer.net, user 1000106 (ev_all)