Closed: bodritto closed this issue 11 years ago.
I'm looking at it as well. This is bad; we should fix these stemming issues once and for all.
"дебетовых" - "дебетовая" (two forms of "debit": genitive plural vs. feminine nominative singular)
Cannot reproduce it with a small ruleset.
The bug disappears when extrawords (the spell/extra tag) are turned off. @0xd34df00d, please investigate why the extrawords thing fucks it up. It may be related to the way these words are loaded (not the same way as tokens).
'мобильного банка' ---X--> 'мобильный банк' (the genitive "of the mobile bank" does not get mapped to the nominative "mobile bank")
It seems like stuff gets corrected to that stuff (my initial guess from blindly looking at the code).
Could you please provide a minimal reproducing example?
It reproduces if the word in question is in the dictionary. @inggris has purged it on sveta, but here's a copy of the dictionary that caused it to crash: /home/yanis/public_html/rrbank_extrawords.txt
Well, after some playing around with the rules file and the code base, I can't reproduce it anymore, even from scratch.
Do you have extrawords turned on?
(In reply to Georg Rudoy, Wed, Jul 31, 2013 at 1:54 PM.)
Reply to this email directly or view it on GitHub: https://github.com/barzerman/barzer/issues/602#issuecomment-21854128
www.barzer.net
Sure.
It'd be much easier if you just put the offending config somewhere.
Take the config from production and put the right file there.
(In reply to Georg Rudoy, Wed, Jul 31, 2013 at 2:20 PM.)
I took the config from production and replaced the rules with a single pattern, <t>хуй</t><t>ипотека</t>; "хуй ипотеки" still matches. "ипотека" ("mortgage") is present in extrawords.
How about the whole original 1000200 set?
(In reply to Georg Rudoy, Wed, Jul 31, 2013 at 2:31 PM.)
That doesn't seem like a minimal reproducing example. Moreover, I'm afraid that getting different results now is a sign of a bigger hidden problem, anything from a local heisenbug to a misunderstanding of the bug description, so IMO the best approach here is to sync our results on a smaller, saner dataset.
That said, I'll use the whole dataset if sending me the (presumably) already existing smaller data is that much trouble.
Is this ready to be merged?
Yep, if you find the changes OK.
There is a pattern "ипотека" ("mortgage"), but for some reason the inflected form "ипотеки" doesn't match it. The same happens with "ипотечная программа" ("mortgage program") and its genitive "ипотечной программы".
user: sveta/rrbank
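For reference, here is the equivalence we expect from stemming, pinned down as a quick check. This is a toy suffix-stripping sketch, assuming a small hand-picked list of Russian endings (the hypothetical `toy_stem` / `same_stem` helpers); it is not barzer's stemmer, only an illustration that every failing pair from this thread should reduce to the same stems.

```python
# Toy illustration only: a hypothetical suffix stripper, NOT barzer's stemmer.
# Endings are listed longest-first so that e.g. "ого" wins over shorter matches.
ENDINGS = (
    "ого", "ому", "ыми",
    "ых", "ый", "ая", "ой", "ую", "ые",
    "а", "и", "ы", "у", "е",
)
MIN_STEM = 3  # never strip a word down to fewer than 3 characters


def toy_stem(word: str) -> str:
    """Strip the first (longest) matching ending from the lowercased word."""
    w = word.lower()
    for end in ENDINGS:
        if w.endswith(end) and len(w) - len(end) >= MIN_STEM:
            return w[: -len(end)]
    return w


def same_stem(a: str, b: str) -> bool:
    """True if both phrases have the same word count and word-by-word equal stems."""
    wa, wb = a.split(), b.split()
    return len(wa) == len(wb) and all(
        toy_stem(x) == toy_stem(y) for x, y in zip(wa, wb)
    )


# Pairs reported in this issue: each should collapse to the same stems.
PAIRS = [
    ("ипотека", "ипотеки"),
    ("дебетовых", "дебетовая"),
    ("мобильного банка", "мобильный банк"),
    ("ипотечная программа", "ипотечной программы"),
]

if __name__ == "__main__":
    for a, b in PAIRS:
        print(a, "|", b, "->", same_stem(a, b))
```

Every pair reduces to identical stems here ("ипотек", "дебетов", "мобильн банк", "ипотечн программ"), so a matcher that treats them as different with extrawords turned on is misbehaving rather than hitting a genuinely hard stemming case.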