barzerman / barzer

barzer engine code
MIT License

weird chunks behavior #580

Closed: bodritto closed this issue 11 years ago

bodritto commented 11 years ago
  1. Chunks swallow adjacent punctuation.
  2. Words between chunks are merged with the chunks into a single "hilite" tag:

http://eu.barzer.net/query/json?zurch=yes&key=aRLsIvszISAReCoS6ktgviZxN0YlRpbs6DKH7vro&query=%D0%B3%D0%B4%D0%B5%20%D0%BC%D0%BE%D0%B6%D0%BD%D0%BE%20%D1%81%D0%B4%D0%B5%D0%BB%D0%B0%D1%82%D1%8C%20%D0%B2%D1%8B%D0%BF%D0%B8%D1%81%D0%BA%D1%83%20%D1%81%D0%BE%20%D1%81%D1%87%D0%B5%D1%82%D0%B0

doc#1.28

0xd34df00d commented 11 years ago
  1. Chunks work that way: they are processed after punctuation has been stripped away.
  2. Fluff words can be merged. As far as I can tell, "это" ("this") is fluff.
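
A minimal Python sketch of the behavior described above, under a hypothetical model (not barzer's actual code): punctuation is stripped before chunk matching, chunks separated only by fluff words are merged, and highlight spans are then cut from the original text. The `FLUFF` set, `tokenize`, and `highlight_spans` are illustrative names. Under this model, punctuation sitting between two merged chunks ends up inside one hilite span, which is one way both reported symptoms could arise.

```python
import re
from dataclasses import dataclass

# Hypothetical fluff list; barzer's real fluff dictionary is not shown in the thread.
FLUFF = {"это"}


@dataclass
class Token:
    text: str
    start: int  # character offset in the original text
    end: int


def tokenize(text):
    """Split text into word and single-character punctuation tokens with offsets."""
    return [Token(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def highlight_spans(text, chunk_words):
    """Return (start, end) character spans to wrap in a hilite tag.

    Assumed model: punctuation is dropped first, chunk words are matched on
    the remaining words, and chunks separated only by fluff are merged.
    Spans are sliced from the ORIGINAL text, so punctuation lying inside a
    merged span is highlighted too.
    """
    words = [t for t in tokenize(text) if re.match(r"\w", t.text)]
    spans = []
    gap_is_fluff = True  # no non-fluff word seen since the last chunk
    for tok in words:
        low = tok.text.lower()
        if low in chunk_words:
            if spans and gap_is_fluff:
                spans[-1] = (spans[-1][0], tok.end)  # merge with previous chunk
            else:
                spans.append((tok.start, tok.end))
            gap_is_fluff = True
        elif low not in FLUFF:
            gap_is_fluff = False
    return spans
```

For `"выписка, это счет."` with chunk words `{"выписка", "счет"}`, the sketch yields a single span covering `"выписка, это счет"`: the comma and the fluff word are swallowed, while a non-fluff word such as "нужен" in the gap keeps the two chunks apart.
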
barzerman commented 11 years ago

I see broken UTF-8 characters in chunks. If you experiment with this enough, you'll see the "question marks".
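
One common cause of such "question marks" (an assumption, since the thread does not identify the bug) is cutting a UTF-8 string at a byte offset that falls inside a multibyte character; every Cyrillic letter takes two bytes in UTF-8. The sketch below, with hypothetical helper names, shows the naive cut producing U+FFFD (the replacement character, which many viewers render as a question mark) and a variant that drops the incomplete trailing sequence instead.

```python
def truncate_bytes_naive(s: str, limit: int) -> str:
    """Cut the UTF-8 encoding of s at a byte limit, ignoring character boundaries.

    A partial multibyte sequence at the end decodes to U+FFFD.
    """
    return s.encode("utf-8")[:limit].decode("utf-8", errors="replace")


def truncate_bytes_safe(s: str, limit: int) -> str:
    """Cut at the byte limit, then drop any incomplete trailing character.

    Since s was valid UTF-8, the only invalid bytes after slicing are the
    partial sequence at the end, which errors="ignore" discards.
    """
    return s.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


print(truncate_bytes_naive("счета", 3) == "с\ufffd")  # True
print(truncate_bytes_safe("счета", 3))                 # с
```
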

0xd34df00d commented 11 years ago

Could you provide some queries for which you do see it?

bodritto commented 11 years ago

@0xd34df00d

"не могу оплатить товар по карте" ("I can't pay for the item by card")

http://eu.barzer.net/query/json?query=%D0%BD%D0%B5%20%D0%BC%D0%BE%D0%B3%D1%83%20%D0%BE%D0%BF%D0%BB%D0%B0%D1%82%D0%B8%D1%82%D1%8C%20%D1%82%D0%BE%D0%B2%D0%B0%D1%80%20%D0%BF%D0%BE%20%D0%BA%D0%B0%D1%80%D1%82%D0%B5&key=aRLsIvszISAReCoS6ktgviZxN0YlRpbs6DKH7vro&zurch=yes&flag=d
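
To reproduce reports like these, the request URLs can be rebuilt from plain-text queries (and decoded back) with the Python standard library. This is a sketch: the `query`, `key`, `zurch`, and `flag` parameters are copied from the URLs in this thread, their meaning is not documented here, and `build_query_url` / `query_text` are hypothetical helper names.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "http://eu.barzer.net/query/json"


def build_query_url(query: str, key: str, **extra: str) -> str:
    """Percent-encode the query as UTF-8; quote_via=quote encodes spaces as %20."""
    params = {"query": query, "key": key, **extra}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def query_text(url: str) -> str:
    """Extract and decode the plain-text `query` parameter from a request URL."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = build_query_url("не могу оплатить товар по карте", "YOUR_KEY",
                      zurch="yes", flag="d")
print(query_text(url))  # не могу оплатить товар по карте
```

Decoding the URL from the original report the same way gives "где можно сделать выписку со счета" ("where can I get an account statement").
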

The first doc has the wrong first chunk highlighted.

0xd34df00d commented 11 years ago

Thanks, I'll take a look as soon as I get #576 into a sane state.