some unicode characters have been dropped

etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.

BSD 3-Clause "New" or "Revised" License


some unicode characters have been dropped #176

Closed ratha-pkh closed 8 years ago

ratha-pkh commented 8 years ago

Hi, I have an issue with watson when searching for Unicode text. For example, when I search for ផលិត, watson reduces it to ផលត. The problem is that watson removes any character it considers a non-word character. Is there a workaround for this? Thanks.

import re

RE_NON_WORD = re.compile(r"[^ \w\-\.']", re.UNICODE)

def escape_query(text):
    ...

    text = RE_NON_WORD.sub("", text)  # Remove non-word characters.
    return text
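
To make the report easier to reproduce, here is a minimal standalone sketch that uses only the pattern quoted above, outside of django-watson. It rests on one assumption about the cause: Python's `\w` matches letters, digits and the underscore, but not combining marks, and the Khmer vowel sign in the query (U+17B7) belongs to the Unicode category `Mn` (nonspacing mark). The variable names `query` and `stripped` are only illustrative.

```python
import re
import unicodedata

# Same pattern as quoted in the issue.
RE_NON_WORD = re.compile(r"[^ \w\-\.']", re.UNICODE)

# The search term from the report, written with escapes so each code point is visible.
query = "\u1795\u179b\u17b7\u178f"

# Show each code point, its Unicode category, and whether the pattern would remove it.
for ch in query:
    removed = bool(RE_NON_WORD.match(ch))
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), "removed" if removed else "kept")

# Apply the same substitution that escape_query performs in the quoted snippet.
stripped = RE_NON_WORD.sub("", query)
print(stripped == "\u1795\u179b\u178f")  # the vowel sign U+17B7 is the only character dropped
```

Any script that relies on combining marks for vowels or diacritics would likely be affected in the same way by this pattern.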

etianen commented 8 years ago

Try upgrading your django-watson. The latest version uses a much better regex.

https://github.com/etianen/django-watson/blob/master/src/watson/backends.py#L28
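
For anyone who cannot upgrade right away, one possible approach is to filter by Unicode category instead of relying on `\w`, so that combining marks survive. The sketch below is an assumption for illustration only: it is not the regex used in newer django-watson releases, and `strip_non_word_keep_marks` and `ALLOWED_PUNCTUATION` are hypothetical names, not part of the library.

```python
import unicodedata

# Punctuation the original pattern let through, plus the underscore that \w matches.
ALLOWED_PUNCTUATION = set(" -.'_")


def strip_non_word_keep_marks(text):
    """Drop non-word characters but keep letters (L*), marks (M*) and numbers (N*)."""
    kept = []
    for ch in text:
        major_category = unicodedata.category(ch)[0]
        if major_category in ("L", "M", "N") or ch in ALLOWED_PUNCTUATION:
            kept.append(ch)
    return "".join(kept)


# The Khmer query from the report keeps its vowel sign (U+17B7).
khmer_query = "\u1795\u179b\u17b7\u178f"
print(strip_non_word_keep_marks(khmer_query) == khmer_query)

# Ordinary punctuation is still removed.
print(repr(strip_non_word_keep_marks("foo & bar!")))
```

Upgrading, as suggested above, remains the simpler fix; a filter like this would still need to be wired into the search backend by hand.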