documentcloud / visualsearch

A Rich Search Box for Real Data
http://documentcloud.github.io/visualsearch/
MIT License

Unicode characters issue #67

Closed kaginyana closed 11 years ago

kaginyana commented 12 years ago

In the provided example, when I customize a facet label and use a string in Armenian, it doesn't work. For example, when I type the first letter 'տ' in the search box, the facet isn't shown. I also tried Russian characters, with the same result.

Code example:
...
facetMatches : function(callback) {
    callback([
        'account', 'filter', 'access', 'тип','տեսակ',
        { label: 'city',    category: 'location' },
...

But when I run code like this from the inspector:

visualSearch.searchBox.value('տեսակ:type')

it gives the right result, i.e. it shows the facet.

Tested in the latest Chrome and Firefox.

yassersouri commented 12 years ago

+1

we need this!

how can I fix this?

samuelclay commented 11 years ago

The query parser uses a simple regex to figure out words. As long as \w+ matches an entire word in Russian/Cyrillic (and there's no reason it shouldn't), VisualSearch.js shouldn't have any problems with Unicode/non-ASCII characters.
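
The \w+ assumption can be checked directly in a browser console. The snippet below is a minimal sketch, not VisualSearch.js code: the variable names are made up for illustration, and the Unicode-aware pattern is an assumed alternative that needs an engine with ES2018 property escapes. In JavaScript, \w is defined as [A-Za-z0-9_], so it is worth testing against the facet names from the report.

```javascript
// In JavaScript regexes, \w covers only ASCII: [A-Za-z0-9_].
var asciiWord = /^\w+$/;

// Assumed alternative (not from the library): Unicode property escapes,
// available with the `u` flag in ES2018+ engines.
var unicodeWord = /^[\p{L}\p{M}\p{N}_]+$/u;

// Facet names taken from the example in the report.
var samples = ['account', 'тип', 'տեսակ'];

var results = samples.map(function (word) {
  return {
    word: word,
    ascii: asciiWord.test(word),     // does \w+ cover the whole word?
    unicode: unicodeWord.test(word)  // does the Unicode-aware class cover it?
  };
});

console.log(results);
```

Only 'account' passes the \w+ check here; the Cyrillic and Armenian names pass only the Unicode-aware pattern.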

mkavakebi commented 11 years ago

There is the same bug with Unicode regex matching. autocompleteValues in search_facet.js doesn't match, but autocompleteValues in search_input.js does; I tried it using Persian lists. The difference is between these two lines of code:

A) var matcher = new RegExp('^' + re, 'i');
B) var matcher = new RegExp('\\b' + re, 'i');

The first one works fine, but the second one doesn't for my Persian values. You can just replace (B) in search_facet.js with (A) from search_input.js.
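
The difference between the two matchers can be reproduced outside the library. This is a rough sketch under stated assumptions: escapeRegExp and filterMatches are hypothetical helpers written for this example (not the library's own functions), and the facet list is borrowed from the original report. The \b anchor is defined in terms of \w, which is ASCII-only in JavaScript, so there is no word boundary in front of a non-ASCII letter.

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: keep the items whose text matches `anchor + term`.
function filterMatches(items, term, anchor) {
  var matcher = new RegExp(anchor + escapeRegExp(term), 'i');
  return items.filter(function (item) {
    return matcher.test(item);
  });
}

var facets = ['account', 'тип', 'տեսակ'];

// (A) '^' anchors at the start of the string, regardless of script.
var withCaret = filterMatches(facets, 'տ', '^');

// (B) '\\b' needs a \w / non-\w transition, which never occurs
// inside a string made only of non-ASCII letters.
var withBoundary = filterMatches(facets, 'տ', '\\b');

// For ASCII terms, '\\b' still behaves as expected.
var asciiBoundary = filterMatches(facets, 'a', '\\b');

console.log(withCaret, withBoundary, asciiBoundary);
```

One trade-off to keep in mind: '^' only matches at the start of the whole value, while '\\b' also matches at the start of later words in ASCII values, so swapping (B) for (A) changes behavior for multi-word values.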

yassersouri commented 11 years ago

@mkavakebi I tried what you said, but it did not work. Can you share your working version with a Persian list?