clkao / plv8x

Helpers for managing plv8 javascript modules

Problem with long UTF-8 characters (such as 𨑨, U+28468 / UTF-8: F0 A8 91 A8) #17

Open · a-tsioh opened this issue 9 years ago

I have no idea what's going wrong. The database is named "twblg" and is encoded in UTF8. The data from moedict-data-twblg was imported with:

$ xzcat dump.sql.xz | psql --db twblg

I get the following behaviour with some long (4-byte) UTF-8 characters.

This works:

$ plv8x -d twblg -c "SELECT * FROM entries WHERE 詞目 = '𨑨' ;"
[ { '主編號': '23001',
    '屬性代號': '2',
    '詞目': '𨑨',
    '音讀': 'tshit',
    '文白': '替',
    '部首': '辵',
    '部首序': '162-04-08',
    '方言差對應': '' } ]

This does not:

$ plv8x -d twblg -E "plv8.execute 'SELECT * FROM entries WHERE 詞目=\'𨑨\''"
[]

But this does work with '好':

$ plv8x -d twblg -E "plv8.execute 'SELECT * FROM entries WHERE 詞目=\'好\''"
[ { '主編號': '2282',
    '屬性代號': '1',
    '詞目': '好',
    '音讀': 'hó',
    '文白': '白',
    '部首': '女',
    '部首序': '038-03-06',
    '方言差對應': '[方]043' },
  { '主編號': '2283',
    '屬性代號': '1',
    '詞目': '好',
    '音讀': 'hònn',
    '文白': '文',
    '部首': '女',
    '部首序': '038-03-06',
    '方言差對應': '' } ]

(SHOW SERVER_ENCODING returns UTF8; Debian wheezy.)
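For reference, one visible difference between the two test characters is how they are encoded. 好 (U+597D) is in the Basic Multilingual Plane, so it is 3 bytes in UTF-8 and a single UTF-16 code unit. 𨑨 (U+28468) is outside it, so it is 4 bytes in UTF-8 and a surrogate pair (two code units) in JavaScript/V8 strings. The Node.js sketch below only prints both encodings; the helper `describe` is hypothetical, it does not reproduce the plv8 behaviour, and the idea that the UTF-8 to UTF-16 conversion between PostgreSQL and V8 is involved is an assumption, not a confirmed diagnosis.

```javascript
// Hypothetical helper: show how one character is encoded in UTF-8
// (as stored by a UTF8 PostgreSQL database) and in UTF-16 (as held
// by JavaScript strings in V8). Runs under Node.js (uses Buffer).
function describe(ch) {
  const cp = ch.codePointAt(0);

  // UTF-16 code units, i.e. what ch.charCodeAt(i) sees in JavaScript
  const utf16 = [];
  for (let i = 0; i < ch.length; i++) {
    utf16.push(ch.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0'));
  }

  // Raw UTF-8 bytes
  const utf8 = Array.from(Buffer.from(ch, 'utf8'), (b) =>
    b.toString(16).toUpperCase().padStart(2, '0'));

  return {
    codePoint: 'U+' + cp.toString(16).toUpperCase(),
    utf8: utf8.join(' '),
    utf16: utf16.join(' '),
    jsLength: ch.length, // 2 for a surrogate pair
  };
}

for (const ch of ['好', '𨑨']) {
  console.log(ch, describe(ch));
}
```

This reports E5 A5 BD / 597D (length 1) for 好 and F0 A8 91 A8 / D861 DC68 (length 2) for 𨑨, so only the failing character goes through a surrogate pair on the JavaScript side.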