OpenTSDB / opentsdb

A scalable, distributed Time Series Database.

http://opentsdb.net

GNU Lesser General Public License v2.1

change default charset #844

Open youzhagui2006 opened 8 years ago

youzhagui2006 commented 8 years ago

I've found that we use ISO-8859-1 as the default charset. This causes Unicode strings to be parsed as ???, which in turn confuses UID assignment. Suppose we have two Chinese words as tagv values: '中国' and '美国'. They are different words, but both have length 2, so both are parsed as ?? and are then treated as the same word! When creating the tagvs, the first one succeeds and the others fail with an error like this: name=美国, already mapped to 中国

I think we use UIDs everywhere, so tagv strings are only stored in the tsdb-uid table (I hope this is correct?), which means switching to UTF-8 shouldn't cause a performance issue.
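
The collision described in the comment above can be reproduced outside OpenTSDB. The following is a minimal sketch in Python, not OpenTSDB code: it assumes the lossy behavior comes from encoding to ISO-8859-1 with unmappable characters replaced by '?' (which is what Java's String.getBytes(Charset) does), and mimics that with errors="replace". The helper names are hypothetical.

```python
# Sketch only: mimics a lossy ISO-8859-1 encode (unmappable characters
# become '?'), as Java's String.getBytes(Charset) does. Not OpenTSDB code.

def encode_lossy_latin1(name: str) -> bytes:
    """Encode to ISO-8859-1, substituting '?' for unmappable characters."""
    return name.encode("iso-8859-1", errors="replace")


def encode_utf8(name: str) -> bytes:
    """Encode to UTF-8, which can represent every Unicode character."""
    return name.encode("utf-8")


china = "中国"
usa = "美国"

# Both names collapse to the same two '?' bytes, so a lookup keyed on the
# encoded bytes cannot tell them apart.
print(encode_lossy_latin1(china) == encode_lossy_latin1(usa))

# Under UTF-8 the two names stay distinct (three bytes per character here).
print(encode_utf8(china) == encode_utf8(usa))
```

With ISO-8859-1 the two tagv names become indistinguishable byte strings, which matches the "already mapped to" error; with UTF-8 they remain distinct.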

gaort commented 7 years ago

I ran into the same issue. I tried changing every "ISO-8859-1" to "UTF-8" in the code. Then I found that the UID cache uses the charset to decode the UID (byte[]) into a String for use as a hash key; UTF-8 cannot decode the UID correctly, which caused lots of exceptions. So I changed the decode method from "fromBytes(byte[])" to "uidToString(byte[])", and now it seems to work. However, I found that when I use a filter in a query, fewer results are returned than there should be, so there are still bugs somewhere.

I hope this feature gets done soon, thanks.
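
Why a blanket switch to UTF-8 breaks the UID cache can be illustrated with raw bytes. This is a sketch in Python under stated assumptions, not OpenTSDB code: the 3-byte UIDs are made-up values, errors="replace" stands in for Java's new String(bytes, charset) (which substitutes U+FFFD for malformed input), and the hex key is only assumed to resemble what uidToString(byte[]) produces.

```python
# Sketch only: hypothetical 3-byte UIDs; not taken from a real tsdb-uid table.
uid_a = bytes([0x00, 0x01, 0xFF])
uid_b = bytes([0x00, 0x01, 0xFE])


def latin1_key(uid: bytes) -> str:
    """ISO-8859-1 maps each of the 256 byte values to one character."""
    return uid.decode("iso-8859-1")


def utf8_key(uid: bytes) -> str:
    """Lenient UTF-8 decode: invalid bytes become U+FFFD (as in Java)."""
    return uid.decode("utf-8", errors="replace")


def hex_key(uid: bytes) -> str:
    """Charset-independent key (assumed similar to a hex UID string)."""
    return uid.hex().upper()


# ISO-8859-1 keys are distinct and convert back to the original bytes.
print(latin1_key(uid_a) != latin1_key(uid_b))
print(latin1_key(uid_a).encode("iso-8859-1") == uid_a)

# UTF-8 keys collide: 0xFF and 0xFE are both invalid and both become U+FFFD.
print(utf8_key(uid_a) == utf8_key(uid_b))

# Hex keys are distinct regardless of charset.
print(hex_key(uid_a), hex_key(uid_b))
```

Arbitrary UID bytes survive a round trip through ISO-8859-1 but not through UTF-8, so names and UIDs need separate handling: a text charset for names, and a charset-independent rendering for UID keys.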

felixzw commented 7 years ago

Hi @gaort, I hit the same problem. I tried changing "ISO-8859-1" to "UTF-8" as you mentioned, and I also get fewer results when using a filter in a query. Have you made any progress on this? Thanks.

gaort commented 7 years ago

Hi @felixzw, sorry for taking so long to reply. I gave up on changing the charset, so there's no more progress on this. Sorry again.

long0419 commented 7 years ago

@gaort
Why don't we try using protobuf to encode the tagv value? Then, when querying and viewing the results, we could decode it back to the normal value?!

ylin30 commented 7 years ago

@gaort I hit the same issue too. Have you made any progress on this?

gaort commented 6 years ago

@long0419 @ylin30 I've given up on this, and instead use a library to convert Chinese words to their pinyin spellings.