BuckarooBanzay / beerchat_proxy

IRC/Discord proxy for the beerchat mod

Transliteration #87

Open Klaranth opened 3 years ago

Klaranth commented 3 years ago

@S-S-X For pandorabot: https://discord.com/channels/513329453741637637/668090931861258250/729456054001467443

@BuckarooBanzay I tried the Node.js variant (https://www.npmjs.com/package/transliteration), but it does not work with some UTF-8 characters: console.log(transliteration.transliterate('𝒟𝒶𝓃𝒹𝑒')); just prints out an empty string ("").
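A possible pre-pass, sketched under the assumption that the proxy can normalize text before handing it to transliteration.transliterate: letters like 𝒟𝒶𝓃𝒹𝑒 come from Unicode's Mathematical Alphanumeric Symbols block, and Unicode gives them compatibility decompositions to plain Latin letters. The built-in String.prototype.normalize("NFKC") therefore already folds them. The function name foldCompatibilityChars is hypothetical.

```javascript
// Hypothetical helper: fold "styled" compatibility characters (script/italic
// math letters, ligatures, full-width forms) to their plain equivalents.
// NFKC applies Unicode compatibility decompositions, then recomposes.
function foldCompatibilityChars(text) {
  return text.normalize("NFKC");
}

// Escapes spell out 𝒟𝒶𝓃𝒹𝑒 (U+1D49F U+1D4B6 U+1D4C3 U+1D4B9 U+1D452).
console.log(foldCompatibilityChars("\u{1D49F}\u{1D4B6}\u{1D4C3}\u{1D4B9}\u{1D452}")); // Dande
// Ligature "ﬁ" (U+FB01) and full-width "Ａ" (U+FF21) are folded as well.
console.log(foldCompatibilityChars("\uFB01\uFF21")); // fiA
```

NFKC leaves accented letters, CJK and emoji as they are, so a real transliterator would still be needed after this step.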

@S-S-X Well, I tried that Lua transliterator out of the box: print(t:transliteration_get('𝒟𝒶𝓃𝒹𝑒')) printed me??me??me??me??me?? to the console.
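One plausible factor (an assumption; the Lua library's internals were not checked here) is that every letter in 𝒟𝒶𝓃𝒹𝑒 lies outside the Basic Multilingual Plane. Each one is a single code point, but two UTF-16 code units and four UTF-8 bytes, so code that looks characters up one byte or one 16-bit unit at a time only sees fragments that match no table entry. The hypothetical describeString helper below just makes those three counts visible in Node.js:

```javascript
// Hypothetical helper: count the same string three different ways.
function describeString(text) {
  const codePoints = [...text]; // string iteration yields whole code points
  return {
    codePoints: codePoints.length,
    utf16Units: text.length, // .length counts UTF-16 code units
    utf8Bytes: new TextEncoder().encode(text).length, // encoded UTF-8 size
    hex: codePoints.map(
      (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
    ),
  };
}

// Escapes spell out 𝒟𝒶𝓃𝒹𝑒.
console.log(describeString("\u{1D49F}\u{1D4B6}\u{1D4C3}\u{1D4B9}\u{1D452}"));
```

Lua's # operator likewise counts bytes, so the same string has length 20 there.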

The lists seem to be specialized and pretty short for most transliteration projects around the web, but probably someone has combined that stuff already... maybe.. just cant find it.. or it does not exist.

Seems like at least Google knows how to transliterate those.
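If no combined list exists, one way to merge several short, specialized lists is to layer them: look each character up in the tables in priority order and use the first hit. A minimal sketch; combineTables, transliterateWith and the sample table contents are hypothetical and not taken from any of the libraries mentioned in this thread:

```javascript
// Hypothetical: merge partial tables; earlier tables win on conflicts.
function combineTables(...tables) {
  const combined = new Map();
  for (const table of tables) {
    for (const [ch, replacement] of Object.entries(table)) {
      if (!combined.has(ch)) combined.set(ch, replacement);
    }
  }
  return combined;
}

// Walk the text by code point (so emoji and other astral characters stay
// whole), keep ASCII unchanged, and look everything else up in the table.
function transliterateWith(table, text, fallback = "?") {
  let out = "";
  for (const ch of text) {
    if (ch.codePointAt(0) < 0x80) out += ch;
    else out += table.has(ch) ? table.get(ch) : fallback;
  }
  return out;
}

// Illustrative sample data only.
const accents = { "\u00E9": "e", "\u00E8": "e" }; // é, è
const emoji = { "\u{1F600}": ":-)" }; // 😀
const table = combineTables(accents, emoji);
console.log(transliterateWith(table, "Caf\u00E9 \u{1F600}")); // Cafe :-)
```

Putting the more specialized tables first lets them override a generic fallback table for the characters they cover.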

Somewhat related: minetest-beerchat/beerchat#38, minetest-beerchat/beerchat_proxy#39, minetest-beerchat/beerchat_proxy#16

BuckarooBanzay commented 3 years ago

This looks promising too: https://github.com/kshetline/unidecode-plus

> unidecode('Café 北京, 😀😁😇😈😱', { smartSpacing: true })
'Cafe Bei Jing, :-) :-D O:-) >:-) =:-O'

S-S-X commented 3 years ago

Looked around a bit, and that one indeed seems like an attempt to answer my earlier question:

someone has combined that stuff already... maybe.. just cant find it.. or it does not exist

It is nowhere near complete and there will be a lot of missing stuff, but its style of transliteration also seems to be the best fit for our use case. From the README.md of unidecode-plus:

Some of the transliterations go for matching the shape of characters rather than their pronunciation
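To illustrate the difference (with small hypothetical tables, not unidecode-plus's actual data), here is the same Cyrillic word mapped once by pronunciation and once by shape:

```javascript
// Hypothetical tables for a few Cyrillic capitals. Escapes are used so the
// keys cannot be confused with Latin look-alikes.
const bySound = {
  "\u0420": "R", // Р
  "\u0415": "E", // Е
  "\u0421": "S", // С
  "\u0422": "T", // Т
  "\u041E": "O", // О
  "\u0410": "A", // А
  "\u041D": "N", // Н
};
// Same table, but letters that resemble a different Latin letter map by shape.
const byShape = {
  ...bySound,
  "\u0420": "P", // Р looks like Latin P
  "\u0421": "C", // С looks like Latin C
  "\u041D": "H", // Н looks like Latin H
};

// Replace each code point found in the table; leave the rest unchanged.
function mapChars(table, text) {
  return [...text].map((ch) => table[ch] ?? ch).join("");
}

const word = "\u0420\u0415\u0421\u0422\u041E\u0420\u0410\u041D"; // РЕСТОРАН
console.log(mapChars(bySound, word)); // RESTORAN
console.log(mapChars(byShape, word)); // PECTOPAH
```

Which output suits relayed chat is a judgment call: the shape-based one preserves how the text looks rather than how it reads.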

Also, the code is MIT-licensed but the data is under a different license, and I'm not familiar with the Perl license:

Note that all the files named 'x??.js' in data are originally derived directly from equivalent Perl files, distributed under the Perl license, not the BSD or MIT licenses.