TryQuiet / quiet

A private, p2p alternative to Slack and Discord built on Tor & IPFS
https://www.tryquiet.org
GNU General Public License v3.0

Handling non-latin script in usernames and channel names #2426

Open kingalg opened 7 months ago

kingalg commented 7 months ago

This is the next step after implementing https://github.com/TryQuiet/quiet/issues/2299. It requires more sophisticated handling of non-Latin scripts in usernames and channel names.

As per the comment that Holmes left in #2299:

I'm adding this note on how to handle names in non-Latin scripts for users (and possibly channels) without allowing homograph attacks.

The idea is to have a list of tuples of "confusable" glyphs and let you use any of them, but block you from using one if another registered name is the same except for the confusable glyphs.

https://unicode.org/reports/tr46/#Registries

Libraries like this one may help but I'm not sure yet: https://github.com/oozcitak/uts46
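
To make the "confusable tuples" idea concrete, here is a rough sketch in Python (used only for illustration, not taken from the Quiet codebase). It assumes a tiny hand-picked table of lookalike glyphs; a real implementation would more likely draw on the Unicode confusables data. The names `CONFUSABLES`, `skeleton`, and `is_name_available` are hypothetical. Each name is reduced to a "skeleton" in which confusable glyphs collapse to one representative, and a new name is rejected if its skeleton matches that of a registered name.

```python
import unicodedata
from typing import Iterable

# Hypothetical, hand-picked table for illustration only.
# It maps a few Cyrillic and Greek lookalikes to a Latin representative.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u03bf": "o",  # Greek small omicron
}


def skeleton(name: str) -> str:
    """Collapse confusable glyphs so lookalike names compare equal."""
    # Normalize compatibility forms and case before mapping glyphs.
    normalized = unicodedata.normalize("NFKC", name).casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)


def is_name_available(candidate: str, registered: Iterable[str]) -> bool:
    """Return False if the candidate collides with any registered name."""
    taken = {skeleton(name) for name in registered}
    return skeleton(candidate) not in taken


registered_names = {"paypal"}
# Latin "p", Cyrillic "а", Latin "yp", Cyrillic "а", Latin "l".
spoofed = "p\u0430yp\u0430l"
print(is_name_available(spoofed, registered_names))  # False
print(is_name_available("maria", registered_names))  # True
```

Any glyph from a confusable group stays usable on its own under this scheme; only a collision with an already registered skeleton is blocked.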

Here was the response I received to my question:

Look at IDNA2008 and UTS46. As a general rule, Unicode is meant for display and not string comparison. Each protocol that supports Unicode tends to handle this differently... But usually they require conversion to "A-labels" before comparing presented and reference identifiers. For example, see https://datatracker.ietf.org/doc/rfc9525/ and https://datatracker.ietf.org/doc/draft-ietf-lamps-rfc8398bis/. Part of the reason for UTS46's popularity is the common library support for it.
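
As a minimal illustration of "convert to A-labels before comparing", the sketch below uses Python's built-in `idna` codec. Note the assumption: that codec implements the older IDNA2003 rules (RFC 3490), not IDNA2008 or UTS46, so it is only a stand-in for the conversion step, and results can differ for some characters. The helper names `to_a_label` and `same_identifier` are hypothetical.

```python
def to_a_label(label: str) -> str:
    """Convert a single Unicode label to its ASCII-compatible form.

    Assumption: the standard-library "idna" codec (IDNA2003) stands in
    for a UTS46 or IDNA2008 implementation here.
    """
    # Lower-case the result because A-labels compare case-insensitively
    # and the codec leaves pure-ASCII input unchanged.
    return label.encode("idna").decode("ascii").lower()


def same_identifier(presented: str, reference: str) -> bool:
    """Compare two identifiers by their A-label form, not their display form."""
    return to_a_label(presented) == to_a_label(reference)


print(to_a_label("bücher"))                 # xn--bcher-kva
print(same_identifier("Bücher", "bücher"))  # True
print(same_identifier("bücher", "bucher"))  # False
```

A-label comparison alone does not catch cross-script lookalikes, since visually identical glyphs from different scripts still encode to different A-labels. It would complement a confusables check rather than replace it.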