Handling non-latin script in usernames and channel names

This is the next step after implementing https://github.com/TryQuiet/quiet/issues/2299. It requires more sophisticated handling of non-Latin scripts in usernames and channel names.

As per the comment that Holmes left in #2299:

I'm adding this note on how to handle names in non-Latin scripts for users (and possibly channels) without allowing homograph attacks.

The idea is to have a list of tuples of "confusable" glyphs and let you use any of them, but block you from registering a name if another registered name is identical except for the confusable glyphs.
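
A minimal sketch of that idea in TypeScript, to make the comparison rule concrete. Everything here is hypothetical: the `CONFUSABLE_GROUPS` table is a tiny hand-picked sample (a real list would presumably come from Unicode's confusables data), and `skeleton` and `NameRegistry` are illustrative names, not existing Quiet code. NFKC normalization and lowercasing are extra assumptions that the note above does not specify.

```typescript
// Hypothetical, deliberately tiny table of confusable glyph tuples.
// Each inner array lists glyphs that look alike across scripts.
const CONFUSABLE_GROUPS: string[][] = [
  ["a", "\u0430"], // Latin a, Cyrillic a
  ["e", "\u0435"], // Latin e, Cyrillic ie
  ["o", "\u043e", "\u03bf"], // Latin o, Cyrillic o, Greek omicron
  ["p", "\u0440"], // Latin p, Cyrillic er
  ["c", "\u0441"], // Latin c, Cyrillic es
];

// Map every glyph to the first member of its tuple.
const REPRESENTATIVE = new Map<string, string>();
for (const group of CONFUSABLE_GROUPS) {
  for (const glyph of group) {
    REPRESENTATIVE.set(glyph, group[0]);
  }
}

// Reduce a name to a comparison key: normalize, lowercase, then replace
// each confusable glyph with its tuple's representative.
function skeleton(name: string): string {
  const normalized = name.normalize("NFKC").toLowerCase();
  return Array.from(normalized)
    .map((ch) => REPRESENTATIVE.get(ch) ?? ch)
    .join("");
}

class NameRegistry {
  // Comparison key -> the name that was originally registered.
  private readonly bySkeleton = new Map<string, string>();

  // Returns true if the name was registered, false if it is identical to
  // an already registered name or differs from one only by confusables.
  register(name: string): boolean {
    const key = skeleton(name);
    if (this.bySkeleton.has(key)) {
      return false;
    }
    this.bySkeleton.set(key, name);
    return true;
  }
}
```

With this sketch, registering `"paypal"` succeeds, a later attempt to register `"p\u0430yp\u0430l"` (with Cyrillic а) is rejected, and a fully Cyrillic name such as `"привет"` is still accepted because no registered name shares its key.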

https://unicode.org/reports/tr46/#Registries

Libraries like this one may help but I'm not sure yet: https://github.com/oozcitak/uts46

Here was the response I received to my question:

Look at IDNA2008 and UTS46. As a general rule, Unicode is meant for display and not string comparison. Each protocol that supports Unicode tends to handle this differently, but usually they require conversion to "A-Labels" before comparing presented and reference identifiers. For example, see https://datatracker.ietf.org/doc/rfc9525/ and https://datatracker.ietf.org/doc/draft-ietf-lamps-rfc8398bis/. Part of the reason for UTS46's popularity is the common library support for it.
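
To illustrate the "convert to A-Labels before comparing" advice, here is a hedged sketch that uses Node's built-in `domainToASCII` from `node:url`. Treating a username or channel name as if it were a domain label is an assumption made only for illustration, and `toALabel` and `sameIdentifier` are hypothetical helper names. Note that this conversion alone does not merge cross-script lookalikes, so it would complement a confusables check rather than replace it.

```typescript
import { domainToASCII } from "node:url";

// Convert an identifier to its ASCII ("xn--...") form.
// domainToASCII returns an empty string for input it considers invalid,
// which this helper reports as null.
function toALabel(identifier: string): string | null {
  const ascii = domainToASCII(identifier);
  return ascii === "" ? null : ascii;
}

// Compare a presented identifier with a reference identifier by their
// A-labels instead of by their raw Unicode strings.
function sameIdentifier(presented: string, reference: string): boolean {
  const a = toALabel(presented);
  const b = toALabel(reference);
  return a !== null && a === b;
}
```

Under these assumptions, `sameIdentifier("Bücher", "bücher")` is true because both map to the same A-label, while `sameIdentifier("p\u0430ypal", "paypal")` is false because the Cyrillic а survives the conversion.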

Handling non-latin script in usernames and channel names #2426