deltachat / message-parser

Parsing of links, email addresses, simple text formatting (markdown subset), user mentions, hashtags and more in DeltaChat messages.
https://deltachat.github.io/message-parser/

Smarter punycode warnings #61

Open · Simon-Laux opened this issue 5 months ago

Simon-Laux commented 5 months ago

Currently, every link whose hostname/domain contains punycode triggers the warning/confirmation dialog.
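The current trigger can be sketched as follows. Punycode hostnames are plain ASCII labels carrying the IDNA prefix `xn--`, so spotting them is a simple string check, and decoding the rest of the label with the RFC 3492 algorithm reveals the Unicode text the user would actually see. This is a minimal, dependency-free sketch; `has_punycode_label`, `decode_label` and `host_to_unicode` are hypothetical names, not message-parser's API, and a real implementation would more likely rely on an existing IDNA library.

```rust
// Minimal sketch (hypothetical names, not the message-parser API):
// detect IDNA "xn--" labels and decode them with the RFC 3492 algorithm.

const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// True if any dot-separated label starts with the ACE prefix "xn--".
fn has_punycode_label(host: &str) -> bool {
    host.split('.')
        .any(|label| label.get(..4).map_or(false, |p| p.eq_ignore_ascii_case("xn--")))
}

/// Maps a punycode digit to its value: a-z/A-Z = 0..25, 0-9 = 26..35.
fn digit_value(c: u8) -> Option<u32> {
    match c {
        b'a'..=b'z' => Some(u32::from(c - b'a')),
        b'A'..=b'Z' => Some(u32::from(c - b'A')),
        b'0'..=b'9' => Some(u32::from(c - b'0') + 26),
        _ => None,
    }
}

/// Bias adaptation function from RFC 3492, section 6.1.
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + ((BASE - T_MIN + 1) * delta) / (delta + SKEW)
}

/// Decodes one label without its "xn--" prefix, e.g. "mnchen-3ya" -> "münchen".
/// Returns None on malformed input or arithmetic overflow.
fn decode_label(input: &str) -> Option<String> {
    if !input.is_ascii() {
        return None;
    }
    let bytes = input.as_bytes();
    // Basic (ASCII) code points precede the last '-' and are copied as-is.
    let (mut output, rest): (Vec<char>, &[u8]) = match input.rfind('-') {
        Some(pos) if pos > 0 => (input[..pos].chars().collect(), &bytes[pos + 1..]),
        _ => (Vec::new(), bytes),
    };
    let (mut n, mut i, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let mut pos = 0;
    while pos < rest.len() {
        // Read one variable-length integer encoding what to insert and where.
        let old_i = i;
        let mut w = 1u32;
        let mut k = BASE;
        loop {
            let digit = digit_value(*rest.get(pos)?)?;
            pos += 1;
            i = i.checked_add(digit.checked_mul(w)?)?;
            let t = if k <= bias {
                T_MIN
            } else if k >= bias + T_MAX {
                T_MAX
            } else {
                k - bias
            };
            if digit < t {
                break;
            }
            w = w.checked_mul(BASE - t)?;
            k += BASE;
        }
        let len = output.len() as u32 + 1;
        bias = adapt(i - old_i, len, old_i == 0);
        n = n.checked_add(i / len)?;
        i %= len;
        output.insert(i as usize, char::from_u32(n)?);
        i += 1;
    }
    Some(output.into_iter().collect())
}

/// Converts every "xn--" label of a hostname to Unicode, keeping other labels.
fn host_to_unicode(host: &str) -> Option<String> {
    let labels: Option<Vec<String>> = host
        .split('.')
        .map(|label| match label.get(..4) {
            Some(p) if p.eq_ignore_ascii_case("xn--") => decode_label(&label[4..]),
            _ => Some(label.to_string()),
        })
        .collect();
    labels.map(|l| l.join("."))
}
```

For instance, `host_to_unicode("www.xn--mnchen-3ya.de")` yields `www.münchen.de`, and `decode_label("p1ai")` yields `рф`, the TLD discussed below.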

The Problem

While this works well for English-speaking regions, it is bad for regions that use a different script/alphabet: for them, perfectly valid, normal URLs produce many false positives.

Non-exhaustive list of examples:

I don't know how big the problem really is: internationalised URLs are still relatively new, and because only ASCII could be used before, many websites and companies still stick to ASCII domains.

Update: .рф (https://en.wikipedia.org/wiki/.рф) is apparently widely used.

Proposed solution

For each language we support, specify a list of allowed Unicode ranges.

For each detected punycode link, check whether it fits into the allowed ranges of any language; if it does not, warn the user.
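The proposed check can be sketched as below, assuming the hostname has already been decoded from punycode to Unicode. The `Language` table, its ranges and the choice to check each dot-separated label separately are illustrative assumptions, not a vetted list; ASCII digits and `-` are accepted for every language.

```rust
use std::ops::RangeInclusive;

/// Hypothetical per-language entry: the characters a label may use.
struct Language {
    code: &'static str,
    allowed: &'static [RangeInclusive<char>],
}

// Illustrative ranges only; a real table would need review per language.
const LANGUAGES: &[Language] = &[
    // German: ASCII letters plus the Latin-1 letters (ä, ö, ü, ß, ...).
    Language {
        code: "de",
        allowed: &['a'..='z', 'A'..='Z', '\u{C0}'..='\u{D6}', '\u{D8}'..='\u{F6}', '\u{F8}'..='\u{FF}'],
    },
    // Russian: the Cyrillic block U+0400..U+04FF.
    Language { code: "ru", allowed: &['\u{0400}'..='\u{04FF}'] },
    // Persian: the Arabic block U+0600..U+06FF.
    Language { code: "fa", allowed: &['\u{0600}'..='\u{06FF}'] },
];

/// Returns the first language whose ranges cover every character of `label`.
/// ASCII digits and '-' are accepted for every language.
fn label_language(label: &str) -> Option<&'static str> {
    LANGUAGES
        .iter()
        .find(|lang| {
            label.chars().all(|c| {
                c.is_ascii_digit() || c == '-' || lang.allowed.iter().any(|r| r.contains(&c))
            })
        })
        .map(|lang| lang.code)
}

/// Warn unless every label of the Unicode-decoded host fits a single language.
fn should_warn(host: &str) -> bool {
    host.split('.').any(|label| label_language(label).is_none())
}
```

Checking per label lets a Cyrillic label sit next to a `.com` TLD, while a single label that mixes scripts, such as `раypal` spelled with Cyrillic `р` and `а`, still triggers the warning.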

For example:

Alternatives Considered

Test cases

https://www.münchen.de

To do: collect more, checking the meaning of each one so that we don't add problematic domains because we forgot the check.

Anyway, the first step is to collect test cases.

farooqkz commented 3 months ago

In Iranian society, it is not important for hostnames, but it is for the path part.