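I received a bug report today from a user whose German phrase, which included the word für, matched the regex \br\b. The ü acted as a word boundary because I was not using the re.UNICODE flag: without it, \w covers only ASCII letters, digits, and the underscore, so ü counts as a non-word character and \b matches between the ü and the r.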
A quick test showed that the same problem affects parsedatetime. For example, the made-up phrase fünacht is parsed as a date based on the word nacht.
This has implications for any language that uses accented characters or a non-Latin character set, especially if the locale includes phrases that are common suffixes of other words.
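The two cases can be reproduced with a minimal sketch like the one below. It makes two assumptions that are not from the report itself: it targets Python 3, where str patterns are Unicode-aware by default, so re.ASCII is used to emulate the old behaviour of omitting re.UNICODE; and the pattern \bnacht\b is only a stand-in for illustration, not the actual expression parsedatetime uses. The helper name matches is hypothetical.

```python
import re


def matches(pattern, text, flags):
    """Return the matched substring, or None if the pattern does not match."""
    m = re.search(pattern, text, flags)
    return m.group(0) if m else None


# re.ASCII limits \w (and therefore \b) to ASCII characters, which is an
# assumed stand-in for Python 2 behaviour without re.UNICODE.
# Here the non-ASCII ü is treated as a non-word character, so \b matches next to it.
print(matches(r"\br\b", "für", re.ASCII))            # r
print(matches(r"\bnacht\b", "fünacht", re.ASCII))    # nacht

# With Unicode-aware matching, ü is a word character, so there is no
# boundary inside the word and neither pattern matches.
print(matches(r"\br\b", "für", re.UNICODE))          # None
print(matches(r"\bnacht\b", "fünacht", re.UNICODE))  # None
```

On Python 2 the equivalent check would compare a pattern compiled without flags against one compiled with re.UNICODE, using unicode strings for both the pattern and the text.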