getkirby-v2 / toolkit

This is the deprecated toolkit for Kirby v2.
http://getkirby.com

Validator for word characters (including umlauts, accents) #195

Closed bezin closed 5 years ago

bezin commented 8 years ago

Hello everyone,

The toolkit's alpha validator seems rather strict, because it rejects umlauts. IMO, one would not expect 'Düsseldorf' or 'Köln' to be rejected when checking for alphabetic characters. Obviously an issue you mainly run into when validating German and Finnish strings ;)

I suggest implementing a more "forgiving" regular expression.

Looking forward to your thoughts :)

Cheers

lukasbestle commented 8 years ago

Good that I'm not from either of those cities, otherwise I would tell you that just one can be valid. ;)

Back to the topic: While I agree that I'd consider umlauts "alphabetic characters", people with other languages could say the same for their special characters, e.g. accents or completely different characters such as those in Asian languages.

You can override the built-in behavior in your config.php or you can create your own validator with a different name. This is all explained in the docs.
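To illustrate the two options (overriding the built-in name versus adding a validator under a new name), here is a minimal PHP sketch. It assumes a hypothetical `$validators` array that maps names to closures; this is not Kirby's actual registration API, which is described in the docs, and the `alphaGerman` name is made up for the example.

```php
<?php
// Hypothetical validator registry: validator name => closure.
// This only illustrates the idea; it is not Kirby's real API.
$validators = [
    // Strict ASCII letters, comparable to a plain a-z "alpha" check.
    'alpha' => function (string $value): bool {
        return preg_match('/^[a-z]+\z/i', $value) === 1;
    },
];

// Option 1: register a new validator under a different (hypothetical) name,
// leaving the existing 'alpha' behavior untouched for other code.
$validators['alphaGerman'] = function (string $value): bool {
    // ASCII letters plus German umlauts and sharp s, matched as UTF-8 (/u).
    return preg_match('/^[a-zA-ZäöüÄÖÜß]+\z/u', $value) === 1;
};

// Option 2: override the built-in name instead (affects every caller of 'alpha'):
// $validators['alpha'] = $validators['alphaGerman'];

var_dump($validators['alpha']('Köln'));       // bool(false)
var_dump($validators['alphaGerman']('Köln')); // bool(true)
```

Registering under a new name is the safer of the two, since code that relies on the strict behavior keeps working.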

bezin commented 8 years ago

Thanks for your feedback! I do agree that this should come down to neither your nor my sense of language.

For a multilingual CMS the best solution would be a validator that is as language-agnostic as possible, but the current implementation favours the English language. There is a universal approach that is aware of "accents or completely different characters such as in Asian languages" which one might consider:

$isAlpha = preg_match('/^[\pL]+$/u', $value);

This regular expression matches every Unicode character in the Letter category, e.g. umlauts but also characters like áèąćęłńóź etc. See http://www.regular-expressions.info/unicode for a detailed explanation. It is of course not bullet-proof and one must check for side effects, but IMO it is a neat solution.
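A minimal PHP sketch comparing a strict a-z check with the `\pL` approach above. The function names `isAlphaAscii` and `isAlphaUnicode` are hypothetical and this is not the toolkit's actual validator code; the sketch also uses `\z` instead of `$`, because in PCRE `$` additionally matches before a trailing newline.

```php
<?php
// Strict check: ASCII letters a-z only (case-insensitive via /i).
function isAlphaAscii(string $value): bool
{
    return preg_match('/^[a-z]+\z/i', $value) === 1;
}

// Unicode check: one or more code points in the Letter (\pL) category.
// /u makes PCRE treat pattern and subject as UTF-8; \z rejects a trailing
// newline. Invalid UTF-8 makes preg_match return false, which this
// function reports as "not alphabetic".
function isAlphaUnicode(string $value): bool
{
    return preg_match('/^\pL+\z/u', $value) === 1;
}

foreach (['Koeln', 'Köln', 'Düsseldorf', 'Łódź', '東京', 'R2-D2'] as $word) {
    echo $word, ': ascii=', var_export(isAlphaAscii($word), true),
        ', unicode=', var_export(isAlphaUnicode($word), true), PHP_EOL;
}
```

Only 'Koeln' passes the strict check, while every sample except 'R2-D2' passes the Unicode check.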

You may find an example implementation in a major CMS here (Contao) (Edit: fixed the example link)

Looking forward to your comments. Cheers

bastianallgeier commented 8 years ago

I actually think that both versions are useful. It often makes sense to have a validator for "strict" alphabetic characters a-z. But I also agree that we need a multilingual version of this. I just wonder about the naming. As a developer I expect an alphabetic validator to only include a-z in the first place, but maybe I'm wrong. So my suggestion would be to find a good name for a multilingual version and add that to the list of default validators.

bezin commented 8 years ago

I totally agree. After submitting my previous post, I also thought that it is still handy to have a strict a-z validator at hand. Anyhow, I expect an alphabetic validator to include the characters mentioned above, e.g. when validating names, streets, firms, book titles etc., but others may well have different expectations :)

An additional validator would be nice as well and would not cause any side effects in older plugins or sites. I'll think about a decent label ;)

bastianallgeier commented 8 years ago

Something along the lines of "valid word characters". But it's actually quite hard to name it. I was also thinking about adding a second parameter to the validator that makes it possible to switch between the two versions, but I'm not sure about either approach.
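The second-parameter idea could look roughly like the following PHP sketch. The function name `alpha` and the `$unicode` flag are hypothetical. Defaulting the flag to `false` keeps the strict behavior for existing callers. The Unicode branch also accepts a letter followed by combining marks (`\pM`), so a decomposed umlaut such as "o" + U+0308 passes, which plain `\pL+` would reject; this is one of the side effects mentioned earlier in the thread.

```php
<?php
// Hypothetical alpha validator with a switch between strict ASCII
// and Unicode-aware matching. The default keeps the strict behavior.
function alpha(string $value, bool $unicode = false): bool
{
    $pattern = $unicode
        // One or more letters, each optionally followed by combining marks.
        ? '/^(?:\pL\pM*)+\z/u'
        // ASCII letters only, case-insensitive.
        : '/^[a-z]+\z/i';
    return preg_match($pattern, $value) === 1;
}

var_dump(alpha('Köln'));                // bool(false)
var_dump(alpha('Köln', true));          // bool(true)
var_dump(alpha("Ko\u{0308}ln", true));  // bool(true), decomposed umlaut
```

A boolean flag keeps one validator name, at the cost of call sites like `alpha($value, true)` being less self-explanatory than a separately named validator.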