fabian-hiller / valibot

The modular and type safe schema library for validating structural data 🤖
https://valibot.dev
MIT License
6.2k stars, 198 forks

Add length validators based on the `Intl.Segmenter` API #784

Closed: closed by remonke 3 weeks ago

remonke commented 2 months ago

The minLength and maxLength actions use String.prototype.length, which counts UTF-16 code units rather than the characters a person perceives. That makes it unreliable for user-facing length limits, especially with emojis. For example, many emojis have a length of 2 (like 🙃), but some have a length of 7 (like 🧑🏻‍💻).

I suggest adding new actions like minGraphemeCount or maxGraphemeCount, which would use the Intl.Segmenter API instead of String.prototype.length. This would be particularly useful when dealing with user-generated content. As of April 2024, the API is supported in all major browsers.
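To make the difference concrete, here is a minimal sketch that compares String.prototype.length with a grapheme count from Intl.Segmenter. The helper name countGraphemes is made up for illustration and is not part of Valibot; the second emoji is written with escapes so the zero-width joiner survives copy and paste.

```typescript
// Hypothetical helper (not part of Valibot): counts grapheme clusters,
// i.e. user-perceived characters, using the built-in Intl.Segmenter.
function countGraphemes(text: string, locale?: string): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text)).length;
}

// U+1F643 (upside-down face): one code point, two UTF-16 code units.
const upsideDown = '\u{1F643}';

// Person + skin tone modifier + zero-width joiner + laptop:
// four code points, seven UTF-16 code units, one perceived character.
const technologist = '\u{1F9D1}\u{1F3FB}\u{200D}\u{1F4BB}';

console.log(upsideDown.length, countGraphemes(upsideDown)); // 2 1
console.log(technologist.length, countGraphemes(technologist)); // 7 1
```

A limit such as "at most 300 characters" then applies to what the user actually sees rather than to the internal encoding.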

The code would look something like this:


import * as v from 'valibot';

const PostSchema = v.object({
  title: v.pipe(v.string(), v.maxGraphemeCount(300, /* optional language parameter */ 'en')),
});

fabian-hiller commented 2 months ago

Thank you for your contribution. We have already discussed this in PR https://github.com/fabian-hiller/valibot/pull/666#issuecomment-2227393849. Feel free to create a PR and copy the source code of length, notLength, minLength and maxLength and implement graphemes, notGraphemes, minGraphemes and maxGraphemes based on it.
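As a rough illustration of the checks such actions would need, the sketch below implements exact, minimum and maximum grapheme tests as standalone predicates. This is not Valibot's source code: the function names (countGraphemesUpTo, hasGraphemes, hasMinGraphemes, hasMaxGraphemes) are hypothetical, and the early exit is one possible design choice rather than something the thread prescribes.

```typescript
// Hypothetical standalone predicates; not Valibot's actual implementation.
const graphemeSegmenter = new Intl.Segmenter(undefined, {
  granularity: 'grapheme',
});

// Counts grapheme clusters but stops as soon as the count exceeds `limit`,
// so very long inputs are not segmented to the end.
// Returns min(actual count, limit + 1).
function countGraphemesUpTo(text: string, limit: number): number {
  const iterator = graphemeSegmenter.segment(text)[Symbol.iterator]();
  let count = 0;
  while (count <= limit && !iterator.next().done) {
    count++;
  }
  return count;
}

// True if the text has exactly `requirement` grapheme clusters.
function hasGraphemes(text: string, requirement: number): boolean {
  return countGraphemesUpTo(text, requirement) === requirement;
}

// True if the text has at least `requirement` grapheme clusters.
function hasMinGraphemes(text: string, requirement: number): boolean {
  return countGraphemesUpTo(text, requirement) >= requirement;
}

// True if the text has at most `requirement` grapheme clusters.
function hasMaxGraphemes(text: string, requirement: number): boolean {
  return countGraphemesUpTo(text, requirement) <= requirement;
}

// The seven-code-unit emoji from the issue passes a limit of one grapheme.
console.log(hasMaxGraphemes('\u{1F9D1}\u{1F3FB}\u{200D}\u{1F4BB}', 1)); // true
console.log(hasMinGraphemes('ab', 3)); // false
```

Stopping at limit + 1 keeps the cost proportional to the requirement instead of the input size, which matters for user-generated content of unbounded length.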

2-NOW commented 1 month ago

Fixes: #853

fabian-hiller commented 3 weeks ago

This has been implemented and will be available in the next release.