Keats / validator

Simple validation for Rust structs
MIT License

add length_utf16 validator #245

Open: DXist opened 1 year ago

DXist commented 1 year ago

This PR adds length_utf16 validator.

My project exposes data from Salesforce via a JSON Schema-based API. I want to validate field lengths the same way Salesforce does: by counting UTF-16 code units.

UTF-16 is used for Unicode string representation in JavaScript, Java and Salesforce Apex. I think this validator could be useful to others as well; a good use case is aligning backend and frontend length validators.

An example of a mismatch between UTF-16 code units and Unicode code points: the '𝔠' symbol (U+1D520) takes 2 UTF-16 code units but is still 1 Unicode code point.
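The same mismatch can be reproduced with nothing but the Rust standard library by measuring one string three ways: len() counts UTF-8 bytes, chars().count() counts Unicode scalar values (code points), and encode_utf16().count() counts UTF-16 code units, which is what JavaScript's .length reports. The helper name length_measures is hypothetical, for illustration only.

```rust
/// Hypothetical helper: returns (UTF-8 bytes, code points, UTF-16 code units).
fn length_measures(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // UTF-8 code units (bytes)
        s.chars().count(),        // Unicode scalar values (code points)
        s.encode_utf16().count(), // UTF-16 code units (JavaScript .length)
    )
}

fn main() {
    // U+1D520 lies outside the Basic Multilingual Plane, so UTF-16
    // needs a surrogate pair for it and UTF-8 needs 4 bytes.
    let (utf8, chars, utf16) = length_measures("\u{1D520}");
    println!("UTF-8 bytes: {utf8}, code points: {chars}, UTF-16 units: {utf16}");
}
```

For '𝔠' this yields 4 bytes, 1 code point and 2 UTF-16 code units, so all three common notions of "length" disagree for the same one-character string.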

Should I put the implementation behind an optional length_utf16 feature?

Keats commented 1 year ago

I don't think it makes sense to add that to the library, it's better added as a custom validator.
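As a rough idea of what such a custom validator might look like, here is a standalone sketch. It assumes only the standard library; the function validate_length_utf16 and the error type Utf16LengthError are hypothetical names, not part of the validator crate. In the crate, a custom validator is a function that takes the field value and returns a Result with the crate's own ValidationError, so a real version would wrap this check with a fixed limit and map the error into that type.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; a real custom validator for the
/// validator crate would return the crate's ValidationError instead.
#[derive(Debug, PartialEq)]
pub enum Utf16LengthError {
    TooShort { min: usize, actual: usize },
    TooLong { max: usize, actual: usize },
}

impl fmt::Display for Utf16LengthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Utf16LengthError::TooShort { min, actual } => {
                write!(f, "{actual} UTF-16 code units, expected at least {min}")
            }
            Utf16LengthError::TooLong { max, actual } => {
                write!(f, "{actual} UTF-16 code units, expected at most {max}")
            }
        }
    }
}

/// Hypothetical helper: checks that `value` has between `min` and `max`
/// UTF-16 code units (both bounds inclusive and optional), which is how
/// JavaScript strings and HTML maxlength count length.
pub fn validate_length_utf16(
    value: &str,
    min: Option<usize>,
    max: Option<usize>,
) -> Result<(), Utf16LengthError> {
    let actual = value.encode_utf16().count();
    if let Some(min) = min {
        if actual < min {
            return Err(Utf16LengthError::TooShort { min, actual });
        }
    }
    if let Some(max) = max {
        if actual > max {
            return Err(Utf16LengthError::TooLong { max, actual });
        }
    }
    Ok(())
}

fn main() {
    // Two Fraktur 'c' characters: 2 code points, but 4 UTF-16 code units.
    match validate_length_utf16("\u{1D520}\u{1D520}", None, Some(3)) {
        Ok(()) => println!("valid"),
        Err(e) => println!("invalid: {e}"),
    }
}
```

Counting with encode_utf16().count() avoids allocating a Vec<u16>; the iterator yields code units lazily.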

LeoniePhiline commented 1 year ago

@Keats The need for a UTF-16 code unit length validator is very common (think of all web form handling), since the maxlength attribute of HTML form fields counts UTF-16 code units.

If the frontend counts UTF-16 code units and the backend counts UTF-8 code units, inconsistencies arise whenever values contain characters whose encoded length differs between UTF-16 and UTF-8.

As a result, the server rejects values that passed client-side validation whenever the server's UTF-8 representation is longer than the browser's UTF-16 representation.
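To make the failure mode concrete, the sketch below models both sides with hypothetical functions: passes_client_maxlength stands in for the browser's maxlength check (UTF-16 code units), and passes_server_utf8_limit stands in for a backend that limits UTF-8 code units, as described above. Neither function is from the validator crate.

```rust
/// Hypothetical model of the browser: HTML maxlength counts UTF-16 code units.
fn passes_client_maxlength(value: &str, maxlength: usize) -> bool {
    value.encode_utf16().count() <= maxlength
}

/// Hypothetical model of a server that limits UTF-8 code units (bytes).
fn passes_server_utf8_limit(value: &str, max: usize) -> bool {
    value.len() <= max
}

fn main() {
    // Three precomposed 'é' (U+00E9): 1 UTF-16 code unit but 2 UTF-8 bytes each,
    // so the value is 3 units long for the browser and 6 bytes long for the server.
    let value = "\u{e9}\u{e9}\u{e9}";
    let limit = 3;
    println!("client accepts: {}", passes_client_maxlength(value, limit));
    println!("server accepts: {}", passes_server_utf8_limit(value, limit));
}
```

With the same limit of 3, the browser accepts the value and the server rejects it, which is exactly the inconsistency a UTF-16 length validator on the backend would remove.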