Add a `text` validation for multiline?

fcrozatier commented 1 year ago

The problem:

As per the HTML spec, newlines in a textarea are normalized to \n in the browser and to \r\n when the form is submitted. So a simple character count yields inconsistent lengths between browser and server.
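To make the mismatch concrete, here is a minimal sketch that simulates the submission step by replacing every LF with CRLF. It assumes the text contains no lone CR characters.

```typescript
// Value as seen in the browser (the API value): line breaks are LF.
const browserValue = "ab\ncd";

// Value as received by the server after form submission: line breaks are CRLF.
const submittedValue = browserValue.replace(/\n/g, "\r\n");

// The same text has a different length on each side.
console.log(browserValue.length); // 5
console.log(submittedValue.length); // 6
```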

This inconsistency is unavoidable (it is cross-browser, as part of the HTML spec) and only concerns newlines, so it affects the validation of textarea elements.

So one cannot validate a textarea with maxlength in the browser and max in the schema when the value contains newlines. This also impacts libraries trying to use a zod schema as a single source of truth: https://github.com/ciscoheat/sveltekit-superforms/issues/253

Workaround

Use a refine instead of max, like this:

import { z } from 'zod';

const schema = z.object({
    name: z.string().default('Bob'),
    email: z.string().email().optional(),
    // bio: z.string().max(5) // this won't work with new lines
    // Subtract one per \r\n pair to account for the \r characters added
    // by the newline normalization, as per the HTML spec
    bio: z.string().refine(
        (str) => str.length - (str.match(/\r\n/g) ?? []).length <= 5,
        { message: 'Bio too long' }
    )
});
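The refine predicate above counts each CRLF pair as a single character. Here is a minimal sketch that factors it into a reusable helper and checks that it agrees with counting the LF-normalized string. The names `browserLength` and `normalizedLength` are hypothetical.

```typescript
// Length of a submitted textarea value as the browser's maxlength counts it:
// each "\r\n" pair is counted as one character.
function browserLength(str: string): number {
  const crlfCount = (str.match(/\r\n/g) ?? []).length;
  return str.length - crlfCount;
}

// Equivalent formulation: normalize CRLF to LF first, then count.
function normalizedLength(str: string): number {
  return str.replace(/\r\n/g, "\n").length;
}

const submitted = "line 1\r\nline 2";
console.log(browserLength(submitted)); // 13
console.log(normalizedLength(submitted)); // 13
```

With a helper like this, the refine above could become `(str) => browserLength(str) <= 5`.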

Alternatively, use a transform:

import { z } from 'zod';

// Convert CRLF line breaks back to LF, which is how the browser counts them
const normalize = (text: string) => text.replaceAll('\r\n', '\n');

const schema = z.object({
    name: z.string().default('Bob'),
    email: z.string().email().optional(),
    bio: z.string().transform(normalize).pipe(z.string().max(5))
});

But both solutions add some boilerplate for every maxlength on a textarea.

Suggested solution:

Maybe a `text` validation could be useful: it would work like `string`, but its `max` would account for the browser/server inconsistency. Or a `multiline` option could adapt the behavior of the `string` validation?
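Here is a purely hypothetical sketch of the check such an option might perform. Neither a `text` type nor a `multiline` option exists in zod, and `makeMaxCheck` and `MaxCheckOptions` are illustrative names only.

```typescript
// Hypothetical options for a multiline-aware max check (not a zod API).
interface MaxCheckOptions {
  multiline?: boolean;
}

// Returns a predicate enforcing a maximum length. With `multiline: true`,
// CRLF pairs are counted as one character, matching the browser's maxlength.
function makeMaxCheck(max: number, options: MaxCheckOptions = {}) {
  return (value: string): boolean => {
    const text = options.multiline ? value.replace(/\r\n/g, "\n") : value;
    return text.length <= max;
  };
}

const plainMax = makeMaxCheck(5);
const multilineMax = makeMaxCheck(5, { multiline: true });

console.log(plainMax("ab\r\ncd")); // false (length 6)
console.log(multilineMax("ab\r\ncd")); // true (counted as 5)
```

Until something like this exists, the returned predicate could be passed to `z.string().refine(...)`.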

Refs

For historical reasons, the element's value is normalized in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized. The API value is the value used in the value IDL attribute, textLength IDL attribute, and by the maxlength and minlength content attributes. It is normalized so that line breaks use U+000A LINE FEED (LF) characters. Finally, there is the value, as used in form submission and other processing models in this specification. It is normalized as for the API value, and in addition, if necessary given the element's wrap attribute, additional line breaks are inserted to wrap the text at the given width.

https://html.spec.whatwg.org/multipage/form-elements.html#the-textarea-element

As with all MIME transmissions, "CR LF" (i.e., `%0D%0A') is used to separate lines of data.

https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
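Here is a minimal sketch of the two normalizations described in the quotes above. It ignores the extra line breaks that `wrap="hard"` can insert. The function names are hypothetical, and the CRLF conversion is assumed to follow the current WHATWG form-submission rules (lone CR and lone LF both become CRLF), not the HTML 4.01 text quoted above.

```typescript
// API value: CRLF pairs become LF, then any remaining lone CR becomes LF.
function toApiValue(raw: string): string {
  return raw.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

// Form submission: CRLF is kept, and lone CR or lone LF becomes CRLF.
function toSubmissionValue(value: string): string {
  return value.replace(/\r\n|\r|\n/g, "\r\n");
}

const raw = "a\r\nb\rc\nd";
const apiValue = toApiValue(raw);
const submissionValue = toSubmissionValue(apiValue);

console.log(apiValue.length); // 7
console.log(submissionValue.length); // 10
```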

fritzmatias commented 11 months ago

I'm trying to do something similar with a multiline regex, but the second line is not validated at all.

import { z } from 'zod';

// simple dd/MM/yyyy format
const regEx = new RegExp(/^[0-9]{2,2}\/[0-9]{2,2}\/[0-9]{4,4}$/gim)
const formSchema = z.object({
    values: z.string().regex(regEx, "Invalid format")
});
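With the `m` flag, `^` and `$` match at every line boundary, so a single `test()` succeeds as soon as any one line matches. That is why the second line appears unvalidated. Here is a minimal sketch of checking every line instead. The helper name `everyLineMatches` is hypothetical, and the pattern deliberately has no `g` or `m` flag so that `test()` is stateless and anchors to the whole line.

```typescript
// With `m`, ^ and $ match at line boundaries: any single matching line passes.
const anyLine = /^[0-9]{2}\/[0-9]{2}\/[0-9]{4}$/m;

// Line-level pattern without `g` or `m`: the anchors apply to the whole string.
const singleLine = /^[0-9]{2}\/[0-9]{2}\/[0-9]{4}$/;

// Hypothetical helper: true only if every line matches the pattern.
function everyLineMatches(text: string, pattern: RegExp): boolean {
  return text.split(/\r?\n/).every((line) => pattern.test(line));
}

const input = "22/09/2023\n11/11/2222 something else";

console.log(anyLine.test(input)); // true
console.log(everyLineMatches(input, singleLine)); // false
```

Empty lines fail this check, so filter them out first if they are allowed. In a schema, the helper could be used as `z.string().refine((s) => everyLineMatches(s, singleLine), "Invalid format")`.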

Test data:

22/09/2023
11/11/2222 something else

mxdvl commented 11 months ago

@fritzmatias I think your example is actually quite different: your string will “match” the provided regular expression, but it will not be split into all the results. I’m not entirely sure what you want to achieve, but if you want an array of matches you would have to use a transform like so:

import { z } from "zod";

const REGEX = new RegExp(/^[0-9]{2}\/[0-9]{2}\/[0-9]{4}$/);
const formSchema = z.object({
  values: z.string().transform(
    // Keep only the lines that match the date pattern
    (lines) => lines.split("\n").filter((line) => line.match(REGEX)),
  ),
});

Note that this regular expression is very simplistic, and dates like 32/13/0000 will still “match”.
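If the calendar date itself must be valid, one option is to check the parsed parts against the number of days in each month. Here is a minimal sketch that assumes dd/MM/yyyy with a year of at least 1 and proleptic Gregorian leap-year rules. The names `isValidDate`, `isLeapYear`, and `DAYS_IN_MONTH` are hypothetical.

```typescript
const DATE_PATTERN = /^(\d{2})\/(\d{2})\/(\d{4})$/;
const DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

// Gregorian rule: divisible by 4, except centuries not divisible by 400.
function isLeapYear(year: number): boolean {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Returns true only for real calendar dates in dd/MM/yyyy format.
function isValidDate(text: string): boolean {
  const match = DATE_PATTERN.exec(text);
  if (match === null) return false;
  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);
  if (year < 1 || month < 1 || month > 12 || day < 1) return false;
  const maxDay = month === 2 && isLeapYear(year) ? 29 : DAYS_IN_MONTH[month - 1];
  return day <= maxDay;
}

console.log(isValidDate("22/09/2023")); // true
console.log(isValidDate("32/13/0000")); // false
```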

fritzmatias commented 11 months ago

@mxdvl Thanks for your comment. What I'm trying to achieve is to validate that all lines of the textarea match the same regex. Since the schema accepts string().regex(), a multiline RegEx with grouping should be good enough to accomplish that (or maybe I'm wrong). If I understand your example correctly, you do the same thing by hand, line by line, running the transformation at validation time, whereas I expect to do it at action time based on the multiline RegEx validation. I'm new to this library, so sorry for any misuse.

What I found is that if I define the schema as z.string().regex(), I lose the first line of data at the matchAll call. But if I define it as z.string(), it works properly. Could this be a bug?

Here is the real RegEx with named groups (note the global and multiline flags). It can be tested on https://regexr.com:

import { z } from 'zod';

const regEx = new RegExp(/^[ \t]*(?<date>[0-9]{2,2}\/[0-9]{2,2}\/[0-9]{4,4})[ \t]+(?<value>[0-9]+([.,][0-9]*)?)[ \t]*$/gm)
const formSchema = z.object({
    values: z.string().regex(regEx, "Invalid format") // passes if any single line matches, but the first line of data is then lost at matchAll
    // values: z.string() // does not validate, but works fine at matchAll
});

const onSubmit = (schema: z.infer<typeof formSchema>) => {
    console.log(`Submitted: ${JSON.stringify(schema.values, null, 2)}`);
    const matches = schema.values.matchAll(regEx);
    const arr = Array.from(matches);
    console.log(`Array size: ${arr.length}`);
    // matchConverter is my own conversion function (not shown)
    const convertedArray = arr.map(matchConverter);
}
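One likely explanation for the lost first line is that a RegExp with the `g` flag is stateful. This thread does not confirm it, so treat it as an assumption. If the validator calls `test()` on the same instance, `lastIndex` is left just after the first match. `String.prototype.matchAll` copies `lastIndex` from the regex it receives, so iteration then starts after the first line. Here is a minimal sketch with plain RegExp and no zod involved:

```typescript
// With the `g` flag, test() records where the last match ended in lastIndex.
const lineRegex = /^[0-9]{2}$/gm;
const text = "11\n22\n33";

// A single test() call (as a validator might make) advances lastIndex.
console.log(lineRegex.test(text)); // true
console.log(lineRegex.lastIndex); // 2

// matchAll clones the regex including lastIndex, so the first line is skipped.
const skipped = Array.from(text.matchAll(lineRegex), (m) => m[0]);

// Resetting lastIndex (or using a fresh RegExp) restores all matches.
lineRegex.lastIndex = 0;
const all = Array.from(text.matchAll(lineRegex), (m) => m[0]);

console.log(skipped.length); // 2
console.log(all.length); // 3
```

In the snippet above, setting `regEx.lastIndex = 0` before `matchAll`, or keeping a separate RegExp instance for validation, should avoid the problem.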

Test data:

23/09/2023  130,6879    
23/09/2023  130,6879    

22/11/2232 12,1

22/09/2023  130,1797  
23/09/2023  130,6879
24/09/2023  131,1981
25/09/2023  131,7103

mxdvl commented 11 months ago

Your example is vastly different from @fcrozatier’s, and I don’t think this issue is the right place for your comments. I do not think there’s a bug in zod; rather, you are using it incorrectly. Unfortunately, I do not have the resources to help you solve your problem: it’s too specific to be addressed in a public issue.

fritzmatias commented 11 months ago

@mxdvl Thanks, just one last comment: I think your proposal misses the validation itself, since filter keeps only the good lines and does not report the bad ones.
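To make a filter-style approach report failures instead of silently dropping lines, one option is to partition the lines. Here is a minimal sketch using a hypothetical helper, `partitionLines`. Line numbers are 1-based and blank lines are ignored. The pattern is assumed to have no `g` flag so that `test()` stays stateless.

```typescript
interface LineReport {
  valid: string[];
  invalid: { lineNumber: number; text: string }[];
}

// Splits a textarea value into lines and sorts each non-blank line into
// `valid` or `invalid` depending on whether it matches `pattern`.
function partitionLines(text: string, pattern: RegExp): LineReport {
  const report: LineReport = { valid: [], invalid: [] };
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // blank lines are allowed
    if (pattern.test(line)) {
      report.valid.push(line);
    } else {
      report.invalid.push({ lineNumber: index + 1, text: line });
    }
  });
  return report;
}

const DATE_LINE = /^[0-9]{2}\/[0-9]{2}\/[0-9]{4}$/;
const report = partitionLines("22/09/2023\n\n11/11/2222 something else", DATE_LINE);
console.log(report.invalid.length); // 1
```

In a schema, `partitionLines(s, DATE_LINE).invalid.length === 0` could serve as a `refine` predicate, and the invalid entries could feed the error message.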

fritzmatias commented 11 months ago

@mxdvl To complete my scenario (maybe this post helps someone else): I was able to work around it using refine() for validation (the behavior I expected from the z.string().regex() call) and transform() to create the final object (which I did not expect from the z.string().regex() call).

import { z, type CustomErrorParams } from 'zod';

const regEx = new RegExp(/^[ \t]*([0-9]{2,2}\/[0-9]{2,2}\/[0-9]{4,4})[ \t]+([0-9]+([.,][0-9]*)?)[ \t]*$/gm)
const emptyLineRegEx = /^[ \r\n\t]*$/;

const formSchema = z.object({
    values: z.string().refine(
      // Valid only if every non-empty line matches the line pattern
      (lines) => {
        const splittedLines = lines.split("\n");
        const matchedLines = splittedLines.filter(line => line.match(regEx)).length;
        const notEmptyLines = splittedLines.filter(line => !line.match(emptyLineRegEx)).length;
        return matchedLines === notEmptyLines;
      },
      // Build an error message listing each failing line with its (0-based) index
      (lines) => {
        const failedLines = lines.split("\n")
          .map((line, index) => ({ data: line, index }))
          .filter((line) => !(line.data.match(regEx) || line.data.match(emptyLineRegEx)))
          .map((line) => `(${line.index}) ${line.data}`);
        return {
          message: `Invalid lines: ${JSON.stringify(failedLines, null, 2)}`
        } as CustomErrorParams;
      }
    ).transform((lines) =>
      // matchConverter is my own conversion function (not shown)
      Array.from(lines.matchAll(regEx)).map(matchConverter)
    )
});

const onSubmit = (schema: z.infer<typeof formSchema>) => {
    // Gets an array of objects converted by my custom matchConverter() function
    console.log(`Submitted: ${JSON.stringify(schema.values, null, 2)}`);
};

colinhacks/zod issue #2684