gcanti / tcomb-validation

Validation library based on type combinators
MIT License
400 stars 23 forks source link

Customize messages on a per-Type basis #5

Closed grncdr closed 9 years ago

grncdr commented 10 years ago

It's currently possible to customize the messaging for a particular property:

var Identifier = subtype(Str, function (v) {
  return /^[a-z._-]{1,32}$/.test(v);
}, 'Identifier');

var Thing = struct({
  id: Identifier
});

validate(Thing, {id: 'lol invalid'}, {
  messages: {
    id: ':path invalid. :actual must be a string containing only alphanumeric characters, hyphens, underscores and dots, and be between 1 and 32 characters long.'
  }
});

But this gets awkward when you re-use the same type in multiple places:

var RelatedThing = Thing.extend({ parentId: Identifer });

Instead of repeating the same message (or putting it into a variable and repeating that), It would be much DRYer to define messages for types/predicates instead of fields. This would also give the advantage that one can give more specific error messages:

validate(RelatedThing, {id: 'totally_valid', parentId: 'not so valid'}, {
  typeMessages: {
    Str: ':actual at :path must be a string.',
    Identifier: ':actual at :path must contain 1-32 alphanumeric characters, hyphens, underscores or dots.'
  }
});

What do you think?

gcanti commented 10 years ago

Good idea. However default messages should be attached to the type: type names are optional, so

typeMessages: {
  Str: ':actual at :path must be a string.',
  Identifier: ':actual at :path must contain 1-32 alphanumeric characters, hyphens, underscores or dots.'
}

is not reliable. Something like

Identifier.meta.validation = {message: '...'};

Then you can override the default message on a field basis as usual.

grncdr commented 10 years ago

I thought about that, but passing in the custom messages at validation time (instead of type definition time) is valuable because it allows you to localize them. Might be worth supporting both? (e.g. default message attached to the type, but allow per-type overrides to be passed in to validate)

grncdr commented 10 years ago

I also think it's a reasonable trade-off to say that "if you want overridable error messages for a type, it must be named".

gcanti commented 10 years ago

Localization...I knew, sooner or later the issue would be raised :)

Well, I think the problem should be solved outside the library. A general solution would be to exploit the meta info of the struct:

var Person = struct({
  name: Str,
  age: Num
});

// bundles are hard coded, ajax requested, etc
var bundles = {
  'en_US': {
    Str: ':actual at :path must be a string.',
    Identifier: ':actual at :path must contain 1-32 alphanumeric characters, hyphens, underscores or dots.'
  },
  'it_IT': {
    ...
  }
};

function getMessages(type, locale, overrides) {
  // here you can write the best logic for your app
  var messages = {};
  var bundle = bundles[locale];
  var props = type.meta.props;
  for (var field in props) {
    var type = props[field];
    var name = t.util.getName(type);
    messages[field] = overrides[field] || bundle[name];
  }
  return messages;
}

validate(value, Person, {messages: getMessages(Person, 'en_US')});
validate(value, Person, {messages: getMessages(Person, 'it_IT')});

What do you think?

grncdr commented 10 years ago

Well, I think this issue [error localization] should be solved outside the library.

Absolutely, I'm not proposing a system for localizing error messages, just overriding them. That one enables the other doesn't mean it's the only use case. For example, it's also useful to override error messages for named types simply so that messages are consistent throughout all validations within a single locale. While one could do this by modifying all the built-in types:

var t = require('tcomb');
t.Str.meta.validation.message = "my custom message"

... to me that feels a lot messier than simply passing in a data-structure that the library will prefer over it's built-in messages.

Regarding the solution proposed above, the algorithm needs to be made recursive (so that the types of nested records/lists/etc are inspected for custom error messages), and it's building up a lot of garbage for each validate call. (though a more "real world" solution could of course cache these things).

I can understand the desire to keep this module small, so maybe I will write a separate module that generates & caches the custom messages for validate, but IMO it's simpler to support this functionality once here.

gcanti commented 10 years ago

Absolutely, I'm not proposing a system for localizing error messages

I do. It would be great if we can find a general solution.

it's also useful to override error messages for named types simply so that messages are consistent throughout all validations within a single locale

I agree.

While one could do this by modifying all the built-in types

Honestly at the moment I'm not sure t.Str.meta.validation.message = "my custom message" would be a decent solution. However, as you stated, it's more correct to attach a validation message to a type rather than a field, since the validation fails because of the type not of the field. Types carry informations and meanings. Can you show me an example where two fields with the same type need different error messages? Just to understand your real world use case.

to me that feels a lot messier than simply passing in a data-structure that the library will prefer over it's built-in messages

Ok, but in that case if you use the same type in different validations you must still pass around that data structure.

Regarding the solution proposed above, the algorithm needs to be made recursive

You are absolutely right. This needs to be thought through more.

I can understand the desire to keep this module small, so maybe I will write a separate module that generates & caches the custom messages for validate

That would be interesting. Anyway let's recap the requisites:

You propose an additional option typeMessages:

validate(value, type, {
  typeMessages: {
    Str: ':actual at :path must be a string.',
    Identifier: ':actual at :path must contain 1-32 alphanumeric characters, hyphens, underscores or dots.'
  }
});

that solves (2) while (3) is already solved by the messages option, right?

grncdr commented 10 years ago

that solves (2) while (3) is already solved by messages, right?

Yes, and the ability to pass in messages and typeMessages at validation-time also solves (1) as far as I'm concerned. There's no need to do anything more specifically aimed at localization.

If we had typeMessages, you wouldn't need the getMessages function your example above, it just becomes a property access:

var bundles = {
  'en_US': {
    Str: ':actual at :path must be a string.',
    Identifier: ':actual at :path must contain 1-32 alphanumeric characters, hyphens, underscores or dots.'
  },
  'it_IT': {
    ...
  }
};

validate(value, type, { typeMessages: bundles[locale] })
grncdr commented 10 years ago

Can you show me an example where two fields with the same type need different error messages?

Outside of localization, I don't have one. I'd be perfectly happy to have the same error message for the same type in all cases that I can currently think of.

Edit: same error message for the same type in the same call to validation.

gcanti commented 10 years ago

I see. Thus an option is having an hash containing all the types you use across the app and using it in all the validations without regard to the actual type being validated.

Pros:

Cons:

A question: :actual should be localized or it isn't worth it?

p.s. I know I'm a little bit pedantic, but I want to be sure

grncdr commented 10 years ago

A question: :actual should be localized or it isn't worth it?

If you mean the placeholder string ":actual" then I can't see how it would be worth it. I'm not sure how you'd localize the value itself, given that it probably comes from a user. :smile:

Also, I haven't been using ":expected" in my example messages because that way I don't have to worry about whether Num is universally meaningful. Regarding the pro's and con's I agree 100% with what you have there. I wouldn't make the name required (yet...) because not every user of tcomb is going to use tcomb-validation and even those that do might not care so much about the error messages being good (their loss).

I know I'm a little bit pedantic, but I want to be sure

I greatly appreciate pedantry when it comes to type-checking and validation! :+1:

gcanti commented 10 years ago

Great! I'm upgrading the library to tcomb v0.3.1, I'll add this stuff asap.

gcanti commented 10 years ago

Let's take it a step further.

If you mean the placeholder string ":actual" then I can't see how it would be worth it. I'm not sure how you'd localize the value itself, given that it probably comes from a user.

This is the problem. Say you have this type:

var Person = struct({
  name: Str,
  birthDate: Dat
});

and a form returning this value:

var value = {
  name: 'giulio',
  birthDate: '30/11/1973' // italian localization dd/mm/yyyy
};

Now take these APIs:

(1) parse(value, type, options) -> Obj where value comes from a form (an hash Str -> Str) and Obj contains the parsed values. Example:

parse({name: 'giulio', birthDate: '30/11/1973'}) -> {name: 'giulio', birthDate: new Date(1973, 10, 30)} // parse ok
parse({name: 'giulio', birthDate: '30/11/'}) -> {name: 'giulio', birthDate: '30/11/'} // parse KO

(parse API doesn't exist at the moment)

(2) validate(value, type, options) (tcomb-validation) (3) createForm(type, options) -> ..form.. -> value (tcomb-form, internally uses tcomb-validation)

The 3-uple (value, type, options) is present in slightly different forms in all the APIs. It would be nice to unify the treatment of options. What if all the APIs had a types option?

var bundles = {
  'it_IT': {
    Dat: {
      i17n: {
        // the function used to parse the strings coming from a form (optional)
        parse: function (x) { ... },
        // the function used to display a value in a form (optional)
        format: function () { ... }
      },
      // the message to display when a validation fails, :actual is formatted with `format` (if specified)
      ':type': ':actual deve essere una data valida nel formato dd/mm/yyyy'
    },
    ...
  },
  ...
};

parse(value, Person, {types: bundles.it_IT});

validate(value, Person, {types: bundles.it_IT});

createForm(Person, {types: bundles.it_IT});
gcanti commented 10 years ago

@grncdr I decided to apply the KISS principle and implement your proposal

grncdr commented 10 years ago

Ah sorry, for not responding, I actually have a half-finished response in another tab!

grncdr commented 10 years ago

Here's that half-finished response:

I like this proposal, in particular being able to define the per-type parse/format/message properties seems very desirable.

Where I'm not so sure is the separation of parse, validate, and the type constructor itself. What about giving parse the signature (value, Type, options) -> Either[Type, ValidationError]. That is, parse would return you a fully constructed Type instance, or the error result from validate.

Part of the reason for proposing this is that you could accomplish both jobs with a depth-first traversal of the type metadata. A (half-baked) algorithm:

  1. if input is a scalar value or Type is irreducible, call validate(input, Type, options),
    1. if the input is valid return Ok(Type(input)).
    2. else return the validation result
  2. else if type is a composite, descend into it's sub-components (and likewise for the input). As an expression this looks like var properties = map(Type.meta, function (value, key) { return parse(input[key], Type.meta.props[key], options)
    1. if there were no errors, return Type(properties) a. combine errors from any failed validation results in properties into a single result and return it

In addition to a simpler API, this approach should be more performant, since constructing the composite types will only perform a shallow traversal (since each instanceof check will pass, the code would never descend again into a composite type).

What do you think? Too complex?

gcanti commented 10 years ago

So we can merge parse and validate:

validation(value: Any, T: Type, [options: Obj]) -> Either(T, ValidationError)

and express the parsing and messages configuration in options.

It's something I had in my mind after this:

https://github.com/gcanti/tcomb/issues/47#issuecomment-57654777

// options can be something like this:
options: {
  types: {
    Str: { // irriducible, only message
      message: 'insert a string'
    },
    Dat: { // irriducible, message AND i17n
      i17n: {
        parse: function (x) {
          return ...
        },
        format: function (date) {
          return ...
        }
      },
      message: 'insert a date with format ...'
    },
    Tags: { // not irriducible, it makes sense only to define a message
      message: '...'
    }
    ...
  },
  messages: {...}
}

// where

var Tags = list(Str, 'Tags');

EDIT: Maybe it makes sense to express a parse function also for Tags: ex.

var tags = 'tag1,tag2';
function parse(x) {
  return x.split(',');
}
grncdr commented 10 years ago

:+1:

gcanti commented 10 years ago

I could modify Result to be backward compatible:

var Result = struct({
  errors: maybe(list(Err)),
  value: Any // <-- new field containing the instance if validation succeeded
}, 'Result');
grncdr commented 10 years ago

Yes that would make sense to me, no need for a wrapper around a wrapper, and it's a nice type synonym for Either with a more meaningful name (in the context).

gcanti commented 10 years ago

Hi, I wrote a wiki page to state the various problems I'm facing. Writing down things helps me to reason about the refactoring.

https://github.com/gcanti/tcomb-validation/wiki

grncdr commented 10 years ago

I'm not sure I understand the motivation for trying to generalize transform? I mean, it would probably be useful, but if it's necessary I didn't understand it.

Another small note on the signature of transform:

var value = transform(json, [
  {from: Str, to: Dat: parse: ..., format: ...},
  {from: AddressJSON, to: Address: parse: ..., format: ...},
]);

In this case, would it try to transform every Str to a Dat? It seems ambiguous. Unless transform is necessary to implement the other features, I would put it low on the priority list.

Anyways, I'm more interested in the "Validation" section... :smile:

It seems like you're planning to make the message formatting a separate function from validation, which makes a lot of sense to me!

Considering these signatures:

validate(data, T, options) -> Result(T, list(Error))
getMessage(list(Error), ?) -> ?

It seems like options are only used for formatting messages, so maybe:

validate(data, T) -> Result(T, list(Error))
getMessage(list(Error), options) -> ?

Keeping the error results minimal also seems like a good idea. I'll see how far I can go with writing a formatter using just type, actual, and path properties, it seems like types may want to include more information than just that, but I can't think of a specific case where type metadata wouldn't be sufficent.

P.S. - Is continuing to give feedback on this issue OK?

grncdr commented 10 years ago

Ah, isn't that always how it goes, I realize the answer to my question right after hitting "submit" :smile:

I guess the options for validate would include the parse for a given type? and this is where transform starts to seem more appealing...

grncdr commented 10 years ago

A little prototype formatter for error messages is here: https://gist.github.com/grncdr/cd3ebfe4d709481357ff

gcanti commented 10 years ago

I'm not sure I understand the motivation for trying to generalize transform

I'm trying to modularize everything, splitting the logic of all the libraries in basic operations:

Thus transformations don't touch tcomb-validation at the moment.

In this case, would it try to transform every Str to a Dat?

It's just theory at the moment, but the idea would be to check the pair (Str, Dat) (something like Scala's implicits), that is apply the transformation only if from = Str and to = Dat.

It seems like you're planning to make the message formatting a separate function from validation, which makes a lot of sense to me!

Yes! I refactored the whole codebase (2.0 branch). I'm really satisfied:

It seems like options are only used for formatting messages, so maybe:

Yeah. Yesterday I decided to keep completely separated the validation logic and the messages handling and suddenly everything made more sense. No more half backed DSL to handle messages. This is a more powerful combination, as you noticed:

validate(data, T) -> Result(T, list(Error))
getMessage(list(Error), options) -> ?

and it follows the Node.js best practice: "Write small modules that each do one thing"

P.S. - Is continuing to give feedback on this issue OK?

Absolutely! I love to work together.

A little prototype formatter for error messages is here:

:+1: