Thank you @raderio for opening this discussion.
Here are some questions and thoughts regarding your suggestion:
Structural validation vs. domain validation
It makes sense to check for the presence and correct type of information before object creation. That is the current approach, but it currently has to be done manually through JSONCodecs. With annotation processing (#16) this would become much easier, as you've also suggested, but it's still work in progress and has a long way to go. Annotations would then be used to test for the correct structure and types of all information during parsing, before object creation, and can help with useful context-aware error messages. The result is type-safe values based on Kotlin and custom types.
Domain validation like min/max, length, regex, etc., on the other hand, is not just about the structure but about the domain-specific traits every single property has. JSON parsing, in my opinion, shouldn't perform any domain-specific validation but only help transport a data model using a defined type-safe structure from one location to another, i.e. stay as close to POJOs as possible. The model can then be used by domain-specific code to apply whatever rules it deems important depending on the context in which the model is being used. This provides a better separation between data model, data (de)serialization and data validation rather than mixing up the latter two.
Domain validation is complex
Take Twitter for example. There are complex rules involved in calculating how long a tweet actually is, including information about URL shortening and photo uploads. Calculating that length within the model or the deserialization layer in order to validate against the maximum length would put a lot of domain logic with a lot of dependencies into these layers. Also, there may be older tweets in the database which followed different validation rules and would no longer be valid today. They may still be valid in the model but invalid only when creating new tweets, hence it depends on the context.
Should a developer still want to perform such validation then it can easily be done directly in the constructors or initializers of model classes and completely independent of the deserialization. This allows the full use of Kotlin's extensive standard library as well as third-party (validation) libraries. Replicating validation with some special annotations will likely just reinvent part of the wheel. The overhead of implementing and maintaining such a validation system will likely outweigh the benefits by far.
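For illustration, a minimal sketch of validation living directly in the model's initializer (the Tweet class, its properties and the 280-character limit are invented for this example):

```kotlin
// Invented example class; property names and the 280-character limit are
// only placeholders to illustrate init-block validation.
class Tweet(val text: String, val authorId: Long) {

    init {
        // Domain validation lives with the model, independent of how the
        // object was created (JSON deserialization, database mapping, tests, ...).
        require(text.isNotBlank()) { "text must not be blank" }
        require(text.length <= 280) { "text must not exceed 280 characters" }
        require(authorId > 0) { "authorId must be positive" }
    }
}
```

Every code path that constructs such an object runs the same checks, no matter which (de)serialization library produced the values.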
In the majority of scenarios clients send valid JSON
If you have only one version of the API and only one client, then yes, but we have 3 clients (web, Android, iOS) and have to support several versions of each.
Take Twitter for example. There are complex rules involved in calculating how long a tweet actually is
This is more of an exception to the rule; usually you just have an input with a max length.
In the majority of scenarios clients send valid JSON
If you have only one version of the API and only one client, then yes, but we have 3 clients (web, Android, iOS) and have to support several versions of each.
How can validation which goes beyond structure help you in this situation? From the client's perspective there will be useful error messages in any case. Or is it about the documentation? I'm also used to having multiple clients, each with widely different versions, as typically happens as you release more and more app updates. I've stopped versioning APIs a long time ago and instead implement capabilities. Each client tells in the request what API capabilities they support and the API server will adjust their functionality and response format accordingly to consider and/or compensate for missing capabilities.
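Roughly, the idea looks like this: each request carries a list of capabilities the client supports (e.g. in a header), and the server branches on it. This is only a sketch; the capability identifier and response fields below are invented for illustration:

```kotlin
// Sketch of capability-based responses instead of API versioning.
// "structured-avatars" and all field names are invented for this example.
data class User(val id: Long, val name: String, val avatarSmallUrl: String, val avatarLargeUrl: String)

fun buildUserResponse(clientCapabilities: Set<String>, user: User): Map<String, Any?> {
    val response = mutableMapOf<String, Any?>("id" to user.id, "name" to user.name)
    if ("structured-avatars" in clientCapabilities) {
        // Newer clients understand a structured avatar object.
        response["avatar"] = mapOf("small" to user.avatarSmallUrl, "large" to user.avatarLargeUrl)
    }
    else {
        // Compensate for the missing capability with the old flat field.
        response["avatarUrl"] = user.avatarLargeUrl
    }
    return response
}
```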
Take Twitter for example. There are complex rules involved in calculating how long a tweet actually is
This is more of an exception to the rule; usually you just have an input with a max length.
It's a more extreme one, yes, but there are many more. In some cases the values may depend on the client, on the database or on other external or complex information. My suggestion is, instead of having part of the validation in the parsing or data layer and part of it in the business logic layer, to have it all in one place so there is no guessing at runtime about what data has been validated and what hasn't. Since not all business logic can live in the data/parsing layer (esp. not in micro-architecture scenarios) it makes more sense to have as much as possible close to the business logic. This also reduces the potential for error because data will be validated very close to its actual use rather than once and then travel through the system until the consumer can no longer be certain that it was validated as expected.
Each client tells in the request what API capabilities they support and the API server will adjust their functionality and response format accordingly to consider and/or compensate for missing capabilities.
Does this technique have a name, or can you maybe provide links to some articles?
it makes more sense to have as much as possible close to the business logic
Yes, but in this case the validation is not as declarative. Also, if it is not done with annotations it is harder to produce the documentation, because with annotations you can generate it.
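For example, with JSR 303 (Bean Validation) annotations the constraints become part of the declaration and a documentation generator can read them directly (the class below is only an illustration using the standard javax.validation.constraints annotations):

```kotlin
import javax.validation.constraints.Email
import javax.validation.constraints.NotNull
import javax.validation.constraints.Size

// Constraints are declared right on the properties; a documentation generator
// can pick them up without having to understand any init-block logic.
data class Data(
    @field:Size(min = 2) val name: String,
    @field:NotNull val age: Int?,
    @field:Email val email: String
)
```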
Each client tells in the request what API capabilities they support and the API server will adjust their functionality and response format accordingly to consider and/or compensate for missing capabilities.
Does this technique have a name, or can you maybe provide links to some articles?
I've done that without checking if anyone else is doing that already, so if there is a name then I don't know it, nor do I know any articles. It was working very well though, so maybe I should write an article about it :)
it makes more sense to have as much as possible close to the business logic
Yes, but in this case the validation is not as declarative. Also, if it is not done with annotations it is harder to produce the documentation, because with annotations you can generate it.
I agree that documentation-wise it would be a lot simpler. If annotation-based validation works for you then that is totally fine to use. I'd do it directly in the data model rather than in the parsing layer though. The codecs used for parsing would then simply call the generated constructors or factory methods, which in turn implement the validation. Wouldn't that work for you?
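To make that concrete, here is a simplified sketch of the split; the factory method is hand-written instead of generated, the validation rules are invented, and the decode function only stands in for a codec rather than showing the actual JSONCodec API:

```kotlin
// The model owns its validation via a factory method and collects all
// violations before failing, so the caller can report them at once.
class Data private constructor(val name: String, val age: Int, val email: String) {

    companion object {

        fun of(name: String, age: Int, email: String): Data {
            val errors = mutableListOf<String>()
            if (name.length < 2) errors += "name: size must be at least 2"
            if (age < 0) errors += "age: must not be negative"
            if ("@" !in email) errors += "email: must be a valid e-mail address"
            require(errors.isEmpty()) { errors.joinToString("; ") }
            return Data(name, age, email)
        }
    }
}

// A codec (represented here by a plain function, not the library's API) stays
// a thin mapping layer and simply delegates to the validating factory:
fun decodeData(json: Map<String, Any?>) = Data.of(
    name = json["name"] as String,
    age = (json["age"] as Number).toInt(),
    email = json["email"] as String
)
```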
I've done that without checking if anyone else is doing that already, so if there is a name then I don't know it, nor do I know any articles.
Is it something like https://www.youtube.com/watch?v=M2KCu0Oq3JE ?
It was working very well though, so maybe I should write an article about it
That would be great!
I've done that without checking if anyone else is doing that already, so if there is a name then I don't know it, nor do I know any articles.
Is it something like https://www.youtube.com/watch?v=M2KCu0Oq3JE ?
Exactly like that! Very interesting video with good ideas on pushing this forward. Thank you for sharing :)
Usually on deserialization we want to collect all validation errors and return them in the response to the client. In order to catch all errors, the validation should be done before object creation.
If it is done after object creation, in the init block, then the validation will be split into two phases. The first phase validates non-nullable fields, because we cannot create an object if a non-nullable property is missing. The second phase does the structural validation: min/max values, length, regexp pattern, etc.

Example:

class Data(val name: String, val age: Int, val email: String)

When we send a JSON like {"email": "xyz@xyz.com", "name": "A"}, we should get a response indicating that age cannot be null and that the size of name must be greater than 2, i.e. all the errors at once.

Also, structural validation would be great to do based on annotations and not in the init block, something like JSR 303. This would be useful for documentation generation: it is far easier to generate documentation based on annotations.

How I see the flow: