ajv-validator / ajv

The fastest JSON schema Validator. Supports JSON Schema draft-04/06/07/2019-09/2020-12 and JSON Type Definition (RFC8927)
https://ajv.js.org
MIT License

Custom formats - asynchronous validation #40

Closed simon-p-r closed 8 years ago

simon-p-r commented 9 years ago

Hi

Can custom formats be async?

Thanks Simon

epoberezkin commented 9 years ago

Not really. Validation is synchronous. What's the use case?

simon-p-r commented 9 years ago

OK, I use z-schema and it provides these hooks for you. If you are performing any meaningful validation, you have to do I/O to look up values.

epoberezkin commented 9 years ago

I will think about asynchronous...

I think that "format" is something that can be clearly and independently defined. If something requires IO it is not a "format", it is some other custom validation. I don't think custom validation should be done via JSON-schema. It undermines schema platform independence.

I can reconsider under pressure though :)

simon-p-r commented 9 years ago

What about a format that requires a webservice to validate it? Such as http://ec.europa.eu/taxation_customs/vies/faq.html

epoberezkin commented 9 years ago

I assume you are talking about the VAT identification number; correct me if I am wrong. There are two parts to its validation. The first is "format" validation: the format is clearly defined in Q11 and can be expressed as a regex or a function. The second is establishing that the number is valid based on some proprietary algorithm or on whether it is assigned to some company. I think the second part should be validated in a different application layer from the one that receives the data and checks it against the JSON schema, because it belongs to the business logic. Mixing IO / data validation with business logic validation is not the most efficient thing to do. JSON-schema is not designed for business logic validation.

epoberezkin commented 9 years ago

A credit card number is similar, by the way. Format validation will just look for 16 digits with optional spaces. Business logic validation can look up a database or access a web service. The first should be in the schema, the second should not be.
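A minimal sketch of this split, in plain JavaScript without ajv. The function names (isCardNumberFormat, isCardAccepted, checkCard) and the in-memory set standing in for a database or web service are hypothetical and exist only for illustration.

```javascript
// Layer 1: format check. Synchronous and self-contained, so it fits a schema "format".
// Accepts 16 digits, each optionally followed by a single space.
const CARD_FORMAT = /^(?:\d ?){15}\d$/;

function isCardNumberFormat(value) {
  return typeof value === "string" && CARD_FORMAT.test(value);
}

// Layer 2: business-logic check. Asynchronous because it would normally do I/O.
// ASSUMPTION: this Set stands in for a real database or web service lookup.
const ACCEPTED_CARDS = new Set(["4111111111111111"]);

function isCardAccepted(value) {
  const digits = value.replace(/ /g, "");
  return Promise.resolve(ACCEPTED_CARDS.has(digits));
}

// The receiving layer rejects malformed input before any I/O happens.
function checkCard(value) {
  if (!isCardNumberFormat(value)) {
    return Promise.resolve({ ok: false, reason: "format" });
  }
  return isCardAccepted(value).then((accepted) =>
    accepted ? { ok: true } : { ok: false, reason: "business" }
  );
}
```

Only the first function would be registered as a custom format; the second would stay in the application layer that owns the business rules.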

simon-p-r commented 9 years ago

I see you're unwilling to make a change, so I will close the issue. What is Q11?

epoberezkin commented 9 years ago

Question 11 in the document you've posted. It has format specifications for VAT identification number in different countries. I think this is what the schema should validate against.

At best the change is not very easy to make - changing compiled functions to be asynchronous is quite an effort. So it won't happen very soon. If you really need async, z-schema is a very good choice, because although z-schema is much slower, with async the validation speed is not that important - it will be a tiny fraction of I/O anyway. At the same time z-schema is a very mature and battle-tested validator.

I was only trying to suggest that you consider an alternative approach, but it is up to you of course...

But keep it open; as I said, I will think about it, I am not 100% certain about the right approach to asynchronous and custom validations. I may implement them at some point.

simon-p-r commented 9 years ago

Sure thanks for your comments, I like the speed and simplicity of your approach so may have to think about how to separate business logic from the initial schema validation you refer to.

epoberezkin commented 9 years ago

You are welcome.

epoberezkin commented 8 years ago

It should also be possible to create asynchronous custom keywords then.

trikadin commented 8 years ago

@epoberezkin, how do I add an asynchronously compiled schema to the ajv instance? Something like this:

ajv.compileAsync(schema, (err, compiled) => {
  if (err) {
    return;
  }
  ajv.addSchema('testSchema', schema);
})

Will this work properly?

epoberezkin commented 8 years ago

If the schema has an id property (and it is always better to have one) you don't need to do anything: it will be added automatically if it is successfully compiled (synchronous compile does that too). That's what I'd recommend.

If you want to associate schema with an additional key you can use ajv.addSchema(schema, key). You have the wrong order of parameters in your sample btw. The downside is that the schema associated with this key will be compiled again (synchronously - all the dependencies will be available at this time) when you use ajv.validate(key, data) or ajv.getSchema(key). Also if the schema has id this call will throw an exception, because id has to be unique.
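The behaviour described in these two paragraphs can be modelled with a toy registry. This is a sketch of the stated semantics only, not ajv's implementation: the ToyRegistry class, its compileCount counter, and the typeof-based stand-in for compilation are all hypothetical.

```javascript
// Toy model of the behaviour described above. NOT ajv's implementation:
// "compilation" is a stand-in that only checks the "type" keyword via typeof.
class ToyRegistry {
  constructor() {
    this.entries = new Map(); // key or id -> { schema, validate }
    this.compileCount = 0; // counts how often the stand-in compiler runs
  }

  // Stand-in for real schema compilation.
  _build(schema) {
    this.compileCount += 1;
    return (data) => schema.type === undefined || typeof data === schema.type;
  }

  // Keys and ids share one namespace and must be unique.
  _register(name, entry) {
    if (this.entries.has(name)) {
      throw new Error(`schema with key or id "${name}" already exists`);
    }
    this.entries.set(name, entry);
  }

  // Compiles immediately; registers the schema only if it has an id.
  compile(schema) {
    const validate = this._build(schema);
    if (schema.id) this._register(schema.id, { schema, validate });
    return validate;
  }

  // Registers under an extra key (and under id, if present) without compiling yet.
  addSchema(schema, key) {
    const entry = { schema, validate: null };
    if (schema.id) this._register(schema.id, entry);
    if (key) this._register(key, entry);
  }

  // Compiles lazily on first access.
  getSchema(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (!entry.validate) entry.validate = this._build(entry.schema);
    return entry.validate;
  }
}
```

In this model, compiling a schema that has an id and then calling addSchema with the same schema throws, which mirrors the uniqueness constraint mentioned above.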

trikadin commented 8 years ago

So, the only difference between compile and addSchema is the lazy compilation (and, well, associated key), am I right? Sounds kinda confusing, especially the last sentence. Any plans to refactor API?) Something like

ajv.addSchema(schema, options: {lazy: true, async: false, key: 'someSchema'})

In fact, I'm enthusiastic about sending some PRs, if you don't mind?

> You have the wrong order of parameters in your sample btw.

Yep, thank you)

epoberezkin commented 8 years ago

compile can also be used to compile schemas without IDs. These schemas won't be added to the instance (apart from caching compiled function, but cache is an internal thing - there is no public API to access it), so using addSchema to compile them would be indeed confusing. By the way, I actually started schema level options for some other reason but it was becoming very complicated and confusing, so I abandoned it.

I also think multiple methods are better than options... It's better to keep API as it is unless there is something that cannot be achieved with it.

There is an option suggested in #83 that when implemented would make compile even more different. Maybe it should also be controlling when/if compile and validate add schemas to the instance (always - would throw if there is no ID / if ID present - current behaviour, throws if ID is not unique / if unique ID present / never). Not sure what the property name should be and possible values in this case, so if you have some ideas please share them. As long as it doesn't change the default behaviour it can be useful.

rdsubhas commented 8 years ago

Firstly thanks so much. ajv with custom keywords has helped us a lot with business logic validation in our backend. We consolidated a whole bunch of controller-level validations into JSON-Schema with custom keywords. The net effect is far far better than writing individual validation code.

Keeping the standard aside, the value offered by the combination of ajv, JSON Schema and async keywords would be really big! But anyways, we are already thankful to ajv as it is.

A hidden reason why I would personally like to see async keywords is that they would also clean up the current error handling. The callback would be function(err, valid), so I wouldn't have to access the error object separately (even though it's safe, it's a bit disconcerting to see) :smiley:

epoberezkin commented 8 years ago

Thanks :)

Async formats and keywords are an interesting challenge... I agree that having all validation logic in one place is convenient. So they are coming some day.

My only advice is to maintain and to use your own custom meta-schema that includes them. You just need to do something like this:

{
  "id": "http://thoughtworks.com/schemas/meta-schema.json#",
  "$schema": "v4 or v5 meta schema uri",
  "allOf": [
      { "$ref": "v4 or v5 meta schema uri" },
      {
        "properties": {
           // your custom keywords go here...
        }
      }
  ]
}

In this way you won't have to validate keyword values in their definitions, your schemas will be correctly validated and the keywords you've created would be properly documented in one place.

I am curious: what keywords are you using, and what is your main implementation mechanism (I like macros more than anything, but they are limited of course :)? Maybe they would be interesting for other people too... If that's the case, it's relatively easy to make them into npm module(s).

epoberezkin commented 8 years ago

@rdsubhas I added docs for inline keywords, can be useful for you

rdsubhas commented 8 years ago

@epoberezkin thank you so much! Sorry I couldn't follow up to your previous question, we do quite a bit of things like detecting duplicate entries, non-overlapping ranges, and a few more! I'll post a list of things very soon :+1: Thanks for the docs, inline functions look really interesting as we now have access to the parent, that opens up a new range of possibilities :)

epoberezkin commented 8 years ago

@rdsubhas you're welcome. "inline" functions have existed from the beginning; that's how most v5 keywords are implemented (only contains is implemented as a macro). But I realised that not only was it hard to figure out how to use them, it was also risky to use them because the API was private. What you see in the docs is very stable; it's very unlikely to change, and definitely not without a major version change.

epoberezkin commented 8 years ago

I was thinking about asynchronous schemas and I think that the good approach is to have $async keyword that should exist both on the top level of the schema to declare it as asynchronous and on the level where asynchronous elements exist:

{
  "id": "...",
  "$schema": "v4-async.json or v5-async.json",
  "$async": true,
  "type": "object",
  "properties": {
    "company": { "type": "string" },
    "vatId": {
      "$async": true,
      "type": "string",
      "format": "VAT-ID"
    },
    "registration": {
      "$async": true,
      "companiesHouse": { "regNo": "XXX" }
    }
  }
}

The format "VAT-ID" and the keyword companiesHouse are asynchronous in this example.

If $async is present on the top level the schema will always be asynchronous, even if there are no async elements inside. If $async is absent either on the top level or on the level where async elements exist, the schema will be invalid (the compilation will fail).
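The rule can be sketched as a small checker in plain JavaScript. This is only an illustration of the proposal, not ajv code: the function names and the two registries of asynchronous formats and keywords are hypothetical, and only "properties" is traversed to keep it short.

```javascript
// ASSUMPTION: registries of asynchronous formats and keywords, named after the example above.
const ASYNC_FORMATS = new Set(["VAT-ID"]);
const ASYNC_KEYWORDS = new Set(["companiesHouse"]);

// True if this schema level directly uses an asynchronous format or keyword.
function usesAsyncElement(schema) {
  return (
    ASYNC_FORMATS.has(schema.format) ||
    Object.keys(schema).some((key) => ASYNC_KEYWORDS.has(key))
  );
}

// Returns a list of problems; an empty list means the "$async" markers are consistent.
function findAsyncMarkerErrors(root) {
  const errors = [];
  let hasAsyncElement = false;

  function visit(schema, path) {
    if (usesAsyncElement(schema)) {
      hasAsyncElement = true;
      if (schema.$async !== true) {
        errors.push(`${path}: async element without "$async": true on its level`);
      }
    }
    // Only "properties" is followed in this sketch.
    for (const [name, child] of Object.entries(schema.properties || {})) {
      visit(child, `${path}/properties/${name}`);
    }
  }

  visit(root, "#");
  if (hasAsyncElement && root.$async !== true) {
    errors.push('#: async elements present but "$async": true is missing on the top level');
  }
  return errors;
}
```

A schema with "$async": true on the top level and no async elements inside produces no errors here, matching the statement that such a schema is simply treated as asynchronous.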

What do you think? Why is it good or bad?

simon-p-r commented 8 years ago

$async keyword is good. Can I use macro method with a callback as 3rd parameter?

epoberezkin commented 8 years ago

It's not implemented yet, just an idea. Macro returns schema, so it will remain synchronous. It may contain async schema though once it's implemented.

epoberezkin commented 8 years ago

It's implemented in the async branch, give it a go. At the moment it requires generator support, so only Node.js / Chrome / Firefox. There will be an option to transpile compiled validation functions to ES5/6. In the browser it will require an additional (or bigger) bundle. Compiling schemas into functions with callbacks is not a feasible option: it would require a lot of changes everywhere.

See async validation test for the examples of async format and keyword. Validation function for a custom format/keyword should return a promise that resolves to true/false. The compiled validation function is a generator function that you can use with co or directly, e.g. in koa 1.0. If used with co it returns a promise that resolves to true or rejects with an exception that has errors property.
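A rough, self-contained sketch of that shape, without ajv or co. Everything here is hypothetical: checkVatId stands in for an async format function, validateCompany is a hand-written generator imitating what a compiled validation function might look like, and run is a minimal stand-in for co.

```javascript
// ASSUMPTION: an in-memory set replaces the real web service lookup.
const KNOWN_VAT_IDS = new Set(["GB123456789"]);

// Async format function: returns a promise that resolves to true/false.
function checkVatId(value) {
  return Promise.resolve(KNOWN_VAT_IDS.has(value));
}

// Hand-written imitation of a compiled async validation function.
// It yields promises and throws an error carrying an "errors" property on failure.
function* validateCompany(data) {
  const valid = yield checkVatId(data.vatId);
  if (!valid) {
    const error = new Error("validation failed");
    error.errors = [{ dataPath: ".vatId", message: 'should match format "VAT-ID"' }];
    throw error;
  }
  return true;
}

// Minimal co-like runner: drives the generator, resuming it with each resolved value.
function run(generatorFn, ...args) {
  return new Promise((resolve, reject) => {
    const iterator = generatorFn(...args);
    function step(method, value) {
      let result;
      try {
        result = iterator[method](value);
      } catch (err) {
        reject(err); // the generator threw: reject the outer promise
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        (resolved) => step("next", resolved),
        (rejected) => step("throw", rejected)
      );
    }
    step("next", undefined);
  });
}
```

Calling run(validateCompany, data) gives a promise that resolves to true for a known VAT ID and rejects with an error whose errors property lists the failures otherwise.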

At the moment it needs $async: true on the top level of the schema (and on the top level of dependencies if they have async formats/keywords) but also requires the $async keyword on the level where an async format/keyword is used (I am thinking of dropping this second requirement).

Let me know what you think.

epoberezkin commented 8 years ago

Available in 3.5.0.

epoberezkin commented 8 years ago

@rdsubhas now with 3.7.0 you can access parent data from "compile" and "validate" custom keywords.