ExodusMovement / schemasafe

A reasonably safe JSON Schema validator with draft-04/06/07/2019-09/2020-12 support.
https://npmjs.com/@exodus/schemasafe
MIT License
156 stars 12 forks source link

Detect paths of uncertainty for assignDefaults / removeAdditional #124

Closed ChALkeR closed 4 years ago

ChALkeR commented 4 years ago

We should be very careful there, e.g. in how this works with anyOf with a set of refs -- the behavior should be deterministic and consistent with expectations.

Probably, that would mean refusing to compile a schema in assignDefaults mode if defaults are being used on uncertain paths.

Uncertain paths are all that include not/oneOf/anyOf/contains/if/then/else/dependencies/dependentSchemas, potentially with refs after that. Note that oneOf/anyOf used though discriminator are not uncertain.

Alternative solution might involve a multi-pass run, but it seems that that could still end up being inconsistent in some edge cases, so it's best to allow thost only when we are sure that that behavior can be safe.

A trivial basic approach would be to trace the presence of uncertain paths and refuse to use assignDefaults / removeAdditional on the whole schema if there are any paths of uncertainty in the whole schema. discriminator should resolve the most wide use-case fot anyOf/oneOf without introducing uncertainty.

For now, assignDefaults / removeAdditional are disabled by 18c38b7c104f0f83771e1c61b43b7f9f3bdf1dc3 to prevent potential mistakes.

haggholm commented 4 years ago

A different approach could be to avoid mutating the input object. Something like (rough pseudo-code)

function validate(input: any): [boolean, any] {
  if (!valid) return [false, input];
  else if (needsDefaults) return [true, {...input, [missingKey]: defaultVal }];
  else return [true, input];
}
ChALkeR commented 4 years ago

@haggholm Mutating the input object on the fly is orthogonal to the problem of uncertain paths, it still remains a problem without input mutation.

E.g., consider the following schema:

{
  "anyOf": [
    { "properties": { "a": { "const": 10 }, "res": { "default": 10 } } },
    { "properties": { "b": { "const": 20 }, "res": { "default": 20 } } }
  ]
}

What should the result "res" property value be on { "a": 10, "b": 20 } input?

Per the overall JSON Schema approach, it shouldn't be dependent on the order of execution of anyOf branches -- those are supposed to be independent.

To resolve that, I believe it's best to just not allow any paths of uncertainty at all in combination with default or removeAdditional.

Perhaps the detector could be better, e.g. in some cases uncertainty does not cause problems, but detecting that precisely would be needlessly complex, I think.

Schemas can use discriminator approach to resolve oneOf/anyOf uncertainty. Or do you have a specific usecase in mind?

haggholm commented 4 years ago

I guess in my mental model, that schema is basically saying { default: { anyOf: ["10", "20"] } }, and I wouldn’t care.

Granted, in the specific use case I have in mind, what is actually happening is that multiple paths have identical defaults: input can be A or B, each of which has an emails property with the same schema including { type: "array", items: {...}, default: [] }. (If this were a single schema, it would of course be easy enough to write it with the property on only one branch, but in my scenario, I get the schema through a dynamic combination of other schemas.)

ChALkeR commented 4 years ago

that schema is basically saying { default: { anyOf: ["10", "20"] } }, and I wouldn’t care.

The approach taken here is to not have an undefined behavior on schemas even if there are mistakes of that sort. Either the schema should be fully parsed/understood, and the behavior should be predictable and correct, or validator() should throw while compiling a schema.

I can attempt to think how to reduce the false positives rate of this detection (right now it just rejects everything with any uncertain paths combined with assignDefaults/removeAdditional option), but it would help if you could share some specific testcase (perhaps simplified).

I won't promise anything though, as I'm not yet sure if that can be implemented both safely and not overcomplicated.

ChALkeR commented 4 years ago

@haggholm I just checked how ajv behaves on that:

const schema = {
  "properties": { "z" : { "default": 'z' } },
  "anyOf": [
    { "properties": { "a": { "const": 10 }, "res": { "default": 10 } } },
    { "properties": { "b": { "const": 20 }, "res": { "default": 20 } } }
  ]
}
const ajv = require('ajv')({ useDefaults: true })
const validateA = ajv.compile(schema)
const data = { "a": 10, "b": 20 }
validateA(data)
console.log(validateA.errors)
console.log(data)

Output:

null
{ a: 10, b: 20, z: 'z' }

It seems to silently ignore default in those branches. I think this approach can lead to errors and it's safer to throw on schema compile time.

A better detector of problematic schemas could improve things, as right now any schemas with anyOf/oneOf/etc is considered incompatible with useDefaults, but what matters is if there is a default value inside those paths.

Would tracing and allowing schemas that don't use default inside those branches solve the usecase that you have? (Again: a schema example would be useful).