davishmcclurg / json_schemer

JSON Schema validator. Supports drafts 4, 6, 7, 2019-09, 2020-12, OpenAPI 3.0, and OpenAPI 3.1.
MIT License
408 stars 64 forks source link

after_property_validation causes corruption #196

Open womblep opened 2 months ago

womblep commented 2 months ago

I have an issue with using oneOf and after_property_validation. after_property_validation is run after each property. In the case where there is a oneOf and subschemas with the same property name but a different format, the value in the data is replaced by the first format and therefore is corrupted for the next schema. The data is being changed globally rather than discarded once the oneOf doesnt match that schema.

A contrived example is below:

require 'json_schemer'

convert_date = proc do |data, property, property_schema, _|
  if data[property] && property_schema.is_a?(Hash) && property_schema['format'] == 'date'
    data[property] = Date.iso8601(data[property])
  elsif data[property] && property_schema.is_a?(Hash) && property_schema['format'] == 'rfc2822'
    data[property] = Date.rfc2822(data[property])
  end
end
schema = {
  'oneOf' => [
    {
      'properties' => {
        'myclass' => {
          'const' => 'iso8601'
        },
        'start_date' => {
          'type' => 'string',
          'format' => 'date'
        },
        'email' => {
          'format' => 'email'
        }
      }
    },
    {
      'properties' => {
        'myclass' => {
          'const' => 'rfc2822'
        },
        'start_date' => {
          'type' => 'string',
          'format' => 'rfc2822'
        },
        'email' => {
          'format' => 'email'
        }
      }
    }
  ]
}
validator= JSONSchemer.schema(
  schema,
  after_property_validation: [convert_date]
)
data = { 'myclass' => 'rfc2822', 'start_date' => '2020-09-03', 'email' => 'example@example.com' }
validator.valid?(data)
puts data.inspect

This will fail when the rfc822 format is hit because the data is already converted to a Date.

Is that expected behavior? I expected that the values were sent into the subschema (and converted) but then discarded when it didn't match the oneOf subschema so it came as a bit of a surprise.

I think there may need to be a need for an after_schema_validation to cover scenarios like this. I think I will be able to put together something based on the annotations.

womblep commented 2 months ago

I was able to do what I needed to do with annotations after validate but it is a little bit clunky. It serves my purpose in fixing a production bug but I wouldn't want to submit it as a PR. If this is expected behavior then maybe treat this as a feature request for a "coerce to schema format" function as part of the validate (after schema validation) or as a stand-alone method on Schema.

davishmcclurg commented 1 month ago

Hi @womblep—thanks for opening this. It looks like the behavior here was mistakenly changed in 2.0.0: aaafab1b02e6017d3048ad62cfc125a33a5c217f The previous (and I think correct?) behavior was to not run property validation hooks for applicator subschemas (allOf/anyOf/oneOf etc) because they can potentially run multiple times otherwise.

womblep commented 1 month ago

@davishmcclurg I am not sure what correct behavior would be in the general sense. In my contrived example above, I am trying to use the format annotation to coerce the value. In my real app, I use the format to encrypt anything that has format = password. I think the issue I have is that the after_property_validation adjusts the data rather than working on a local copy. It seems to me that that it should work on a local copy and once the validation has passed, then that local copy replaces the data that was passed in. Then it wouldn't matter that the after_property_validation is run multiple times as any values in the schema that doesn't pass validation get thrown away. Either that or have a after_schema_validation so that it is only run after the schema validation was successful

ekzobrain commented 1 month ago

@davishmcclurg I am not sure what correct behavior would be in the general sense.

In my contrived example above, I am trying to use the format annotation to coerce the value. In my real app, I use the format to encrypt anything that has format = password.

I think the issue I have is that the after_property_validation adjusts the data rather than working on a local copy. It seems to me that that it should work on a local copy and once the validation has passed, then that local copy replaces the data that was passed in. Then it wouldn't matter that the after_property_validation is run multiple times as any values in the schema that doesn't pass validation get thrown away.

Either that or have a after_schema_validation so that it is only run after the schema validation was successful

Agree