IBM / jsonsubschema

Tool for checking whether a JSON schema is a subschema of another JSON schema.
Apache License 2.0

Fix unpredictable `not` and untyped simplification #29

Closed kmichel-aiven closed 2 weeks ago

kmichel-aiven commented 3 weeks ago

When `not` is simplified to an `anyOf` of all other types, the order of the `anyOf` alternatives depends on the iteration order of the dictionary keys.

If the implementation of one of the `anyOf` alternatives has a bug, tests will randomly fail or succeed on consecutive invocations of Python.
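One way to remove this kind of order dependence is to sort the remaining types before building the alternatives. The sketch below is only an illustration: `JSON_TYPES` and `negate_type` are hypothetical names, not the library's API, and it assumes the type names live in a set-like collection whose iteration order can change between interpreter runs. It also ignores the overlap between `number` and `integer`.

```python
# Hypothetical collection of JSON type names (assumption, not the library's own constant).
JSON_TYPES = {"string", "number", "integer", "boolean", "null", "array", "object"}


def negate_type(type_name):
    """Rewrite {"not": {"type": T}} as an anyOf over every other type.

    Sorting makes the order of alternatives identical on every run,
    regardless of how the underlying collection happens to iterate.
    """
    others = sorted(JSON_TYPES - {type_name})
    return {"anyOf": [{"type": t} for t in others]}


simplified = negate_type("string")
```

With a fixed order, a bug in one alternative either always shows up or never does, which makes it much easier to reproduce.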

For instance, in the stack trace below, the regular expression escaping does not produce a pattern that greenery can parse, but the bug is only triggered when `_isAnyofSubtype` does not return early, that is, when no other alternative has rejected the type before the string check runs.

Traceback (most recent call last):
  File "/home/runner/work/jsonsubschema/jsonsubschema/test/test_mix.py", line 162, in test_not_number
    self.assertFalse(isSubschema(s2, s1))
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/api.py", line 57, in isSubschema
    return s1.isSubtype(s2)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 163, in isSubtype
    return self.subtype_enum(s) and self._isSubtype(s)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 1557, in _isSubtype
    return _isAnyofSubtype(self, s)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 1552, in _isAnyofSubtype
    if not s.isSubtype(s2):
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 163, in isSubtype
    return self.subtype_enum(s) and self._isSubtype(s)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 402, in _isSubtype
    return super().isSubtype_handle_rhs(s, _isStringSubtype)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 198, in isSubtype_handle_rhs
    return isSubtype_cb(self, s)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 395, in _isStringSubtype
    if utils.regex_isSubset(pattern1, pattern2):
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_utils.py", line 202, in regex_isSubset
    return parse(s2).equivalent(parse(".*"))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/greenery/parse.py", line 390, in parse
    raise NoMatch(f"Could not parse {string!r} beyond index {i}")
greenery.parse.NoMatch: Could not parse '<0|0<=X<200|>=200|no\\ checking' beyond index 20
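The masking effect of the early return can be reproduced in isolation. The following is a minimal sketch, assuming a loop that stops at the first rejected alternative; `is_anyof_subtype` and `buggy_check` are hypothetical stand-ins and not the code in `_checkers.py`.

```python
def is_anyof_subtype(alternatives, is_subtype):
    """Return False as soon as one alternative fails the check (early exit)."""
    for alt in alternatives:
        if not is_subtype(alt):
            return False
    return True


def buggy_check(alt):
    """Hypothetical check: rejects "number" and crashes on "string"."""
    if alt == "string":
        raise ValueError("simulated regex parsing failure")
    return alt != "number"


# "number" comes first: the loop exits before the buggy string branch runs.
result_number_first = is_anyof_subtype(["number", "string"], buggy_check)

# "string" comes first: the latent bug surfaces as an exception.
try:
    is_anyof_subtype(["string", "number"], buggy_check)
    raised = False
except ValueError:
    raised = True
```

The same inputs give a clean `False` in one order and an exception in the other, which is the pattern behind the flaky test.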

Similarly, when the type is not set, a dict is canonicalized by adding all types and then trying to join them to remove the unnecessary values.

Because `_join` is neither commutative nor associative, the order of the types matters and causes variations in the canonicalized output.
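A toy model makes the order dependence concrete. This is a sketch under simplifying assumptions: `join` and `canonicalize` below are hypothetical and operate on plain lists of type names, and the only "real" merge modeled is a lone `number` absorbing a lone `integer`. It is not the library's `_join`.

```python
from functools import reduce


def join(left, right):
    """Toy join over lists of type names (hypothetical model)."""
    # A single number absorbs a single integer, since every integer is a number.
    if left == ["number"] and right == ["integer"]:
        return ["number"]
    # Otherwise behave like an anyOf: keep the alternatives and
    # merge only entries whose types are identical.
    return left + [t for t in right if t not in left]


def canonicalize(type_order):
    """Fold the types left to right, as an order-sensitive canonicalizer would."""
    return reduce(join, [[t] for t in type_order])


number_first = canonicalize(["number", "integer", "string"])
string_first = canonicalize(["string", "number", "integer"])
```

Here `number_first` ends up without `integer`, while `string_first` keeps it, so the two enumeration orders produce different canonical forms and exercise different code paths.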

If some `_join` implementation has an issue that causes errors, the tests become flaky. For instance, the exception below only happens when the enumeration of `Jtypes` starts with `number` and is followed by `integer`.

(Any other order will first create a `JSONanyOf`, which only attempts a real join on the `anyOf` alternatives if the types are identical. That never calls the failing code, because `number != integer`.)

Traceback (most recent call last):
  File "/home/runner/work/jsonsubschema/jsonsubschema/test/test_numeric.py", line 649, in test_all_all_3
    self.assertFalse(isSubschema(s2, s1))
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/api.py", line 56, in isSubschema
    s1, s2 = prepare_operands(s1, s2)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/api.py", line 46, in prepare_operands
    canonicalize_schema(s2))
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 35, in canonicalize_schema
    canonical_schema = canonicalize_dict(obj)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 84, in canonicalize_dict
    return canonicalize_connectors(d)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 218, in canonicalize_connectors
    allofs.append(canonicalize_dict({c: d[c]}))
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 84, in canonicalize_dict
    return canonicalize_connectors(d)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 211, in canonicalize_connectors
    simplified = simplify_schema_and_embed_checkers(d)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 369, in simplify_schema_and_embed_checkers
    allofs = [simplify_schema_and_embed_checkers(i) for i in s["allOf"]]
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 369, in <listcomp>
    allofs = [simplify_schema_and_embed_checkers(i) for i in s["allOf"]]
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_canonicalization.py", line 366, in simplify_schema_and_embed_checkers
    return boolToConstructor.get("anyOf")({"anyOf": anyofs})
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 1481, in JSONanyOfFactory
    ret = ret.join(i)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 133, in join
    ret = self._join(s)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 686, in _join
    return _joinNumber(self, s)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_checkers.py", line 672, in _joinNumber
    gcd = utils.gcd(s1.multipleOf, s2.multipleOf)
  File "/home/runner/work/jsonsubschema/jsonsubschema/jsonsubschema/_utils.py", line 269, in gcd
    return fractions.gcd(x, y)
AttributeError: module 'fractions' has no attribute 'gcd'

The gcd error itself is handled here: https://github.com/IBM/jsonsubschema/pull/30
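For context, `fractions.gcd` was deprecated in Python 3.5 and removed in Python 3.9, which is why the call fails on Python 3.10. The sketch below shows one possible replacement built on `math.gcd`; it is an assumption about how such a helper could look, not necessarily what the linked pull request does. It converts through `str` so that decimal `multipleOf` values such as `0.5` become exact fractions.

```python
import math
from fractions import Fraction


def gcd(x, y):
    """Greatest common divisor of two int or float values, as a Fraction.

    Hypothetical helper: uses gcd(a/b, c/d) = gcd(a*d, c*b) / (b*d).
    """
    fx, fy = Fraction(str(x)), Fraction(str(y))
    numerator = math.gcd(fx.numerator * fy.denominator,
                         fy.numerator * fx.denominator)
    return Fraction(numerator, fx.denominator * fy.denominator)


int_gcd = gcd(4, 6)
float_gcd = gcd(0.5, 0.75)
```

The result compares equal to plain numbers, so `gcd(4, 6) == 2` holds even though the return type is `Fraction`.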