IBM / jsonsubschema

Tool for checking whether a JSON schema is a subschema of another JSON schema.
Apache License 2.0

Avoid slow `regex_meet` in `_joinString` #27

Closed. kmichel-aiven closed this 4 weeks ago.

kmichel-aiven commented 1 month ago

Some call sites already take care to avoid `.*` and use `None` instead. However, `_joinString` did not do this, so it produced a trivial `.{0,}` pattern that caused slow calls to `regex_meet`.
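For readers unfamiliar with the idea, here is a minimal sketch of the short-circuit, not the library's actual code. The names `join_patterns`, `meet_patterns` and `slow_meet` are hypothetical, and the sketch assumes `None` stands for "no pattern constraint" (the same language as `.*`):

```python
from typing import Callable, Optional

# Assumption for this sketch: None means "any string is allowed",
# so no regex ever has to be built or intersected for that case.
Pattern = Optional[str]


def join_patterns(p1: Pattern, p2: Pattern) -> Pattern:
    """Union of two string patterns; unconstrained if either side is."""
    if p1 is None or p2 is None:
        # Return None rather than a trivial pattern such as ".{0,}".
        return None
    return f"(?:{p1})|(?:{p2})"


def meet_patterns(p1: Pattern, p2: Pattern,
                  slow_meet: Callable[[str, str], str]) -> Pattern:
    """Intersection of two patterns; slow_meet runs only when both are set."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    # Hypothetical stand-in for an expensive regex intersection.
    return slow_meet(p1, p2)
```

With this convention, a join that involves an unconstrained string yields `None`, and any later meet returns the other operand directly instead of calling the expensive intersection.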

A sample schema that is made faster by this change is:

{
    'anyOf': [
       {
           'title': 'MyEnum',
           'enum': [
               'aaaaaaaaaa',
               'bbbbbbbbbb',
               'cccccccccc',
               'dddddddddd',
               'eeeeeeeeee',
               'ffffffffff',
               'gggggggggg',
               'hhhhhhhhhh',
               'iiiiiiiiii',
               'kkkkkkkkkk'
           ]
       },
       {'type': 'string'}
    ]
}

Comparing this schema with itself using `isSubset` takes ~6 s before this change and ~0.05 s after it.
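A timing like this can be reproduced with a small harness along the following lines. This is only a sketch: `time_self_comparison` is a hypothetical helper, and the comparison function is passed in as a parameter rather than imported, so no particular library entry point is assumed.

```python
import time
from typing import Any, Callable, Dict

Schema = Dict[str, Any]


def time_self_comparison(schema: Schema,
                         compare: Callable[[Schema, Schema], bool],
                         repeats: int = 3) -> float:
    """Return the best wall-clock time, in seconds, of compare(schema, schema)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compare(schema, schema)
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, time.perf_counter() - start)
    return best
```

Passing the sample schema and the library's subschema check as `compare`, once on the old code and once on the patched code, gives the two numbers to compare.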

shinnar commented 4 weeks ago

Thanks for the PR! This looks good, and I would be happy to merge it. However, as this is your first commit, we need a signed DCO on record before we can merge the code: https://github.com/IBM/jsonsubschema/blob/master/DCO1.1.txt Could you please sign it and email it to hirzel@us.ibm.com?

kmichel-aiven commented 4 weeks ago

Hi, I've sent the DCO and got an acknowledgment.