IBM / jsonsubschema

Tool for checking whether a JSON schema is a subschema of another JSON schema.
Apache License 2.0

pattern - error with escaped characters #6

Open michaelfruth opened 3 years ago

michaelfruth commented 3 years ago

Hello,

There is a problem when using escaped characters in a pattern. E.g.:

{
  "type": "string",
  "anyOf": [
    { "pattern": "^a\\-b$" },
    { "pattern": "^b\\-c$" }
  ]
}

Comparing this schema with itself (command: jsonsubschema a.json a.json) throws an error. More specifically, the greenery library throws the error; it seems that greenery cannot handle escaped characters. When you delete one of the patterns, everything works fine, because the problematic greenery method is not called. The error is also thrown for other escaped characters such as "\ " (whitespace), "\.", ...
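For reference, here is a minimal sketch of what the first pattern decodes to once the JSON file is parsed. It uses only Python's standard json and re modules, not jsonsubschema or greenery, so it does not reproduce the error itself. The variable names are illustrative.

```python
import json
import re

# The schema fragment as it appears in the JSON file
# (a raw string keeps the text verbatim).
schema_text = r'{"pattern": "^a\\-b$"}'

pattern = json.loads(schema_text)["pattern"]

# JSON decoding turns the two-character escape \\ into a single backslash.
print(pattern)       # ^a\-b$
print(len(pattern))  # 6

# As a regex, \- is an escaped hyphen, so the pattern matches the string "a-b".
print(bool(re.search(pattern, "a-b")))    # True
print(bool(re.search(pattern, "a\\-b")))  # False
```

So the string that reaches any regex library after parsing contains a single backslash followed by a hyphen.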

Best regards, Michael

andrewhabib commented 3 years ago

Mmm... I am not sure this is an inherent problem in greenery.

The problem seems related to the fact that in Python string literals, the backslash is used to escape special characters, including the backslash itself. In other words, one has to use '\\' to get the literal '\' if the string is not raw.

So if one wants to write the regex 'a\\-b', which matches the string literal 'a\-b', I guess it should be specified either as a raw pattern, i.e., r'a\\-b', or as the pattern 'a\\\-b', where the first backslash is the escape character and the following double backslash stands for the literal backslash. Using either of those tricks makes the check pass successfully.
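The following sketch shows the raw versus non-raw literal distinction using only the standard re module. It assumes nothing about how jsonsubschema hands patterns to greenery. The non-raw form here is written fully escaped as 'a\\\\-b': the three-backslash spelling above produces the same string, but it relies on '\-' being an unrecognized escape sequence, which recent Python versions warn about.

```python
import re

raw = r'a\\-b'       # raw literal: backslashes are kept verbatim
escaped = 'a\\\\-b'  # regular literal: each \\ yields one backslash

print(raw == escaped)  # True
print(len(raw))        # 5

# As a regex, \\ matches one literal backslash, so the pattern
# matches the four-character string a\-b but not a-b.
print(bool(re.fullmatch(raw, 'a\\-b')))  # True
print(bool(re.fullmatch(raw, 'a-b')))    # False
```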

In general, although greenery is quite useful and powerful, trying to use it for the full spectrum of ECMAScript regular expressions is error-prone, and by nature it cannot cover all cases. However, I need to think more about whether using raw strings for patterns passed to greenery will permanently solve this problem and related ones, or whether it will cause other unexpected behavior.