bible-technology / scripture-burrito

Scripture Burrito Schema & Docs 🌯
http://docs.burrito.bible/
MIT License

JSON Schema ought to work with more than one implementation of JSON Schema #147

Closed mvahowe closed 4 years ago

mvahowe commented 4 years ago

Right now, there are two validation scripts in SB, for JS and Python respectively. The schema passes according to the JS one. The huge and utterly uninformative error from the Python implementation is pasted at the end of this post.

This kind of issue was entirely predictable (and indeed was predicted before we moved to JSON), but this doesn't help to define a way forward.

One thing we should do is find out just how variable JSON Schema behaviour is across languages we care about. That would include

I strongly suspect that I've hit an edge case in JSON Schema (I have an unerring ability to do this with languages in general). If we can find the edge case we can probably steer away from it and restore similar behaviour across languages.

Also, if we can define the edge case(s) well we can submit a bug report or even a fix to the Python JSON Schema project.

Another option is to say (preferably with a straight face) that we recommend using the JS implementation everywhere, if necessary via a shell script invocation.

If someone with a big heart and no foresight about hosting costs were to offer SB validation as an API, that would help too.

mark@jsexp:~/scripture-burrito/code$ ./validate.py ../docs/examples/artifacts/textTranslation.json
../docs/examples/artifacts/textTranslation.json: {'meta': {'version': '0.2.0', 'variant': 'default', 'dateCreated': '2019-02-19T01:02:03+01:00', 'generator': {'softwareName': 'Burrito Factory', 'softwareVersion': '0.1', 'userName': 'Jane Doe'}, 'uploader': {'softwareName': 'Burrito Truck', 'softwareVersion': '0.1', 'userId': 'dbl::5678', 'userName': 'Josh Buck'}, 'defaultLanguage': 'en', 'comments': ['Experimenting with i18n', 'Fixed canon before upload. ~Josh']}, 'idServers': {'dbl': {'id': 'https://thedigitalbiblelibrary.org', 'name': {'en': 'The Digital Bible Library'}}, 'agmt': {'id': 'http://registry.autographamt.com', 'name': {'en': 'Autographa'}}, 'x-atl': {'id': 'http://atlantisbibleconsortium.net'}}, 'identification': {'systemId': {'dbl': {'id': '0123456789abcdef', 'revision': '23'}, 'gbc': {'id': '55df02965117ad3f2201309b'}, 'paratext': {'id': '2d5220a02a7aaac6bcc2831ae262e9aaca5e1abd'}}, 'idServer': 'dbl', 'name': {'en': 'Scripture Burrito Demo Text Bible', 'fr': 'Crêpe mexicaine biblique surdimensionnée (démonstration)'}, 'description': {'en': 'A Demonstration Scripture Burrito containing Text, like Paratext Might One Day Produce'}, 'abbreviation': {'en': 'DSB', 'fr': 'CMBS'}}, 'confidentiality': {'metadata': 'unrestricted', 'source': 'private', 'publications': 'restricted'}, 'type': {'flavorType': {'name': 'scripture', 'currentScope': {'GEN': [], 'EXO': ['1', '3-12', '13:4', '14:3-8', '15:8-16:2'], 'LEV': ['2-3'], 'MAT': ['1', '5', '7-11']}, 'canonType': ['ot', 'nt'], 'canonSpec': {'ot': {'name': 'western'}, 'nt': {'name': 'x-matthewOnlyMillenialists', 'books': ['MAT']}}, 'flavor': {'name': 'textTranslation', 'projectType': 'standard', 'audience': 'common', 'translationType': 'newTranslation', 'usfmVersion': '3.1.rc49'}}}, 'relationships': [{'relationType': 'expression', 'flavor': 'scripturePrint', 'id': 'dbl::fedcba9876543210:2'}, {'relationType': 'expression', 'flavor': 'glossedTextStory', 'id': 'x-atl::gl47'}, {'relationType': 'parascriptural', 
'flavor': 'parascripturalWordAlignment', 'id': 'agmt::irvmal-4-wh'}], 'languages': [{'tag': 'en', 'name': {'en': 'English', 'de': 'Englisch', 'fr': 'anglais'}, 'numberingSystem': 'latn'}], 'countries': [{'code': 'NL', 'name': {'nl': 'Nederland', 'kl': 'Pukkitsormiut', 'la': 'Batavia', 'ru': 'Нидерланды'}}], 'agencies': [{'id': 'dbl::23', 'name': {'en': 'Burritos R Us Inc'}, 'abbr': {'en': 'BRU'}, 'url': 'https://burritos-r-us.org', 'roles': ['rightsHolder', 'content', 'finance', 'management', 'publication', 'qa']}, {'id': 'dbl::29', 'name': {'en': 'We Manage Burritos'}, 'roles': ['qa']}], 'copyright': {'rightsHolderAgencies': [0, 1], 'rightsAdminAgency': 1, 'licenses': [{'url': 'https://burritos-r-us.org/licenses/3247'}], 'shortStatementPlain': {'fr': '© Burritos R Us 2019.'}, 'fullStatementPlain': {'fr': '© Burritos R Us 2019. Tous droits réservés.'}, 'fullStatementRich': {'fr': '<p><b>© Burritos R Us 2019.</b></p><p><i>Tous droits réservés.</i></p>'}}, 'ingredients': {'source/usfm/OTINT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'role': 'intot', 'size': 1234, 'isSource': True}, 'source/usfm/GEN.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'GEN': []}, 'size': 1234, 'isSource': True}, 'source/usfm/EXO.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'EXO': ['1-12']}, 'size': 1234, 'isSource': True}, 'source/usfm/LEV.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'LEV': ['2:3-3:7']}, 'size': 1234, 'isSource': True}, 'source/usfm/INTNT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'role': 'intnt', 'size': 1234, 'isSource': True}, 'source/usfm/INTMAT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'role': 'intMAT', 'size': 1234, 'isSource': True}, 'source/usfm/MAT.sfm': 
{'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-sfm', 'scope': {'MAT': ['1:3', '1:5', '1:7-11']}, 'size': 1234, 'isSource': True}, 'release/text/USX_1/OTINT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'role': 'intot', 'size': 1234}, 'release/text/USX_1/GEN.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'GEN': []}, 'size': 1234}, 'release/text/USX_1/EXO.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'EXO': ['1-12']}, 'size': 1234}, 'release/text/USX_1/LEV.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'LEV': ['2:3-3:7']}, 'size': 1234}, 'release/text/USX_1/INTNT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'role': 'intnt', 'size': 1234}, 'release/text/USX_1/INTMAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'role': 'intMAT', 'size': 1234}, 'release/text/USX_1/MAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'}, 'mimeType': 'text/x-usx+xml', 'scope': {'MAT': ['1:3', '1:5', '1:7-11']}, 'size': 1234}, 'unknownAdditive.foo': {'mimeType': 'application/octet-stream', 'size': 99}}, 'names': {'book-gen': {'abbr': {'fr': 'Gn'}, 'short': {'fr': 'Genèse'}, 'long': {'fr': 'La Genèse'}}, 'book-exo': {'abbr': {'fr': 'Ex'}, 'short': {'fr': 'Exode'}, 'long': {'fr': 'L’Exode'}}, 'book-lev': {'abbr': {'fr': 'Lv'}, 'short': {'fr': 'Lévitique'}, 'long': {'fr': 'Le Lévitique'}}, 'book-mat': {'abbr': {'fr': 'Mt'}, 'short': {'en': 'Matthew', 'fr': 'Matthieu'}, 'long': {'fr': 'Evangile selon Matthieu'}}, 'frontmatter': {'short': {'fr': 'Avant de lire Matthieu ...'}}, 'intnt': {'short': {'fr': 'A propos du Nouveau Testament'}}, 'intmat': {'short': {'fr': 'A propos de Matthieu'}}}, 'progress': {'dateStarted': '2017-11-30', 
'dateCompleted': '2017-12-01'}} is not valid under any of the given schemas

Failed validating 'oneOf' in schema:
    {'$id': 'https://burrito.bible/schema/metadata.schema.json',
     '$schema': 'http://json-schema.org/draft-07/schema',
     'description': 'Scripture Burrito root metadata object.',
     'oneOf': [{'$ref': 'default_metadata.schema.json'},
               {'$ref': 'derived_metadata.schema.json'}],
     'title': 'Scripture Burrito Metadata',
     'type': 'object'}

On instance:
    {'agencies': [{'abbr': {'en': 'BRU'},
                   'id': 'dbl::23',
                   'name': {'en': 'Burritos R Us Inc'},
                   'roles': ['rightsHolder',
                             'content',
                             'finance',
                             'management',
                             'publication',
                             'qa'],
                   'url': 'https://burritos-r-us.org'},
                  {'id': 'dbl::29',
                   'name': {'en': 'We Manage Burritos'},
                   'roles': ['qa']}],
     'confidentiality': {'metadata': 'unrestricted',
                         'publications': 'restricted',
                         'source': 'private'},
     'copyright': {'fullStatementPlain': {'fr': '© Burritos R Us 2019. '
                                                'Tous droits réservés.'},
                   'fullStatementRich': {'fr': '<p><b>© Burritos R Us '
                                               '2019.</b></p><p><i>Tous '
                                               'droits réservés.</i></p>'},
                   'licenses': [{'url': 'https://burritos-r-us.org/licenses/3247'}],
                   'rightsAdminAgency': 1,
                   'rightsHolderAgencies': [0, 1],
                   'shortStatementPlain': {'fr': '© Burritos R Us 2019.'}},
     'countries': [{'code': 'NL',
                    'name': {'kl': 'Pukkitsormiut',
                             'la': 'Batavia',
                             'nl': 'Nederland',
                             'ru': 'Нидерланды'}}],
     'idServers': {'agmt': {'id': 'http://registry.autographamt.com',
                            'name': {'en': 'Autographa'}},
                   'dbl': {'id': 'https://thedigitalbiblelibrary.org',
                           'name': {'en': 'The Digital Bible Library'}},
                   'x-atl': {'id': 'http://atlantisbibleconsortium.net'}},
     'identification': {'abbreviation': {'en': 'DSB', 'fr': 'CMBS'},
                        'description': {'en': 'A Demonstration Scripture '
                                              'Burrito containing Text, '
                                              'like Paratext Might One Day '
                                              'Produce'},
                        'idServer': 'dbl',
                        'name': {'en': 'Scripture Burrito Demo Text Bible',
                                 'fr': 'Crêpe mexicaine biblique '
                                       'surdimensionnée (démonstration)'},
                        'systemId': {'dbl': {'id': '0123456789abcdef',
                                             'revision': '23'},
                                     'gbc': {'id': '55df02965117ad3f2201309b'},
                                     'paratext': {'id': '2d5220a02a7aaac6bcc2831ae262e9aaca5e1abd'}}},
     'ingredients': {'release/text/USX_1/EXO.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'EXO': ['1-12']},
                                                    'size': 1234},
                     'release/text/USX_1/GEN.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'GEN': []},
                                                    'size': 1234},
                     'release/text/USX_1/INTMAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                       'mimeType': 'text/x-usx+xml',
                                                       'role': 'intMAT',
                                                       'size': 1234},
                     'release/text/USX_1/INTNT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                      'mimeType': 'text/x-usx+xml',
                                                      'role': 'intnt',
                                                      'size': 1234},
                     'release/text/USX_1/LEV.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'LEV': ['2:3-3:7']},
                                                    'size': 1234},
                     'release/text/USX_1/MAT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                    'mimeType': 'text/x-usx+xml',
                                                    'scope': {'MAT': ['1:3',
                                                                      '1:5',
                                                                      '1:7-11']},
                                                    'size': 1234},
                     'release/text/USX_1/OTINT.usx': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                      'mimeType': 'text/x-usx+xml',
                                                      'role': 'intot',
                                                      'size': 1234},
                     'source/usfm/EXO.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'EXO': ['1-12']},
                                             'size': 1234},
                     'source/usfm/GEN.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'GEN': []},
                                             'size': 1234},
                     'source/usfm/INTMAT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                                'isSource': True,
                                                'mimeType': 'text/x-sfm',
                                                'role': 'intMAT',
                                                'size': 1234},
                     'source/usfm/INTNT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                               'isSource': True,
                                               'mimeType': 'text/x-sfm',
                                               'role': 'intnt',
                                               'size': 1234},
                     'source/usfm/LEV.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'LEV': ['2:3-3:7']},
                                             'size': 1234},
                     'source/usfm/MAT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                             'isSource': True,
                                             'mimeType': 'text/x-sfm',
                                             'scope': {'MAT': ['1:3',
                                                               '1:5',
                                                               '1:7-11']},
                                             'size': 1234},
                     'source/usfm/OTINT.sfm': {'checksum': {'md5': '0123456789abcdef0123456789abcdef'},
                                               'isSource': True,
                                               'mimeType': 'text/x-sfm',
                                               'role': 'intot',
                                               'size': 1234},
                     'unknownAdditive.foo': {'mimeType': 'application/octet-stream',
                                             'size': 99}},
     'languages': [{'name': {'de': 'Englisch',
                             'en': 'English',
                             'fr': 'anglais'},
                    'numberingSystem': 'latn',
                    'tag': 'en'}],
     'meta': {'comments': ['Experimenting with i18n',
                           'Fixed canon before upload. ~Josh'],
              'dateCreated': '2019-02-19T01:02:03+01:00',
              'defaultLanguage': 'en',
              'generator': {'softwareName': 'Burrito Factory',
                            'softwareVersion': '0.1',
                            'userName': 'Jane Doe'},
              'uploader': {'softwareName': 'Burrito Truck',
                           'softwareVersion': '0.1',
                           'userId': 'dbl::5678',
                           'userName': 'Josh Buck'},
              'variant': 'default',
              'version': '0.2.0'},
     'names': {'book-exo': {'abbr': {'fr': 'Ex'},
                            'long': {'fr': 'L’Exode'},
                            'short': {'fr': 'Exode'}},
               'book-gen': {'abbr': {'fr': 'Gn'},
                            'long': {'fr': 'La Genèse'},
                            'short': {'fr': 'Genèse'}},
               'book-lev': {'abbr': {'fr': 'Lv'},
                            'long': {'fr': 'Le Lévitique'},
                            'short': {'fr': 'Lévitique'}},
               'book-mat': {'abbr': {'fr': 'Mt'},
                            'long': {'fr': 'Evangile selon Matthieu'},
                            'short': {'en': 'Matthew', 'fr': 'Matthieu'}},
               'frontmatter': {'short': {'fr': 'Avant de lire Matthieu '
                                               '...'}},
               'intmat': {'short': {'fr': 'A propos de Matthieu'}},
               'intnt': {'short': {'fr': 'A propos du Nouveau Testament'}}},
     'progress': {'dateCompleted': '2017-12-01',
                  'dateStarted': '2017-11-30'},
     'relationships': [{'flavor': 'scripturePrint',
                        'id': 'dbl::fedcba9876543210:2',
                        'relationType': 'expression'},
                       {'flavor': 'glossedTextStory',
                        'id': 'x-atl::gl47',
                        'relationType': 'expression'},
                       {'flavor': 'parascripturalWordAlignment',
                        'id': 'agmt::irvmal-4-wh',
                        'relationType': 'parascriptural'}],
     'type': {'flavorType': {'canonSpec': {'nt': {'books': ['MAT'],
                                                  'name': 'x-matthewOnlyMillenialists'},
                                           'ot': {'name': 'western'}},
                             'canonType': ['ot', 'nt'],
                             'currentScope': {'EXO': ['1',
                                                      '3-12',
                                                      '13:4',
                                                      '14:3-8',
                                                      '15:8-16:2'],
                                              'GEN': [],
                                              'LEV': ['2-3'],
                                              'MAT': ['1', '5', '7-11']},
                             'flavor': {'audience': 'common',
                                        'name': 'textTranslation',
                                        'projectType': 'standard',
                                        'translationType': 'newTranslation',
                                        'usfmVersion': '3.1.rc49'},
                             'name': 'scripture'}}}

mvahowe commented 4 years ago

One possibility is skew between Python and JS regexes.
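One regex difference that can be checked directly is how `$` treats a trailing newline: Python's `re` lets `$` match just before a final newline, whereas JavaScript's `$` (without the `m` flag) matches only at the very end of the input. The sketch below uses the languageTag pattern that appears in the JS validator output later in this thread. The helper names are made up for illustration, and nothing here shows that this particular difference is what caused the failure above.

```python
import re

# languageTag pattern as printed by the JS validator in this thread,
# unescaped from its JavaScript string form.
LANGUAGE_TAG = r"^[A-Za-z]{2,3}([\-_][A-Za-z0-9]+){0,4}$"


def search_match(value):
    """Unanchored search, the usual way a JSON Schema 'pattern' is applied."""
    return re.search(LANGUAGE_TAG, value) is not None


def whole_string_match(value):
    """Require the match to cover the whole string, newline included."""
    return re.fullmatch(LANGUAGE_TAG, value) is not None


if __name__ == "__main__":
    # "en\n" is the interesting case: search accepts it, fullmatch does not.
    for candidate in ["en", "en-US", "peach", "en\n"]:
        print(repr(candidate), search_match(candidate), whole_string_match(candidate))
```

Running the same candidates through both validators would be a quick way to confirm or rule out regex skew for a given pattern.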

rdb commented 4 years ago

You got a "oneOf" error on the root object, which is what disambiguates between the default and derived variants. This is precisely the reason why I advocated against this approach. If there is any error in either schema, what you get is a validation error on the topmost invalid condition, which makes debugging impossible.

To see what the error actually is, I'd manually validate it against either the default or the derived metadata schema directly.

If we want to continue on with the approach, I'd change the validation script to manually validate it against either default or derived schemas to get a better error message.
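A minimal sketch of that suggestion, under two assumptions that are not confirmed in this thread: that `meta.variant` is the field that decides which schema applies, and that the caller has already loaded the schema files into a dict keyed by file name. The file names come from the `oneOf` shown in the error above. `pick_schema_name` and `validate_directly` are hypothetical helpers, not part of the existing validation script, and resolving `$ref` across schema files is left out.

```python
# Assumption: meta.variant selects the schema. The file names are the two
# $ref targets listed in the root oneOf of metadata.schema.json.
SCHEMA_BY_VARIANT = {
    "default": "default_metadata.schema.json",
    "derived": "derived_metadata.schema.json",
}


def pick_schema_name(metadata):
    """Return the schema file name for this document's variant."""
    variant = metadata.get("meta", {}).get("variant")
    if variant not in SCHEMA_BY_VARIANT:
        raise ValueError("unknown or missing meta.variant: %r" % (variant,))
    return SCHEMA_BY_VARIANT[variant]


def validate_directly(metadata, schemas):
    """Validate against the variant's own schema, skipping the root oneOf.

    `schemas` maps schema file name to an already-loaded schema dict.
    Returns a list of "path: message" strings, empty when valid.
    """
    import jsonschema  # third-party; needs a release with Draft7Validator

    validator = jsonschema.Draft7Validator(schemas[pick_schema_name(metadata)])
    problems = []
    for error in validator.iter_errors(metadata):
        location = "/".join(str(part) for part in error.absolute_path) or "<root>"
        problems.append("%s: %s" % (location, error.message))
    return problems
```

The point of the design is only that the reported errors come from inside the chosen schema rather than from the root `oneOf`.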

mvahowe commented 4 years ago

Right, but:

  1. The JS implementation does provide helpful(ish) information on this kind of error
  2. There are many other places where we use oneOf.
  3. I'm still not convinced that either cascading conditionals or implementation-specific procedural code to pick between schemas is going to scale to what we eventually need. I'm about to open an issue about templates, which we need to support inside the main schema.
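On point 1, the Python jsonschema library does keep the per-branch errors for a failed `oneOf`: they hang off the top-level error's `context` attribute, so a script can print them instead of the full instance dump. The sketch below walks that tree for any `oneOf`, which also covers point 2. `describe` is a hypothetical helper, and the demo uses stand-in objects rather than real validation errors, so it runs without the library installed.

```python
from types import SimpleNamespace


def describe(error, depth=0):
    """Yield one indented line per error, descending into sub-schema errors.

    Works on jsonschema ValidationError objects, which expose
    `absolute_path`, `validator`, `message` and `context`.
    """
    indent = "  " * depth
    location = "/".join(str(part) for part in error.absolute_path) or "<root>"
    children = list(error.context or [])
    if children:
        # For oneOf/anyOf, name the keyword only; the message would embed
        # the whole instance, which is what made the dump above unreadable.
        yield "%s%s: failed '%s'" % (indent, location, error.validator)
        for child in children:
            yield from describe(child, depth + 1)
    else:
        yield "%s%s: %s" % (indent, location, error.message)


if __name__ == "__main__":
    # Stand-in error objects, for illustration only.
    leaf = SimpleNamespace(
        absolute_path=["idServers", "agmt", "name"],
        validator="pattern",
        message="'peach' does not match the languageTag pattern",
        context=[],
    )
    root = SimpleNamespace(
        absolute_path=[], validator="oneOf", message="(huge dump)", context=[leaf]
    )
    for line in describe(root):
        print(line)
```

With a real validator, the errors would come from something like `Draft7Validator(schema).iter_errors(instance)`.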

rdb commented 4 years ago

Which branch are the failing schema and example file on? I'm happy to take a look. The develop branch passes, both on CI and on my own computer.

mvahowe commented 4 years ago

For the record, I just added

"peach": "melba"

under idServers, and this is the error I get from the JS validator. It's actually better than any error trace I've seen from any free implementation of RelaxNG.

../docs/examples/artifacts/textTranslation.json: [
  {
    keyword: 'pattern',
    dataPath: ".idServers['agmt'].name",
    schemaPath: '#/definitions/languageTag/pattern',
    params: { pattern: '^[A-Za-z]{2,3}([\\-_][A-Za-z0-9]+){0,4}$' },
    message: 'should match pattern "^[A-Za-z]{2,3}([\\-_][A-Za-z0-9]+){0,4}$"',
    propertyName: 'peach'
  },
  {
    keyword: 'propertyNames',
    dataPath: ".idServers['agmt'].name",
    schemaPath: '#/propertyNames',
    params: { propertyName: 'peach' },
    message: "property name 'peach' is invalid"
  },
  {
    keyword: 'additionalProperties',
    dataPath: '',
    schemaPath: '#/additionalProperties',
    params: { additionalProperty: 'progress' },
    message: 'should NOT have additional properties'
  },
  {
    keyword: 'oneOf',
    dataPath: '',
    schemaPath: '#/oneOf',
    params: { passingSchemas: null },
    message: 'should match exactly one schema in oneOf'
  }
]

mvahowe commented 4 years ago

@rdb This is develop. Using jsonschema directly gets me the same result. I'm using whatever comes with Ubuntu 19.10, which appears to have Python 2 as a dependency. (There's no version of jsonschema to be found anywhere, obviously.)

rdb commented 4 years ago

Are you using the python-jsonschema package from the Ubuntu repositories or from pip?

mvahowe commented 4 years ago

However, curiously, invoking the validation script as an argument to python2.7 fails with "no module found".

I'm using the Ubuntu repository module which something else seems to have installed for me.

mvahowe commented 4 years ago

I just sudo pip installed jsonschema, same mileage.

rdb commented 4 years ago

Can you try Python 3? Python 2 is EOL.

mvahowe commented 4 years ago

I'm using Python 3. It doesn't work with Python 2. That's why the Python 2 requirement is curious.

rdb commented 4 years ago

I reproduced the error using docker and the Ubuntu python3-jsonschema package. It turns out that Ubuntu ships an outdated version of jsonschema, which does not support draft-7 of the JSON Schema spec.

We require at least jsonschema 3.0.0, which can be installed using pip. We should document this.
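Until that is documented, the validation script could check the installed version itself. A minimal sketch, assuming Python 3.8 or later for `importlib.metadata`; the helper names are made up for illustration and the version parsing is deliberately naive (it ignores pre-release suffixes).

```python
import re
from importlib import metadata  # standard library from Python 3.8

MINIMUM = (3, 0, 0)  # jsonschema 3.0.0, per the comment above


def parse_version(text):
    """Return leading numeric components, padded to three: '3.2' -> (3, 2, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_text, minimum=MINIMUM):
    """True when the version string is at least `minimum`."""
    return parse_version(version_text) >= minimum


def check_installed(package="jsonschema"):
    """Return None when the package is new enough, else a message."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "%s is not installed" % package
    if not meets_minimum(installed):
        wanted = ".".join(str(n) for n in MINIMUM)
        return "%s %s is too old; need at least %s" % (package, installed, wanted)
    return None
```

Calling `check_installed()` at the top of the script and exiting on a non-None result would turn the confusing `oneOf` failure into a one-line explanation.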

mvahowe commented 4 years ago

@rdb also points out that the outdated package doesn't complain about being given an unsupported schema version; it just tries to wing it and fails.
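One way to avoid the silent "wing it" behaviour is to refuse to validate unless the installed library actually provides a draft-07 validator. The sketch below relies on `Draft7Validator` having been introduced in jsonschema 3.0.0. `require_draft7` is a hypothetical guard, and it takes the library module as an argument so it can be exercised without the real package.

```python
DRAFT7_URI = "http://json-schema.org/draft-07/schema"


def require_draft7(schema, library):
    """Return the library's Draft7Validator class, or raise.

    `schema` is the loaded schema dict and `library` is the imported
    jsonschema module (or anything exposing the same attribute).
    """
    declared = schema.get("$schema", "").rstrip("#")
    if declared != DRAFT7_URI:
        raise ValueError("schema declares %r, expected draft-07" % (declared,))
    validator_class = getattr(library, "Draft7Validator", None)
    if validator_class is None:
        # Old releases have no such class; fail here instead of guessing.
        raise RuntimeError("installed jsonschema does not support draft-07")
    return validator_class
```

Typical use would be `import jsonschema` followed by `require_draft7(schema, jsonschema)(schema).iter_errors(instance)`.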

mvahowe commented 4 years ago

@jag3773 @FoolRunning the information at https://json-schema.org/implementations.html looks useful and encouraging (i.e. there are allegedly implementations for draft 7, which is what we need).

jag3773 commented 4 years ago

Follow-ups in #150 and #151.