18F / omb-eregs

A tool to find, read, and maintain White House Office of Management and Budget (OMB) policy requirements
https://policy-beta.cio.gov/

Schema validation #821

Closed: yowill closed this 6 years ago

cmc333333 commented 6 years ago

I think there are three separate validations of differing complexity:

  1. Ensure the input is actually JSON/XML/etc. - this should take place at the parser level, so we won't need to write much custom code.
  2. Ensure that all required fields are present and of the right type - this is part of the deserializer. I expect we'll be able to crib quite a bit from the model serializers here.
  3. Ensure the document follows the correct schema - this is the more interesting problem and will take more thought. This'd account for things like "ensure all sections have a title" or "tables should only have caption, thead, tbody children".
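A minimal sketch of the three levels, assuming the document arrives as JSON and each node is a dict with `node_type` and `children` keys. The node names (`sec`, `heading`, `table`) and all function names are hypothetical, picked only to express the two example rules above; in the app, levels 1 and 2 would live in the parser and serializer rather than in hand-written functions like these.

```python
import json
from typing import Any

# Hypothetical node shape: {"node_type": str, "children": [node, ...]}
REQUIRED_FIELDS = {"node_type": str, "children": list}
TABLE_CHILDREN = {"caption", "thead", "tbody"}


class ValidationError(Exception):
    """Raised when input fails any of the three validation levels."""


def parse(raw: bytes) -> Any:
    # Level 1: is the input actually JSON? (a parser's job)
    try:
        return json.loads(raw)
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError subclass ValueError
        raise ValidationError(f"not valid JSON: {exc}") from exc


def check_fields(node: Any) -> None:
    # Level 2: required fields are present and of the right type (a serializer's job).
    if not isinstance(node, dict):
        raise ValidationError(f"expected an object, got {type(node).__name__}")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(node.get(name), expected):
            raise ValidationError(f"{name!r} is missing or not a {expected.__name__}")
    for child in node["children"]:
        check_fields(child)


def check_schema(node: dict) -> None:
    # Level 3: document-level structure, here the two example rules.
    kinds = [child["node_type"] for child in node["children"]]
    if node["node_type"] == "sec" and "heading" not in kinds:
        raise ValidationError("every section needs a title")
    if node["node_type"] == "table" and not set(kinds) <= TABLE_CHILDREN:
        raise ValidationError(f"table children must be among {sorted(TABLE_CHILDREN)}")
    for child in node["children"]:
        check_schema(child)


def is_valid(raw: bytes) -> bool:
    """Run the levels in order; each one relies on the previous one having passed."""
    try:
        doc = parse(raw)
        check_fields(doc)
        check_schema(doc)
    except ValidationError:
        return False
    return True
```

Keeping the levels separate lets a level-3 rule assume the tree is already well-typed, which is why `check_schema` can index `node["children"]` without guarding.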

Aside from listing the components, we haven't spent much time on the document schema (3). We might start with XSD as a simple, document-oriented schema language; I'd argue it'd also be worth considering ProseMirror's schema structures and looking around for other standards we could share across the apps (we'll have a version of the schema in the editor, the API, and the UI, though probably encoded differently). I'd recommend against JSON Schema, as it's not document-oriented. I'd also strongly encourage thinking of policies as one type of document with the potential for more; it would be forward-thinking to lay a general foundation for schemas now.
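One way to keep the schema general, sketched under the assumption that a schema is plain data: each node type maps to the child types it allows and, optionally, a required first child. This is loosely inspired by the content rules in ProseMirror's schemas but is not ProseMirror's API; `NodeSpec`, `POLICY_SCHEMA`, `MEMO_SCHEMA`, and the node names are hypothetical. Because the validator only reads the mapping, supporting a second document type means writing another dict rather than new code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class NodeSpec:
    # Node types that may appear as direct children of this node.
    children: FrozenSet[str] = frozenset()
    # If set, the first child must be of this type (e.g. a title).
    first_child: Optional[str] = None


Schema = Dict[str, NodeSpec]

# Hypothetical schema for policy documents; node names are illustrative only.
POLICY_SCHEMA: Schema = {
    "policy": NodeSpec(children=frozenset({"sec"})),
    "sec": NodeSpec(children=frozenset({"heading", "para", "sec", "table"}), first_child="heading"),
    "heading": NodeSpec(),
    "para": NodeSpec(),
    "table": NodeSpec(children=frozenset({"caption", "thead", "tbody"})),
    "caption": NodeSpec(),
    "thead": NodeSpec(),
    "tbody": NodeSpec(),
}

# A second, hypothetical document type reuses the same validator unchanged.
MEMO_SCHEMA: Schema = {
    "memo": NodeSpec(children=frozenset({"para"})),
    "para": NodeSpec(),
}


def schema_errors(node: dict, schema: Schema, path: str = "") -> List[str]:
    """Return every schema violation under `node`; an empty list means valid."""
    kind = node["node_type"]
    here = f"{path}/{kind}"
    spec = schema.get(kind)
    if spec is None:
        return [f"{here}: unknown node type"]
    errors = []
    kinds = [child["node_type"] for child in node["children"]]
    for child_kind in kinds:
        if child_kind not in spec.children:
            errors.append(f"{here}: {child_kind!r} not allowed here")
    if spec.first_child and (not kinds or kinds[0] != spec.first_child):
        errors.append(f"{here}: first child must be {spec.first_child!r}")
    for child in node["children"]:
        errors.extend(schema_errors(child, schema, here))
    return errors
```

Returning a list of path-prefixed messages, rather than stopping at the first failure, is a design choice that suits an editor, which would want to show every problem at once.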

cmc333333 commented 6 years ago

Thinking of those phases in terms of types, it’d be:

  1. Binary blob parsed into nested Python dicts (this takes place in DRF’s parsers)
  2. Python dict fields validated and converted into Django models (this takes place in DRF’s serializers)
  3. Django models are compared against a document schema and saved to the db (this would be a new thing)
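The same phases written as type signatures, as a rough sketch: a plain `dataclass` stands in for the Django models and a list stands in for the database, so the code runs without Django or DRF installed. `DocNode`, `deserialize`, `save_if_valid`, and `sections_have_titles` are hypothetical names; in the real app, phases 1 and 2 would be DRF parsers and serializers rather than these functions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DocNode:
    """Hypothetical stand-in for a Django model instance."""
    node_type: str
    children: List["DocNode"] = field(default_factory=list)


def parse(raw: bytes) -> Dict[str, Any]:
    # Phase 1: bytes -> nested dicts (DRF's parsers, in the real app).
    return json.loads(raw)


def deserialize(data: Dict[str, Any]) -> DocNode:
    # Phase 2: dict -> model objects, checking field types on the way
    # (DRF's serializers, in the real app).
    if not isinstance(data, dict) or not isinstance(data.get("node_type"), str):
        raise TypeError("each node must be an object with a string node_type")
    return DocNode(
        node_type=data["node_type"],
        children=[deserialize(child) for child in data.get("children", [])],
    )


SchemaCheck = Callable[[DocNode], List[str]]


def save_if_valid(doc: DocNode, check: SchemaCheck, db: List[DocNode]) -> List[str]:
    # Phase 3 (the new piece): compare models against a document schema, then save.
    errors = check(doc)
    if not errors:
        db.append(doc)
    return errors


def sections_have_titles(node: DocNode) -> List[str]:
    # One hypothetical schema rule: every "sec" starts with a "heading".
    errors = []
    if node.node_type == "sec" and (not node.children or node.children[0].node_type != "heading"):
        errors.append("section without a title")
    for child in node.children:
        errors.extend(sections_have_titles(child))
    return errors
```

Passing the schema check in as a function keeps phase 3 independent of any one document type.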

cmc333333 commented 6 years ago

Attempting to split this into specific tasks:

Parser level

Serializer level

(Broken out here: https://github.com/18F/omb-eregs/issues/906)

Document Schema level

(Broken out here: https://github.com/18F/omb-eregs/issues/907)

tadhg-ohiggins commented 6 years ago

Closing because this was covered by #884, #906, and #907 (thanks @cmc333333!)