support JSON documents for Records

ghukill commented 6 years ago

Until this point, Combine has assumed that Records would have XML documents as their primary payload. Combine was envisioned, initially, as a drop-in replacement for OAI-PMH aggregators like REPOX, so this made sense.

But as it evolves, it's become evident that supporting other document types for Records would be beneficial. This issue proposes adding JSON, but the modifications would also lay the groundwork for other types like CSV (one row per Record, perhaps), raw text, etc.

Some major areas to focus on for allowing JSON documents for Records would include:

- Records would need a record_type: e.g. XML, JSON, CSV, etc.
- PythonUDFRecord, used throughout, would need to support JSON, keying off record_type
- Transformations and Validations would likely also need a Record type
  - or could handle this when firing, and allow errors to bubble up
- Exports would need to handle JSON, perhaps one Record per line, or chunked sets much like the XML
- Small things like the message "Valid XML" would need to be updated, generalizing to simply "Valid" (along with how this is determined, which is currently just an attempt at lxml parsing)
- JSON validations could be JSON schemas
- Need to think through what publishing JSON would look like
  - certainly exports would be doable, but OAI likely not
  - good reason / impetus for ResourceSync?
  - custom API publishing?
- Should Record be subclassed to JSONRecord and XMLRecord?
  - this would support adding other types later
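
To make the record_type and subclassing ideas above a bit more concrete, here is a rough sketch, not actual Combine code. The `Record`, `XMLRecord`, `JSONRecord`, and `build_record` names are hypothetical stand-ins, and the standard library's `xml.etree.ElementTree` is swapped in for lxml so the example runs without extra dependencies. "Valid" here just means "parses", mirroring the current parse-attempt check.

```python
import json
from dataclasses import dataclass
from xml.etree import ElementTree


@dataclass
class Record:
    """Hypothetical base class; a real Record would carry many more fields."""

    record_id: str
    document: str

    # Plain class attribute (not a dataclass field); subclasses override it.
    record_type = "unknown"

    def parse(self):
        raise NotImplementedError

    def is_valid(self) -> bool:
        # Generic "Valid" check: the document is valid if it parses.
        try:
            self.parse()
            return True
        except (ValueError, ElementTree.ParseError):
            return False


class XMLRecord(Record):
    record_type = "xml"

    def parse(self):
        # Stand-in for an lxml parse attempt.
        return ElementTree.fromstring(self.document)


class JSONRecord(Record):
    record_type = "json"

    def parse(self):
        # json.JSONDecodeError is a subclass of ValueError.
        return json.loads(self.document)


# Registry keyed off record_type, so new types (CSV, raw text) can be added later.
RECORD_CLASSES = {cls.record_type: cls for cls in (XMLRecord, JSONRecord)}


def build_record(record_id: str, document: str, record_type: str) -> Record:
    """Pick the Record subclass for the given record_type (case-insensitive)."""
    return RECORD_CLASSES[record_type.lower()](record_id, document)
```

Supporting another type later would then mean adding one subclass and one registry entry, leaving callers that only use `is_valid()` or `parse()` alone.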

Likely much more, but a high-level glance.
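
As one illustration of the "one Record per line" export idea, a minimal sketch might look like the following. It assumes each Record's document is available as a JSON string; `export_json_lines` is a hypothetical helper, not an existing Combine function.

```python
import io
import json


def export_json_lines(documents, out):
    """Write each JSON document as one compact line; return the number written."""
    count = 0
    for doc in documents:
        # Re-serialize so pretty-printed input collapses onto a single line.
        out.write(json.dumps(json.loads(doc), ensure_ascii=False) + "\n")
        count += 1
    return count


# Usage: two documents, one of them pretty-printed across several lines.
docs = ['{\n  "id": 1,\n  "title": "a"\n}', '{"id": 2}']
buffer = io.StringIO()
written = export_json_lines(docs, buffer)
```

Chunked exports, much like the XML ones, could reuse the same function by opening a new output file every N Records.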

ghukill commented 6 years ago

Contd.

Field Mapper for JSON
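
A rough sketch of what a JSON field mapper could do, assuming the goal is to flatten a nested document into flat field names with lists of values. The `flatten_json` name, the dotted-key convention, and the choice to merge list items under one field are all assumptions for illustration, not settled design.

```python
import json


def flatten_json(value, prefix="", sep="."):
    """Flatten nested JSON into {field_name: [values]} using dotted field names."""
    fields = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
            for name, values in flatten_json(child, child_prefix, sep).items():
                fields.setdefault(name, []).extend(values)
    elif isinstance(value, list):
        # List items share the parent's field name, so repeated values merge.
        for child in value:
            for name, values in flatten_json(child, prefix, sep).items():
                fields.setdefault(name, []).extend(values)
    else:
        # Scalar leaf (string, number, boolean, or null).
        fields.setdefault(prefix, []).append(value)
    return fields


document = json.loads(
    '{"title": "A", "creator": [{"name": "x"}, {"name": "y"}], "subject": ["s1", "s2"]}'
)
mapped = flatten_json(document)
```

With this input, `mapped` has the keys `title`, `creator.name`, and `subject`, each holding a list of values. A real mapper would probably need options for the separator and for whether to preserve list positions.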

antmoth commented 5 years ago

We definitely need to keep this in mind as we move through the future, because it's gonna need to happen eventually.

MI-DPLA / combine

support JSON documents for Records #246