Until this point, Combine has assumed that Records would have XML documents as their primary payload. Combine was envisioned, initially, as a drop-in replacement for OAI-PMH aggregators like REPOX, so this made sense.
But as it evolves, it has become evident that supporting other document types for Records would be beneficial. This issue proposes adding JSON, but the modifications would also lay the groundwork for other types like CSV (one row per Record, perhaps), raw text, etc.
Some major areas to focus on for allowing JSON documents for Records would include:
Records would need a record_type: e.g. XML, JSON, csv, etc.
PythonUDFRecord used throughout would need to support JSON, keying off record_type
Transformations and Validations would likely also need a Record type, or could simply run when fired and allow errors to bubble up
Exports would need to handle JSON, perhaps one Record per line, or chunked sets much like the XML
Small things like the message "Valid XML" would need to be generalized to simply "Valid", along with how validity is determined, which is currently just an attempt at lxml parsing
JSON validations could be JSON schemas
Need to think through what publishing JSON would look like: exports would certainly be doable, but OAI likely not
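To make the record_type, generalized "Valid", and one-Record-per-line export items more concrete, here is a minimal sketch. The function names (`parse_document`, `is_valid`, `export_json_lines`) are hypothetical and not part of Combine, and the standard library's `xml.etree.ElementTree` stands in for lxml so the example is self-contained.

```python
import json
import xml.etree.ElementTree as ET


def parse_document(document, record_type):
    """Parse a raw document string according to its record_type (hypothetical helper)."""
    kind = record_type.lower()
    if kind == "xml":
        return ET.fromstring(document)
    if kind == "json":
        return json.loads(document)
    # Unsupported types are not swallowed; the error bubbles up to the caller.
    raise ValueError(f"unsupported record_type: {record_type!r}")


def is_valid(document, record_type):
    """Generalized 'Valid' check: True if the document parses for its type."""
    try:
        parse_document(document, record_type)
    except (ET.ParseError, json.JSONDecodeError):
        return False
    return True


def export_json_lines(documents):
    """Serialize JSON documents with one Record per line."""
    return "\n".join(json.dumps(json.loads(doc)) for doc in documents)
```

With this shape, the "Valid XML" message becomes a type-agnostic "Valid" whenever `is_valid` returns True, and adding CSV or raw text later would mean adding another branch in `parse_document`.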
Could Record be subclassed to JSONRecord and XMLRecord?
There is likely much more, but this is a high-level glance.
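As one possible answer to the subclassing question, a sketch along these lines might work. The base class, its fields, the `RECORD_CLASSES` registry, and `make_record` are all assumptions for illustration; Combine's actual Record model is not shown here and may look quite different.

```python
import json
import xml.etree.ElementTree as ET


class Record:
    """Hypothetical base class; subclasses set record_type and implement parse()."""

    record_type = None

    def __init__(self, record_id, document):
        self.record_id = record_id
        self.document = document

    def parse(self):
        raise NotImplementedError


class XMLRecord(Record):
    record_type = "xml"

    def parse(self):
        # Standard-library parser used here in place of lxml.
        return ET.fromstring(self.document)


class JSONRecord(Record):
    record_type = "json"

    def parse(self):
        return json.loads(self.document)


# Registry keyed off record_type, so callers never branch on the type themselves.
RECORD_CLASSES = {cls.record_type: cls for cls in (XMLRecord, JSONRecord)}


def make_record(record_type, record_id, document):
    """Return the Record subclass instance matching record_type."""
    try:
        record_class = RECORD_CLASSES[record_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported record_type: {record_type!r}") from None
    return record_class(record_id, document)
```

Code such as PythonUDFRecord could then call `make_record(...).parse()` and receive either a parsed XML element or a JSON structure without knowing which type it was handed.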