esmero / ami

Archipelago Multi Importer. A module of mass ingest made for the masses
GNU Affero General Public License v3.0

Parsing XML to JSON on AMI Ingest #27

Open dmer opened 3 years ago

dmer commented 3 years ago

The webform lets you upload an XML file that gets nicely parsed into the SBF JSON when using the webform field type "Import Metadata from a File".

The need is to have this same parsing/processing applied to XML files when they are ingested via AMI.
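To make the request concrete, here is a minimal sketch of the kind of XML to JSON expansion being asked for. It is written in Python purely for illustration and is not the webform element's actual code (AMI and the webform element are Drupal/PHP). The function names `strip_ns`, `element_to_dict`, and `xml_to_json`, and the `@`/`#text` key conventions, are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


def strip_ns(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.split("}", 1)[-1]


def element_to_dict(elem: ET.Element):
    """Recursively convert an element into plain dicts, lists, and strings."""
    node = {}
    # Attributes go under '@name' keys so they cannot collide with child tags.
    for name, value in elem.attrib.items():
        node["@" + strip_ns(name)] = value
    for child in elem:
        key = strip_ns(child.tag)
        value = element_to_dict(child)
        if key in node:
            # Repeated sibling tags collapse into a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element with no attributes: return its text directly.
        return text
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_string: str) -> str:
    """Parse an XML string and serialize the resulting structure as JSON."""
    root = ET.fromstring(xml_string)
    return json.dumps({strip_ns(root.tag): element_to_dict(root)}, indent=2)
```

Calling `xml_to_json` on a small MODS-like record yields nested JSON in which repeated elements (for example, several `subject` tags) become arrays. Real-world XML (mixed content, deep nesting, large files) needs more care than this sketch provides.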


Pasting below the Slack conversation with Diego on this:

Diego: “I see all of the values from my xml very nicely parsed into JSON in the SBF.” makes me happy, because XML to JSON is tricky and I had to do quite some acrobatics to generate a decent-sized, parseable JSON from XML. But for your use case, no webform-element-level processing is done via AMI (as we speak). AMI is not even really aware of which webform you may want to use (you could have many). The reason is that a lot of what Webform does requires JS/human interaction, and AMI cannot access JS at all (it runs at the server level, not the client level). Remember your webform module in Islandora 7? A lot of the mapping and XML forms in Islandora could not even process data; they just read/wrote what you would put there. I think we can find some type of “plugin”-level processing for AMI, where some (may need a list?) webform equivalents can be mapped to certain fields to make that happen. It would imply: a new AMI set mapper, some plugins that take input and generate output (making them 1:1 with webform may be a challenge), and then using the output of the plugin in the Queue Worker to enrich the JSON.
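The proposal above (a mapper, input/output plugins, and a queue worker step that enriches the JSON) could be sketched roughly as follows. This is a Python illustration under stated assumptions, not AMI's design: the `PROCESSORS` registry, the `processor` decorator, the mapping format, and the `enrich` function are all hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical plugin registry: MIME type -> function that turns file
# contents into a JSON-serializable structure.
PROCESSORS: Dict[str, Callable[[str], object]] = {}


def processor(mime_type: str):
    """Decorator that registers a plugin for one MIME type."""
    def register(func):
        PROCESSORS[mime_type] = func
        return func
    return register


@processor("application/xml")
def process_xml(contents: str):
    # Deliberately shallow: only the direct children of the root are read.
    root = ET.fromstring(contents)
    return {child.tag: (child.text or "").strip() for child in root}


def enrich(metadata: dict, mapping: dict, files: dict) -> dict:
    """Stand-in for the queue worker step: run mapped plugins, add their output.

    `mapping` (hypothetical format) maps a target JSON key to the source
    file name and its MIME type.
    """
    enriched = dict(metadata)
    for target_key, source in mapping.items():
        plugin = PROCESSORS.get(source["mime_type"])
        contents = files.get(source["file"])
        if plugin is None or contents is None:
            # No plugin or no file: leave the metadata untouched.
            continue
        enriched[target_key] = plugin(contents)
    return enriched
```

The mapping is what keeps the behavior explicit: a file is only expanded when the set configuration names a target key for it, which mirrors the "mapped to certain fields" idea in the message above.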

Derek Merleaux: :thumbsup: Yep, I’ll do that now. It sounds like the XML to JSON processing that the webform uses is not currently accessible to AMI? That’s why a plugin is needed?

Diego Pino: Yes. Webforms do a lot in their own realm that is outside of the Node Ingest workflow. AMI cannot access that (for now) because webforms require browser/user interaction that we cannot (easily) fake. E.g., in the past all the “file characterization” was done by us on the webform, but that did not work for drush or AMI ingests, so I moved it to an event subscriber. That could also be an option here, e.g. “always process attached XMLs into JSON”, and that would be “general”, not AMI specific. Remember you can also ingest objects via drush or even the JSON-API directly. But then some people may see XML as a preservation format that does not need to be JSON-i-fied! So the issue is that we again have too many choices. Another example: CSV to JSON. I should not “process” every CSV into JSON; maybe I want to process only some. This may also be solved outside of AMI via SBRunners, where we have more control/decision-making options. And SBRunners apply to every object and can be forced to regenerate without reingesting.
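The event subscriber alternative, combined with the worry that not every XML should be converted, might look something like the sketch below. It is a generic Python illustration, not Drupal's or Archipelago's event system: the `object.presave` event name, the `expand_xml` opt-in flag, and the object layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical in-process event bus: event name -> list of handlers.
SUBSCRIBERS = defaultdict(list)


def subscribe(event_name, handler):
    """Register a handler for a named event."""
    SUBSCRIBERS[event_name].append(handler)


def dispatch(event_name, payload):
    """Call every handler for the event; handlers mutate the payload in place."""
    for handler in SUBSCRIBERS[event_name]:
        handler(payload)
    return payload


def expand_xml_on_save(obj):
    # Opt-in flag (hypothetical): XML kept purely for preservation is skipped.
    if not obj.get("expand_xml", False):
        return
    for name, contents in obj.get("files", {}).items():
        if name.lower().endswith(".xml"):
            root = ET.fromstring(contents)
            expanded = {child.tag: (child.text or "").strip() for child in root}
            obj.setdefault("expanded", {})[name] = expanded


# The same subscriber would fire for any ingest path (form, CLI, API),
# because it hangs off the save event rather than off one importer.
subscribe("object.presave", expand_xml_on_save)
```

Because the handler runs on save regardless of how the object arrived, it is "general" in the sense described above, while the flag leaves the convert-or-not decision to whoever creates the object.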

DiegoPino commented 3 months ago

Since we now have (tested) scripts for MODS, generic XML, and EAD, this will be added to 0.9.0 as a new File processor that will expand XML to CSV/JSON (no configs; it will work in "one way").