Meta-Issue: FITS Derivatives

dannylamb commented 6 years ago

This is a meta-issue to track the development of FITS derivative generation functionality. Please refer to this issue in any subsequent issue in order to link them.

We previously removed the code that installs a FITS server from claw-playbook, mostly because of timeout issues at a point when we weren't yet prepared to use it. Now that we're ready to use it, we need to:

Add the FITS web service back into claw-playbook
Emit derivative events from Drupal to a queue
Read from the queue using Camel, issue a request to the FITS microservice, and store the results in the repository
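As a rough illustration of the last two steps, here is a minimal Python sketch of the consume, request, store loop. It is not the Camel route the issue asks for: the queue is an in-memory stand-in, the event keys (`node_id`, `file_url`) are hypothetical because the thread does not define what Drupal emits, and the FITS call and the repository write are passed in as plain callables so that no real endpoint is assumed.

```python
import queue
from typing import Callable, Dict

# Hypothetical event shape; the issue does not specify the message format.
Event = Dict[str, str]


def process_events(
    events: "queue.Queue[Event]",
    run_fits: Callable[[str], str],
    store: Callable[[str, str], None],
) -> int:
    """Drain the queue, run FITS for each event, and store the resulting XML.

    Returns the number of events handled.
    """
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        # Step 1: ask the (injected) FITS client for technical metadata.
        fits_xml = run_fits(event["file_url"])
        # Step 2: hand the XML to the (injected) repository writer.
        store(event["node_id"], fits_xml)
        handled += 1
    return handled


if __name__ == "__main__":
    pending = queue.Queue()
    pending.put({"node_id": "1", "file_url": "http://example.org/a.tif"})
    stored = {}
    # Stub FITS client and a dict standing in for the repository.
    count = process_events(pending, lambda url: "<fits/>", stored.__setitem__)
    print(count, stored)
```

Keeping the FITS client and the storage step injectable means the same loop can be exercised with stubs before any of the real services exist.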

Natkeeran commented 6 years ago

@dannylamb

Need some clarification. "store the results in the repository". Does this mean storing it in Drupal as a media or file?

dannylamb commented 6 years ago

@Natkeeran Ideally we'd have a way to extract bits from the FITS XML on the way in and apply them to the original Media as fields, but I'm not sure what the best approach for that would be. My intention in the issue description was just to create a file, create a media for it, and associate that with the node. That's at least comparable to 7.x until we figure out a way to generically handle all the fields and format-specific metadata FITS can churn out.
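To make the "extract bits and apply them as fields" idea concrete, here is a small Python sketch using only the standard library. The sample document is hand-written and heavily simplified compared with real FITS output, and the Drupal field names (`field_mime_type`, `field_file_size`, and so on) are hypothetical; nothing in the thread settles which properties should be extracted.

```python
import xml.etree.ElementTree as ET

# Namespace used by FITS output documents.
NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written, simplified sample; real FITS output has many more elements.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Tagged Image File Format" mimetype="image/tiff"/>
  </identification>
  <fileinfo>
    <size>1024</size>
  </fileinfo>
  <metadata>
    <image>
      <imageWidth>640</imageWidth>
      <imageHeight>480</imageHeight>
    </image>
  </metadata>
</fits>"""

# Hypothetical mapping from Drupal field names to element paths.
FIELD_PATHS = {
    "field_file_size": "fits:fileinfo/fits:size",
    "field_width": "fits:metadata/fits:image/fits:imageWidth",
    "field_height": "fits:metadata/fits:image/fits:imageHeight",
}


def extract_fields(fits_xml):
    """Return a dict of field name -> value for the properties that are present."""
    root = ET.fromstring(fits_xml)
    fields = {}
    # The MIME type lives in an attribute rather than in element text.
    identity = root.find("fits:identification/fits:identity", NS)
    if identity is not None and identity.get("mimetype"):
        fields["field_mime_type"] = identity.get("mimetype")
    for name, path in FIELD_PATHS.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            fields[name] = node.text.strip()
    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

A fixed mapping like this only covers properties chosen in advance, which is exactly the limitation raised above: it does not generically handle everything FITS can emit.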

ajs6f commented 6 years ago

@dannylamb A "cheap and cheerful" way might be to use an XSLT transform to go from FITS XML to RDF/XML (I know, I hate it too) and thence to properties. I'd be happy to write that XSLT to order if it turns out to be useful.

Natkeeran commented 6 years ago

@dannylamb

Where should the logic to issue the request and store the results be handled?

The FITS web service itself is an out-of-the-box service. It seems like we may need a microservice, or we could possibly handle the logic in Alpaca.

dannylamb commented 6 years ago

@Natkeeran It's a judgement call. You can make another microservice to do the logic in PHP, or you can deal with it in Java/Camel. Making a request to the FITS service and poking Drupal with the results is easy/stable with Java and Camel. That's how I'd approach this to start. If it starts getting into lots of dirty array processing / string manipulation, etc... then I'd start considering PHP.

dannylamb commented 6 years ago

@ajs6f I shudder at the thought of xslts in the stack. Let's hope it doesn't come to that, but if it does, I'll hunt you down to make good on that promise :imp:

ajs6f commented 6 years ago

@dannylamb Fair enough, I hear you. 😁 There are plenty of other good ways to do it. It might be better to turn the task around-- i.e. to figure out what the extraction is abstractly (which properties), then decide on what would be a good way to impl that. I guess that (which properties to extract) is a conversation that should involve lots of good advice from the metadata squad.

jonathangreen commented 6 years ago

Maybe we could contribute back to the FITS tool a JSON output? Not sure how open they would be to something like that.

ajs6f commented 6 years ago

+1 to @jonathangreen . That's the strongest solution and it should be tried first. (Even better, a JSON-LD output!)

dannylamb commented 6 years ago

Maybe we could run http://camel.apache.org/xmljson.html on some FITS xml and see what pops out.

DiegoPino commented 6 years ago

In PHP it's 2 lines. It includes the XML properties, so nothing is lost.

https://gist.github.com/DiegoPino/4bb61af523bc9639698938f1e69112b3

(If you remove the XML itself and the final prints, it's two lines, of course.)

Feel free to run directly to test the output.

Adding a @context is just a matter of flattening, collecting the keys, adding the URLs (well, that means a new ontology...), and an array combine. That gives you JSON-LD, then compact. Done.
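For readers who want to try the JSON route without PHP, the following Python sketch walks the same steps: convert the XML to nested dictionaries, collect the keys, and pair each key with a URL to form an `@context`. It is not a port of the gist above. The vocabulary base `http://example.org/fits#` is a placeholder (a real one would mean a new ontology, as noted), the `attr_` prefix and `value` key are conventions invented for this sketch, and no JSON-LD compaction is performed.

```python
import json
import xml.etree.ElementTree as ET

# Placeholder vocabulary base; a real context would need an agreed ontology.
VOCAB = "http://example.org/fits#"

# Hand-written, simplified sample rather than real FITS output.
SAMPLE = """<fits>
  <fileinfo><size toolname="Jhove">1024</size></fileinfo>
  <metadata><image><imageWidth>640</imageWidth></image></metadata>
</fits>"""


def local(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def to_dict(elem):
    """Convert an element to nested dicts, keeping attributes and text."""
    node = {"attr_" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        key, value = local(child.tag), to_dict(child)
        if key in node:
            # Repeated child names become a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # plain leaf: just the text
        node["value"] = text
    return node


def collect_keys(value, found=None):
    """Gather every dictionary key used anywhere in the structure."""
    found = set() if found is None else found
    if isinstance(value, dict):
        for key, child in value.items():
            found.add(key)
            collect_keys(child, found)
    elif isinstance(value, list):
        for item in value:
            collect_keys(item, found)
    return found


def to_jsonld(fits_xml):
    """Return the converted document with a generated @context."""
    body = to_dict(ET.fromstring(fits_xml))
    if not isinstance(body, dict):
        body = {"value": body}
    context = {key: VOCAB + key for key in sorted(collect_keys(body))}
    return {"@context": context, **body}


if __name__ == "__main__":
    print(json.dumps(to_jsonld(SAMPLE), indent=2))
```

Generating one term per key is the mechanical part; deciding what those terms should actually mean is the part that needs metadata expertise.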

ajs6f commented 6 years ago

Going to JSON and thence to JSON-LD sounds great. I don't think the selection of JSON-LD context is entirely trivial, but that could be a different ticket.

dannylamb commented 6 years ago

In PHP it's 2 lines.

^^ or that

rosiel commented 5 years ago

In reference to Danny's comment:

@Natkeeran Ideally we'd have a way to extract bits from the FITS XML on the way in and apply them to the original Media as fields, but I'm not sure what the best approach for that would be. My intention in the issue description was just to create a file, create a media for it, and associate that with the node. That's at least comparable to 7.x until we figure out a way to generically handle all the fields and format-specific metadata FITS can churn out.

Right now we're leaning toward saving the XML as a Media attached to the node. This is because we don't know ahead of time which "fields" FITS will output, and each one would have to be installed in Drupal ahead of time, configured for display, etc. But since nobody except :nerds: likes to read XML, we're considering reusing the XSLT from 7.x that turned it into field:value pairs for display in an HTML table.
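To show what the field:value display amounts to, here is a short Python sketch that flattens every leaf element into a (name, value) pair and renders an HTML table. It deliberately uses plain Python in place of XSLT and is not the 7.x stylesheet; the sample XML is hand-written, and attributes and nesting are ignored for brevity.

```python
import html
import xml.etree.ElementTree as ET

# Hand-written, simplified sample rather than real FITS output.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <fileinfo>
    <size>1024</size>
    <creatingApplicationName>Scan &amp; Fix</creatingApplicationName>
  </fileinfo>
</fits>"""


def field_value_pairs(fits_xml):
    """Return (name, value) pairs for every leaf element that has text."""
    root = ET.fromstring(fits_xml)
    pairs = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            # Drop the '{namespace}' prefix ElementTree adds to tag names.
            pairs.append((elem.tag.rsplit("}", 1)[-1], text))
    return pairs


def to_html_table(pairs):
    """Render the pairs as a two-column HTML table, escaping all text."""
    rows = "".join(
        f"<tr><th>{html.escape(name)}</th><td>{html.escape(value)}</td></tr>"
        for name, value in pairs
    )
    return f"<table>{rows}</table>"


if __name__ == "__main__":
    print(to_html_table(field_value_pairs(SAMPLE)))
```

Because the pairs are discovered at render time, no Drupal fields need to be installed in advance, which matches the reason given for storing the raw XML.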

Islandora / documentation

Meta-Issue: FITS Derivatives #934