delving / sip-creator

The Delving SIP-Creator
sip-creator.delving.eu

Change the MappingRunner interface to only accept arguments of types that are in the Java SDK #486

Closed hanswesterbeek closed 5 years ago

hanswesterbeek commented 7 years ago

Purpose:

Once this is in place, we will have proof that we can ship processing jobs over the wire. That opens up the possibility of creating a web service that specializes in executing them. It also makes us immune to versioning issues with loaded classes whose definition at the time a mapping script is created differs from the one present when the script is actually run. As a side effect, I expect all leaking of class definitions into the JDK Metaspace to be solved.

Prerequisites:

  1. All scripts must inline any external dependencies
  2. Both current sip-app and narthex must use this new interface.

Tasks

Impact

  1. Every mapping will need to be re-generated. Luckily, this happens on the fly.
  2. Users must always match the Sip-version they use with the Narthex version in use. They must share the same sip-core.

kiivihal commented 7 years ago

In general, I agree with your plan for updating the MappingRunner interface. I have some important points that need to be taken into account when implementing this.

Input for the MappingRunner interface

The mapping processor takes 5 external sources plus the source records to run the mapping:

Additionally, the source data is available in a GZipped file. Narthex constructs this source file in a pocket format.

So the new MappingRunner interface should be able to take these input sources and build its internal models from them. I agree that it is better to pass these to the interface as strings rather than as already instantiated classes.
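To make the "strings only" idea concrete, here is a minimal sketch of what such an interface could look like. The names `StringMappingRunner`, `runMapping`, and `EchoRunner` are hypothetical and do not exist in sip-core. The keys of the `inputs` map are left to the caller, since this is only an illustration of the call shape.

```java
import java.util.Map;

// Hypothetical interface (not in sip-core): every parameter and the
// return value is a JDK type, so a call can be serialized over the wire.
interface StringMappingRunner {
    // inputs:    named textual sources (for example the mapping script),
    //            keyed by a caller-chosen name
    // recordXml: one source record as XML text
    // returns:   the mapped record as XML text
    String runMapping(Map<String, String> inputs, String recordXml) throws Exception;
}

// Toy implementation that only shows the call shape; it performs no real mapping.
final class EchoRunner implements StringMappingRunner {
    @Override
    public String runMapping(Map<String, String> inputs, String recordXml) {
        // Wrap the record and report how many inputs were supplied.
        return "<output inputs=\"" + inputs.size() + "\">" + recordXml + "</output>";
    }
}
```

Because nothing in the signature refers to a class outside the JDK, the same contract could in principle be exposed by a CLI or a web service without sharing a classpath with the caller.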

The new MappingRunner interface should also be a good entry point for the development of a command-line interface (CLI).

Functions

The Groovy code that is executed has access to functions that are defined on three levels:

If we are going to remove the ability to import classes that are on the Sip-Core classpath from functions, we must make these more complex things available as system functions. One example is our use of an external library to convert between various geospatial encoding formats. Ideally, system functions should also be part of the list of user-defined functions.

General remarks on versioning of mappings and record definitions

Each mapping is linked to a specific record definition with a specific version. All these versions are available on http://schemas.delving.org. In Hub2, the functionality to interact with the schema-repository was much more integrated. In Hub3, each sip-zip contains the right record-definition and validation XSD.

Also, note that the facts in the mapping seem to be prioritized over the facts in the 'narthex_facts.txt' file. There is already an issue in Narthex that deals with the fact that these two sources can get out of sync, see https://github.com/delving/narthex/issues/136.

hanswesterbeek commented 7 years ago

Thanks for the reply. Much of what you wrote confirms what I had deduced from reading the code.

One question: I don't quite see what the record-definition has to do with the processing step. Can you explain? I can see how it would be involved in generating the mapping-script, but not in executing it.

For your reference, here's a recently generated copy of our [test mapping-script](https://gist.github.com/hanswesterbeek/bb26f79a2b9bed560e2aa957c772b0b5).

As you can see, it imports some stuff. Once you wade through all the functions, you can see that all it really does is use Groovy's MarkupBuilder to create XML and then return an org.w3c.dom.Node. It would not be hard to always return a string instead.

Yes, I am aware that the calling thread will probably have to parse that XML again and turn it into an org.w3c.dom.Document, but the simplicity of the API means that you can use it (over the wire) from, say, a Python application such as Nave.
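For illustration, the round trip described above could look roughly like this using only JDK classes. `XmlStrings` is a hypothetical helper, not something that exists in sip-core. `toXml` is what the runner side might do before returning, and `parse` is what a Java caller would do with the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of sip-core).
final class XmlStrings {
    private XmlStrings() {}

    // Runner side: serialize a DOM node to XML text before returning it.
    static String toXml(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Caller side: parse the returned XML text back into a Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

The cost is one extra parse per record on the Java side; a non-JVM caller would simply use its own XML parser on the same string.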