Closed hanswesterbeek closed 5 years ago
In general, I agree with your plan for updating the MappingRunner interface. I have some important points that need to be taken into account when implementing this.
The mapping processor takes five external sources plus the source records to run the mapping:
Mapping file = This file contains all the information provided by the user to map from the source to the target format (as specified by the record definition; see below for more information). The file is written in XML and contains the following information blocks:
<string></string>
entry. hints file = a key-value TXT file that contains meta-information about how the source data should be interpreted.
narthex_facts.txt = a key-value TXT file that contains the meta-information about the dataset that is managed via the Narthex dataset forms, and also some meta-information about the Narthex deployment, such as rdfBaseUrl, orgId, etc.
record-definition = This XML file describes all the components the sip-creator needs to build the target format that can be mapped to. It contains the following main elements:
record definition validation XSD = This XSD validates whether each output record adheres to the rules. This validation is run both in the Sip-Creator and in Narthex during processing. Both the Sip-Creator and Narthex have configuration options that can disable validation.
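As an illustration of the last item, here is a minimal sketch of XSD validation that can be switched off by configuration. It only uses the standard `javax.xml.validation` API; the class name `RecordValidator` and the `enabled` flag are hypothetical and do not reflect how the Sip-Creator or Narthex actually wire this up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of sip-core or Narthex.
class RecordValidator {
    private final Schema schema; // stays null when validation is disabled

    RecordValidator(String xsd, boolean enabled) {
        Schema compiled = null;
        if (enabled) {
            try {
                SchemaFactory factory =
                        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                compiled = factory.newSchema(new StreamSource(new StringReader(xsd)));
            } catch (SAXException e) {
                throw new IllegalArgumentException("XSD could not be compiled", e);
            }
        }
        this.schema = compiled;
    }

    /** Returns null if the record is valid or validation is off, else the error message. */
    String validate(String recordXml) {
        if (schema == null) {
            return null;
        }
        try {
            // Validators are not thread-safe, so create one per call.
            schema.newValidator().validate(new StreamSource(new StringReader(recordXml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }
}
```

Taking the XSD as a string keeps the helper independent of where the schema file lives.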
Additionally, the source data is available in a gzipped file. Narthex constructs this source data in a pocket format.
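Reading such a gzipped source can be sketched with the JDK alone. The example below makes no assumptions about the pocket format itself; it only shows opening a gzipped stream as UTF-8 text. `GzipSource` and its `gzip` helper are hypothetical names, and the helper is included only so the sketch can be tried without a real source file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of sip-core or Narthex.
class GzipSource {
    /** Wraps a gzipped stream in a reader so records can be consumed incrementally. */
    static BufferedReader open(InputStream gzipped) {
        try {
            return new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses text to gzip bytes; only here to produce sample input. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Returning a reader rather than a fully decompressed string avoids holding a large dataset in memory at once.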
So the new MappingRunner interface should be able to take these input sources and generate its internal models from them. I agree that it is better to pass these to the interface as strings instead of as already instantiated classes.
Also, for the development of a command-line interface (CLI), the new MappingRunner interface should be a good entry point.
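To make the string-based idea concrete, here is a rough sketch of what such a contract could look like. Every name in it (`MappingInputs`, `StringBasedMappingRunner`, the field names) is hypothetical and chosen for illustration; it is not the existing sip-core API, and a real design may well need more than this.

```java
import java.util.Objects;

// Hypothetical value holder for the five external sources, all passed as plain text.
final class MappingInputs {
    final String mappingXml;
    final String hints;
    final String narthexFacts;
    final String recordDefinitionXml;
    final String validationXsd;

    MappingInputs(String mappingXml, String hints, String narthexFacts,
                  String recordDefinitionXml, String validationXsd) {
        this.mappingXml = Objects.requireNonNull(mappingXml, "mappingXml");
        this.hints = Objects.requireNonNull(hints, "hints");
        this.narthexFacts = Objects.requireNonNull(narthexFacts, "narthexFacts");
        this.recordDefinitionXml = Objects.requireNonNull(recordDefinitionXml, "recordDefinitionXml");
        this.validationXsd = Objects.requireNonNull(validationXsd, "validationXsd");
    }
}

// Hypothetical contract: strings in, strings out, no sip-core classes in the signature.
interface StringBasedMappingRunner {
    /** Maps one source record (XML text) to one output record (XML text). */
    String runMapping(String sourceRecordXml);
}
```

Because nothing but strings crosses the boundary, a CLI or a remote caller could build a `MappingInputs` from files on disk or from a request body without loading any sip-core model classes first.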
The Groovy code that is executed has access to functions that are defined on three levels:
If we are going to remove the ability to import classes that are on the Sip-Core classpath from functions, we must make these more complex things available as system functions. There is an example where we are using an external library to do conversion between various formats of geospatial encodings. Ideally, system functions should also be part of the list of user defined functions.
Each mapping is linked to a specific record definition with a specific version. All these versions are available on http://schemas.delving.org. In Hub2, the functionality to interact with the schema-repository was much more integrated. In Hub3, each sip-zip contains the right record-definition and validation XSD.
Also, note that the facts in the mapping seem to be prioritized over the facts in the 'narthex_facts.txt' file. There is already an issue in Narthex that deals with the fact that these two sources can get out of sync, see https://github.com/delving/narthex/issues/136.
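The precedence described above can be sketched as a simple map overlay. This assumes that the facts file uses one `key=value` pair per line, which is an assumption about the format rather than something stated here, and the `Facts` class is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of sip-core or Narthex.
class Facts {
    /** Parses "key=value" lines (assumed format); lines without a key are skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> facts = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                facts.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return facts;
    }

    /** Starts from the Narthex facts, then lets the mapping facts overwrite them. */
    static Map<String, String> effective(Map<String, String> narthexFacts,
                                         Map<String, String> mappingFacts) {
        Map<String, String> merged = new LinkedHashMap<>(narthexFacts);
        merged.putAll(mappingFacts);
        return merged;
    }
}
```

Making the overlay explicit in one place would also be a natural spot to log keys on which the two sources disagree.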
Thanks for the reply. Much of what you wrote confirms what I have deduced from reading the code.
Map<String, String>.
CodeGenerator class, which is in desperate need of better test coverage and, as a result, some refactoring.
BulkMappingRunner and an AppMappingRunner, which both implement MappingRunner. What will be tricky is that our new SimpleMappingRunner will have a more modest contract: it won't be providing feedback on non-compiling code the way AppMappingRunner does. So for sip-app, we'll have to write a MappingCompilationReporter of some sort.
One question: I don't quite see what the record-definition has to do with the processing step. Can you explain? I can see how it would be involved in generating the mapping-script, but not in executing it.
For your reference, here's a recently generated copy of our [test mapping-script](https://gist.github.com/hanswesterbeek/bb26f79a2b9bed560e2aa957c772b0b5).
As you can see, it imports some stuff. Once you wade through all the functions, you can see that all it really does is use Groovy's MarkupBuilder to create XML and then return an org.w3c.dom.Node.
It's not hard to always let that be a string.
Yes, I am aware that the calling thread will probably have to parse that XML again and turn it into an org.w3c.dom.Document, but the simplicity of the API will mean that you can use it (over the wire) from, say, a Python application such as Nave.
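For callers on the JVM, turning the returned string back into a DOM is only a few lines with the standard JAXP API. The sketch below uses a hypothetical helper name, `XmlStrings`, and wraps all failures in one unchecked exception for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of sip-core.
class XmlStrings {
    /** Parses XML text into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The extra parse has a cost per record, which is the trade-off for keeping the API usable from non-JVM clients.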
Purpose:
Once this is in place, we will have proven that we can ship processing jobs over the wire. That opens up possibilities to create a webservice that specializes in executing them. It also makes us immune to versioning issues with loaded classes whose definition at the time the mapping-script is run differs from the one at the time it was created. As a side effect, I expect all leaking of class definitions into the JDK Metaspace to be solved.
Prerequisites:
Tasks
import anything not shipped with the JRE.
MappingCategory.groovy
runMapping API must be refactored to:
Impact
sip-core.