clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Implement new value mapping strategy #93

Closed twagoo closed 6 years ago

twagoo commented 7 years ago

Also see the comments at https://github.com/clarin-eric/VLO/commit/cd15ac153f213e44e0b9a951ae59c159209d6b5e#commitcomment-25076921

wowasa commented 6 years ago

The feature is implemented in the development branch, with some test cases. At the moment we can define a value for the origin facet. If there is a complete match (string comparison), one or more defined values are set on the cross facets (which may include the origin facet itself). In the long run it might be more convenient to replace the string comparison with regular expressions. Important: if the field value matches the defined value, that value is no longer processed for the origin facet (unless the origin facet is also defined as a cross facet).
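To make the behaviour described above concrete, here is a minimal Python sketch (not the importer's actual Java code). The function name `apply_mapping`, the dictionary layout and the plain `record` dict are assumptions made for illustration. The facet names and values are borrowed from the DEBUG log line quoted later in this thread.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# not taken from the VLO importer.

def apply_mapping(facet, value, mappings, record):
    """Apply exact-match cross facet mapping to one field value.

    mappings: {origin_facet: {origin_value: [(cross_facet, new_value), ...]}}
    record:   {facet: [values]} collected for the document being imported
    """
    targets = mappings.get(facet, {}).get(value)  # complete string match
    if targets is None:
        # No match: the value is processed for its own facet as usual.
        record.setdefault(facet, []).append(value)
        return
    # Match: only the defined cross facet values are set. The origin value is
    # dropped unless the origin facet is itself listed as a cross facet.
    for cross_facet, new_value in targets:
        record.setdefault(cross_facet, []).append(new_value)


mappings = {"_componentProfile": {"SourceScan": [("resourceClass", "image")]}}
record = {}
apply_mapping("_componentProfile", "SourceScan", mappings, record)
apply_mapping("_componentProfile", "OtherProfile", mappings, record)
print(record)
```

In this sketch the matched value `SourceScan` never reaches `_componentProfile`, while the unmatched `OtherProfile` does. This is the effect discussed in the following comments.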

twagoo commented 6 years ago

While testing the importer with cross facet mapping enabled, I noticed that if you define a cross facet mapping for a facet (i.e. set an 'origin facet'), that facet is no longer processed itself. For example, if I define any mapping from a collection value to an availability value (see #46), the entire collection facet effectively disappears. I'm not sure whether this is how we envisioned the behaviour, but looking at it now I don't think it is desirable. I think cross facet mapping should in principle leave the original value untouched, i.e. it should only produce additional values. For cases where the original value should not be visible, we should either use 'hidden facets' or explicitly define this, e.g. through a special attribute in the cross facet mapping definition.
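The alternative proposed above could look roughly like the following Python sketch. Everything in it is hypothetical: the `remove_origin` flag stands in for the "special attribute" mentioned in the comment, and the collection and availability values are invented examples.

```python
# Illustrative sketch only: the rule layout, the remove_origin flag and the
# example values are assumptions, not part of the VLO mapping format.

def apply_additive_mapping(facet, value, rules, record):
    """Cross facet mapping that keeps the original value by default.

    rules:  {(origin_facet, origin_value): {"targets": [(facet, value), ...],
                                            "remove_origin": bool}}
    record: {facet: [values]} collected for the document being imported
    """
    rule = rules.get((facet, value))
    # Keep the original value unless a matching rule explicitly removes it.
    if rule is None or not rule.get("remove_origin", False):
        record.setdefault(facet, []).append(value)
    if rule is not None:
        # Mapping only ever adds values to the target facets.
        for target_facet, target_value in rule["targets"]:
            record.setdefault(target_facet, []).append(target_value)


rules = {
    ("collection", "Example Corpus"): {"targets": [("availability", "Public")]},
}
kept = {}
apply_additive_mapping("collection", "Example Corpus", rules, kept)

rules_hiding = {
    ("collection", "Example Corpus"): {
        "targets": [("availability", "Public")],
        "remove_origin": True,
    },
}
hidden = {}
apply_additive_mapping("collection", "Example Corpus", rules_hiding, hidden)
print(kept, hidden)
```

With the default rule the collection value survives alongside the new availability value. Only the rule that sets the flag drops it.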

wowasa commented 6 years ago

Is this a new requirement, or are we going to discuss it? I understood CFM (cross facet mapping) to mean that it should in any case ignore the setting for the origin facet if the value matches the condition.

twagoo commented 6 years ago

Perhaps this was underspecified then. Is there a written version of the requirements for CFM somewhere?

twagoo commented 6 years ago

Mapping parameter combinations - use cases (draft, PDF)

Update: there's an asciidoc version of this available on GitHub now: https://github.com/clarin-eric/VLO-mapping/blob/development/doc/ValueMapping.adoc

twagoo commented 6 years ago

Potential extensions of the design/implementation to consider (see the document Potential for reimplementation of VLO post-processors as value mapping cases):

twagoo commented 6 years ago

FYI I have added some logging to the processing of the mapping and its application during the actual import. An example of what this looks like in actual import logs at INFO level:

2018-04-04 13:30:01,357  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#getValueMappings:41] - Parsing value mapping in file:/srv/VLO-mapping/value-maps/dist/master.xml
2018-04-04 13:30:01,423  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#getValueMappings:47] - Found 2 origin-facet nodes
2018-04-04 13:30:01,425  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#processOriginFacet:70] - Processing origin-facet node with name='_componentProfile'
2018-04-04 13:30:01,426  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#processValueMap:88] - -- Processing value-map node
2018-04-04 13:30:01,427  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#processValueMap:105] - -- Found 50 target-value-set nodes
2018-04-04 13:30:01,451  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#processOriginFacet:70] - Processing origin-facet node with name='resourceClass'
2018-04-04 13:30:01,451  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#processValueMap:88] - -- Processing value-map node
2018-04-04 13:30:01,452  INFO [  Importer main] [eu.clarin.cmdi.vlo.importer.mapping.ValueMappingFactoryDOMImpl#processValueMap:105] - -- Found 115 target-value-set nodes
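The INFO lines above suggest a mapping file made up of origin-facet nodes, each holding a value-map node with a number of target-value-set nodes. As a rough illustration, the Python sketch below produces the same kind of summary from a small hypothetical document. Only the element names `origin-facet`, `value-map` and `target-value-set` and the `name` attribute come from the log. The root element, the nesting and the empty `target-value-set` nodes are assumptions, and this is not the importer's Java implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document. The root element name and the nesting are
# assumptions; real target-value-set nodes would carry content.
SAMPLE = """
<value-mappings>
  <origin-facet name="_componentProfile">
    <value-map>
      <target-value-set/>
      <target-value-set/>
    </value-map>
  </origin-facet>
  <origin-facet name="resourceClass">
    <value-map>
      <target-value-set/>
    </value-map>
  </origin-facet>
</value-mappings>
"""


def summarize(xml_text):
    """Return (origin facet name, number of target-value-set nodes) pairs."""
    root = ET.fromstring(xml_text.strip())
    summary = []
    for origin in root.iter("origin-facet"):
        count = sum(1 for _ in origin.iter("target-value-set"))
        summary.append((origin.get("name"), count))
    return summary


summary = summarize(SAMPLE)
print(f"Found {len(summary)} origin-facet nodes")
for name, count in summary:
    print(f"origin-facet name='{name}': {count} target-value-set nodes")
```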

All other related, newly added logging takes place at DEBUG or TRACE level, for example:

2018-04-04 14:25:15,490 DEBUG [Pool-1-worker-2] [CMDIParserVTDXML#processValueMapping:461] - Value mapping: applying mapping [_componentProfile: 'SourceScan'] -> [resourceClass: 'image'] (override existing: false)

twagoo commented 6 years ago

Tests on alpha show that value mapping works as intended. Currently, the development branch of clarin-eric/VLO-mapping has maps and scripts supporting this. This will soon be merged into beta, then master and hopefully also into the acdh-oeaw fork.