clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Consume dynamic facet maps #22

Closed twagoo closed 7 years ago

twagoo commented 7 years ago

A workflow (TODO: add link to relevant issue or other description) will be developed (primarily by ACDH) for maintaining facet maps based on manual edits by domain experts, synchronisation with CLAVAS vocabularies and potentially other mechanisms. In this workflow, mapping definitions will be stored across distributed (Git) repositories, so that versions of the combined mapping definition can be pulled to the various VLO instances and used by the importer. To connect the importer to the files that make up this definition, the environment must have a clone or fork of the mapping project available on local disk; the importer only needs to know (through the general VLO configuration) where to find the files. This is similar to the current configuration schema, which has parameters for the individual mapping files. It would probably be better to refer to a single 'index' file that links (some) facets to (0-∞) maps.

Summary: the mapping definitions themselves will be maintained, versioned, etc. in a separate project, and will act as a 'runtime dependency' of the VLO importer, which accesses the required files through the local filesystem.
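To make the 'index' idea more concrete, here is a minimal sketch of how such an index could be represented once loaded: each facet name points to zero or more map files, given relative to the root of the local clone. The class name `FacetMapIndex`, its methods and the facet names are hypothetical and only serve as illustration; the actual index file format has not been decided.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory form of the proposed 'index':
// facet name -> zero or more map files inside the local clone.
final class FacetMapIndex {

    // Root directory of the local clone of the mapping project (assumed to be configured in the VLO)
    private final Path baseDir;

    // Relative map file paths per facet, in registration order
    private final Map<String, List<String>> mapsByFacet = new LinkedHashMap<>();

    FacetMapIndex(Path baseDir) {
        this.baseDir = baseDir;
    }

    // Registers a map file (path relative to the clone root) for a facet.
    void addMap(String facet, String relativePath) {
        mapsByFacet.computeIfAbsent(facet, k -> new ArrayList<>()).add(relativePath);
    }

    // Resolves all map files for a facet; a facet without maps yields an empty list.
    List<Path> mapsFor(String facet) {
        List<Path> resolved = new ArrayList<>();
        for (String relative : mapsByFacet.getOrDefault(facet, Collections.<String>emptyList())) {
            resolved.add(baseDir.resolve(relative).normalize());
        }
        return resolved;
    }
}
```

With this shape, the importer would only need the location of the clone and of the index; a facet that is not listed simply has no maps applied.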

twagoo commented 7 years ago

Discussed in some more detail in the recent VLO planning meeting. For a (generally accepted but not final) proposal for a workflow, see the draw.io diagram (image snapshot).

twagoo commented 7 years ago

I have bootstrapped the VLO-mapping project. In principle this could already be used by overriding the default locations of the facetConceptsFile and the various mapping files in VloConfig.xml. I will experiment with this in a test setup.

twagoo commented 7 years ago

The PostProcessorsWithVocabularyMap class can only resolve bundled resources. A change to the logic of the getMappingFromFile(String mapUrl) function is required to also allow file:/... URLs.
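A rough sketch of the kind of change meant here, assuming the function only needs an input stream for the mapping: locations starting with `file:` are opened as URLs, and anything else is still looked up as a bundled classpath resource. The class `MappingStreamResolver` and its methods are hypothetical and are not the actual PostProcessorsWithVocabularyMap code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper illustrating the lookup order; not the actual VLO implementation.
final class MappingStreamResolver {

    // True if the location uses the file: scheme (case-insensitive).
    static boolean isFileUrl(String mapUrl) {
        return mapUrl.regionMatches(true, 0, "file:", 0, 5);
    }

    // Opens a mapping either from a file: URL or from a resource bundled on the classpath.
    static InputStream openMapping(String mapUrl) throws IOException {
        if (isFileUrl(mapUrl)) {
            // Local file, for example inside a clone of the mapping project
            return URI.create(mapUrl).toURL().openStream();
        }
        // Otherwise keep the existing behaviour: resolve a bundled resource
        InputStream bundled = MappingStreamResolver.class.getResourceAsStream(mapUrl);
        if (bundled == null) {
            throw new IOException("Mapping not found as file URL or bundled resource: " + mapUrl);
        }
        return bundled;
    }
}
```

Keeping the bundled lookup as the fallback means existing configurations that refer to packaged mapping files would continue to work unchanged.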

twagoo commented 7 years ago

A test import on alpha shows that mappings are correctly read from a local VLO-mapping clone:

2017-03-13 09:25:50,286 INFO [eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap#getMappingFromFile:68] - Reading vocabulary file from: file:/srv/VLO-mapping/uniform-maps/LanguageNameVariantsMap.xml
2017-03-13 09:25:50,316 INFO [eu.clarin.cmdi.vlo.LanguageCodeUtils#createCodeMap:146] - Creating language code map from http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.x/components/clarin.eu:cr1:c_1271859438110/xml
2017-03-13 09:25:51,440 INFO [eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap#getMappingFromFile:68] - Reading vocabulary file from: file:/srv/VLO-mapping/uniform-maps/LicenseAvailabilityMap.xml
2017-03-13 09:25:51,477 INFO [eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap#getMappingFromFile:68] - Reading vocabulary file from: file:/srv/VLO-mapping/uniform-maps/LicenseURIMap.xml
2017-03-13 09:25:51,503 INFO [eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap#getMappingFromFile:68] - Reading vocabulary file from: file:/srv/VLO-mapping/uniform-maps/OrganisationControlledVocabulary.xml
2017-03-13 09:25:51,580 INFO [eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap#getMappingFromFile:68] - Reading vocabulary file from: file:/srv/VLO-mapping/uniform-maps/nationalProjectsMapping.xml
2017-03-13 09:28:20,709 INFO [eu.clarin.cmdi.vlo.importer.MetadataImporter#main:795] - Could not get config file name via the command line, trying the system properties.
2017-03-13 09:28:20,712 INFO [eu.clarin.cmdi.vlo.importer.MetadataImporter#main:818] - Reading configuration from file:/srv/webapps/vlo/vlo-4.1-SNAPSHOT/bin/./../config/VloConfig.xml
2017-03-13 09:28:20,882 INFO [eu.clarin.cmdi.vlo.importer.MetadataImporter#initSolrServer:358] - Initializing concurrent Solr Server on http://localhost:8080/solr/core0/ with 2 threads
2017-03-13 09:28:21,103 INFO [eu.clarin.cmdi.vlo.importer.MetadataImporter#startImport:151] - Start of processing: Europeana
2017-03-13 09:28:21,126 INFO [eu.clarin.cmdi.vlo.importer.MetadataImporter#startImport:16