nationalProject can in some (many/all?) cases be derived from the endpoint. This would be a case for cross-facet mapping (see #93), as would the mapping to dataProviderName.
The JSON API of the Centre Registry should provide more information; check whether it includes the national project per endpoint:
https://github.com/clarin-eric/Centre-Registry/tree/master/centre-registry-app#rest-api
Looks like JSON dumps of the database tables:
- endpoint: https://centres.clarin.eu/api/model/OAIPMHEndpoint
- centre: https://centres.clarin.eu/api/model/Centre
- consortium: https://centres.clarin.eu/api/model/Consortium
I think that should give enough info to generate the mapping.
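A minimal sketch in Python of how such a mapping could be derived by joining the three dumps (endpoint → centre → consortium). It assumes the dumps use Django's serializer format (a list of objects with "model", "pk" and "fields") and that the field names "uri", "centre", "consortium" and "name" exist as written; all of these are assumptions to be checked against the actual API output. The helper names (fetch, index_by_pk, endpoint_to_national_project) are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def fetch(url: str) -> List[dict]:
    """Download one JSON dump, e.g. https://centres.clarin.eu/api/model/Centre."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def index_by_pk(dump: List[dict]) -> Dict[int, dict]:
    """Index a Django-style dump ([{"model", "pk", "fields"}, ...]) by primary key."""
    return {obj["pk"]: obj["fields"] for obj in dump}


def endpoint_to_national_project(endpoints: List[dict],
                                 centres: List[dict],
                                 consortia: List[dict]) -> Dict[str, str]:
    """Map each OAI-PMH endpoint URI to the name of its centre's consortium.

    ASSUMPTION: endpoint fields contain "uri" and a "centre" foreign key,
    centre fields contain a "consortium" foreign key, consortium fields a "name".
    """
    centre_by_pk = index_by_pk(centres)
    consortium_by_pk = index_by_pk(consortia)
    mapping: Dict[str, str] = {}
    for endpoint in endpoints:
        fields = endpoint["fields"]
        centre = centre_by_pk.get(fields.get("centre"))
        if centre is None:
            continue  # endpoint without a known centre: no mapping entry
        consortium = consortium_by_pk.get(centre.get("consortium"))
        if consortium is None:
            continue  # centre not linked to a consortium: no mapping entry
        mapping[fields["uri"]] = consortium["name"]
    return mapping
```

Usage would be along the lines of endpoint_to_national_project(fetch(".../OAIPMHEndpoint"), fetch(".../Centre"), fetch(".../Consortium")); endpoints whose centre has no consortium are simply left out, so the CFM would fall back to whatever default applies.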
I assume the communication with the Centre Registry is not part of the vlo-importer code but happens via a general procedure for generating a CFM mapping file in the VLO-mapping project?
Yes, we should generate a CFM from these JSON responses. But the harvester may also need to switch to this JSON API, as the XML API is apparently deprecated. In that case the harvester could produce a complete CFM, i.e., from directoryName to endpointUrl and nationalProject.
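If the harvester takes over this step, the complete mapping it emits could look roughly like this Python sketch: one row per harvested endpoint with directoryName, endpointUrl and nationalProject. The tab-separated layout, the column names and the functions build_rows and write_mapping are hypothetical illustrations, not the format the VLO importer actually reads; the directoryName values are taken as given from the harvester.

```python
import csv
from typing import Dict, Iterable, Iterator, TextIO

COLUMNS = ["directoryName", "endpointUrl", "nationalProject"]  # hypothetical header


def build_rows(directory_by_endpoint: Dict[str, str],
               national_project_by_endpoint: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield one mapping row per endpoint, sorted by directory name.

    Endpoints without a known national project get an empty value.
    """
    for endpoint, directory in sorted(directory_by_endpoint.items(), key=lambda kv: kv[1]):
        yield {
            "directoryName": directory,
            "endpointUrl": endpoint,
            "nationalProject": national_project_by_endpoint.get(endpoint, ""),
        }


def write_mapping(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write the rows as tab-separated values with a header line."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

Keeping directoryName as the first column reflects that the importer only sees the harvester's output directories, so that is the key it would look up.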
Branch issue_131 (last commit: 9730b34f) uses the harvester mapping file and removes the former approach of populating the field "nationalProject" via //MdCollectionDisplayName and "nationalProjectsMapping.xml".
For production use, the external facet definition file has to be adapted (i.e., the facet definition for "nationalProject" has to be removed). The mapping file "nationalProjectsMapping.xml" can be removed if it is not used in other projects.
Incorporate information provided by the OAI harvester to store the OAI endpoint URI for each imported record. The harvester has to expose this in a format that contains all information necessary to link a record to its provider, e.g. via the directory structure.
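A minimal Python sketch of the "via the directory structure" option, assuming a hypothetical layout <root>/<format>/<directoryName>/<record>.xml (the record's parent directory is the endpoint's directoryName) and a directoryName-to-endpoint mapping exported by the harvester. The function name endpoint_for_record is illustrative only.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def endpoint_for_record(record_path: str,
                        endpoint_by_directory: Dict[str, str]) -> Optional[str]:
    """Resolve the OAI endpoint URI of a harvested record from its file path.

    ASSUMPTION: the record file sits directly in the endpoint's directory,
    so the parent directory name is the harvester's directoryName.
    Returns None if the directory is not in the mapping.
    """
    directory_name = PurePosixPath(record_path).parent.name
    return endpoint_by_directory.get(directory_name)
```

If the harvester nests records further (for instance per OAI set), the lookup would have to walk up the path instead of taking only the immediate parent.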
My proposal is to create two fields:
- _oaiEndpointURI: the raw URI
- dataProviderName
Note: dataProvider already exists as a field and is based on the harvest set (the "data source" facet uses this field).
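A sketch of how the importer could fill the two proposed fields, in Python for brevity. It assumes a record is represented as a dict from field name to a list of values and that a hypothetical endpoint-to-provider-name mapping (e.g. generated via CFM from the Centre Registry) is available; add_provider_fields is an illustrative name. The existing dataProvider field is deliberately left untouched.

```python
from typing import Dict, List, Optional


def add_provider_fields(record: Dict[str, List[str]],
                        endpoint_uri: Optional[str],
                        provider_name_by_endpoint: Dict[str, str]) -> Dict[str, List[str]]:
    """Add _oaiEndpointURI and, if known, dataProviderName to the record (in place).

    ASSUMPTION: provider_name_by_endpoint is a hypothetical mapping from
    endpoint URI to provider name; dataProvider (harvest set) is not modified.
    """
    if endpoint_uri is None:
        return record  # endpoint unknown: leave the record unchanged
    record.setdefault("_oaiEndpointURI", []).append(endpoint_uri)  # the raw URI
    name = provider_name_by_endpoint.get(endpoint_uri)
    if name is not None:
        record.setdefault("dataProviderName", []).append(name)
    return record
```

Keeping the raw URI in its own field means the provider name mapping can be changed later without re-harvesting.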