clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Data provider (OAI endpoint) field #131

Closed twagoo closed 6 years ago

twagoo commented 6 years ago

Incorporate information provided by the OAI harvester to store the OAI endpoint URI for each imported record. This has to be exposed by the harvester in some format that contains all necessary information to link a record to its provider, e.g. via the directory structure.

My proposal is to create two fields:

Note: dataProvider already exists as a field and is based on the harvest set (the "data source" facet uses this field).

twagoo commented 6 years ago

nationalProject can in some (many/all?) cases be derived from the endpoint. This would be a case for cross-facet mapping (see #93), as would the mapping to dataProviderName.

twagoo commented 6 years ago

The JSON API of the centre registry should provide more information; check whether this includes the national project per endpoint.

menzowindhouwer commented 6 years ago

https://github.com/clarin-eric/Centre-Registry/tree/master/centre-registry-app#rest-api

Looks like JSON dumps of the database tables:

endpoint to https://centres.clarin.eu/api/model/OAIPMHEndpoint
centre to https://centres.clarin.eu/api/model/Centre
consortium to https://centres.clarin.eu/api/model/Consortium

I think that should give enough info to generate the mapping.
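To illustrate how the three dumps could be joined into an endpoint mapping, here is a minimal Python sketch. It assumes the responses are Django-style dumps (a list of records with `pk` and `fields`), and the field names `centre`, `uri`, `consortium` and `name` are assumptions for illustration, as are the sample values; the actual responses should be checked before relying on any of this.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

# Hypothetical sample records shaped like Django-style JSON dumps.
# Field names and values are assumptions, not the registry's confirmed schema.
ENDPOINTS: List[Record] = [
    {"pk": 1, "fields": {"centre": 10, "uri": "https://example.org/oai"}},
    {"pk": 2, "fields": {"centre": 11, "uri": "https://example.net/oai"}},
]
CENTRES: List[Record] = [
    {"pk": 10, "fields": {"name": "Example Centre A", "consortium": 100}},
    {"pk": 11, "fields": {"name": "Example Centre B", "consortium": None}},
]
CONSORTIA: List[Record] = [
    {"pk": 100, "fields": {"name": "Example Consortium"}},
]


def index_by_pk(records: List[Record]) -> Dict[Any, Record]:
    """Index a dump by primary key so foreign keys can be resolved."""
    return {record["pk"]: record["fields"] for record in records}


def build_endpoint_mapping(
    endpoints: List[Record],
    centres: List[Record],
    consortia: List[Record],
) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each endpoint URI to its centre name and national project."""
    centre_by_pk = index_by_pk(centres)
    consortium_by_pk = index_by_pk(consortia)
    mapping: Dict[str, Dict[str, Optional[str]]] = {}
    for endpoint in endpoints:
        fields = endpoint["fields"]
        # Unknown or missing foreign keys resolve to an empty record.
        centre = centre_by_pk.get(fields["centre"], {})
        consortium = consortium_by_pk.get(centre.get("consortium"), {})
        mapping[fields["uri"]] = {
            "centre": centre.get("name"),
            "nationalProject": consortium.get("name"),
        }
    return mapping


mapping = build_endpoint_mapping(ENDPOINTS, CENTRES, CONSORTIA)
```

An endpoint whose centre has no consortium ends up with `nationalProject` set to `None`, so such cases stay visible instead of being dropped from the mapping.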

teckart commented 6 years ago

I guess the communication with the centre registry is not related to the vlo-importer code, but happens via a general procedure for generating a mapping file for CFM in the VLO-mapping project?

menzowindhouwer commented 6 years ago

Yes, we should generate a CFM from these JSON responses. But the harvester might also need to switch to this JSON API, as the XML API is apparently deprecated. In that case the harvester could produce a complete CFM, i.e., from directoryName to endpointUrl and nationalProject.
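A rough sketch of what such a harvester-produced mapping and its lookup could look like, in Python. The thread does not specify the file format, so CSV is used here purely as a stand-in; the column names follow the names mentioned above, while the sample rows, the helper names and the assumption that the first path component of a record is its directoryName are all hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Dict, List, Optional

FIELDNAMES = ["directoryName", "endpointUrl", "nationalProject"]

# Hypothetical rows; real values would come from the harvester.
ROWS: List[Dict[str, str]] = [
    {
        "directoryName": "Example_Centre_A",
        "endpointUrl": "https://example.org/oai",
        "nationalProject": "Example Consortium",
    },
    {
        "directoryName": "Example_Centre_B",
        "endpointUrl": "https://example.net/oai",
        "nationalProject": "",
    },
]


def write_mapping(rows: List[Dict[str, str]]) -> str:
    """Serialise the mapping rows as CSV text (stand-in format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_mapping(text: str) -> Dict[str, Dict[str, str]]:
    """Parse the CSV text into a lookup keyed by directoryName."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["directoryName"]: dict(row) for row in reader}


def lookup(
    mapping: Dict[str, Dict[str, str]], record_path: str
) -> Optional[Dict[str, str]]:
    """Resolve a record to its provider via the first path component."""
    parts = PurePosixPath(record_path).parts
    if not parts:
        return None
    return mapping.get(parts[0])


mapping_text = write_mapping(ROWS)
provider_mapping = load_mapping(mapping_text)
provider = lookup(provider_mapping, "Example_Centre_A/record1.xml")
```

Keying the lookup on the directory name means the importer would not need to contact the centre registry itself; it would only read the file the harvester wrote.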

teckart commented 6 years ago

Branch issue_131 (last commit: 9730b34f) uses the harvester mapping file and removes the former approach to populate field "nationalProject" via //MdCollectionDisplayName and "nationalProjectsMapping.xml".

For production use, the external facet definition file has to be adapted (i.e. the facet definition for "nationalProject" has to be removed). The mapping file "nationalProjectsMapping.xml" can be removed if it is not used in other projects.