The CESSDA Data Catalogue (CDC) can harvest any XML content provided by an OAI-PMH endpoint. It uses a separate set of XPath mappings for each flavour of XML payload to convert it into a common format, the CESSDA Metadata Model.
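To illustrate the idea of per-flavour XPath mappings, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a DDI 2.5 Codebook payload (namespace `ddi:codebook:2_5`). The target field names (`title`, `studyNumber`, `abstract`) and the helper `map_record` are hypothetical and are not the actual CESSDA Metadata Model fields or CDC code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping set for one XML flavour (DDI 2.5 Codebook).
# Field names are illustrative, not the real CESSDA Metadata Model names.
DDI25_NS = {"ddi": "ddi:codebook:2_5"}
DDI25_MAPPING = {
    "title": "./ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl",
    "studyNumber": "./ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:IDNo",
    "abstract": "./ddi:stdyDscr/ddi:stdyInfo/ddi:abstract",
}


def map_record(xml_text, mapping, namespaces):
    """Apply one set of XPath mappings; return a dict of field -> list of values."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, path in mapping.items():
        # ElementTree implements only a subset of XPath, enough for simple paths.
        # A list is kept per field because DDI often repeats elements (e.g. per language).
        record[field] = [
            el.text.strip()
            for el in root.findall(path, namespaces)
            if el.text and el.text.strip()
        ]
    return record


sample = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation><titlStmt><titl>Example Survey 2020</titl><IDNo>EX-001</IDNo></titlStmt></citation>
    <stdyInfo><abstract>An illustrative abstract.</abstract></stdyInfo>
  </stdyDscr>
</codeBook>"""

result = map_record(sample, DDI25_MAPPING, DDI25_NS)
print(result)
```

Supporting another flavour (for example a different DDI version) would mean adding another mapping dictionary and namespace map, while `map_record` stays unchanged.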
The CESSDA Metadata Validator (CMV) is part of the harvesting pipeline and performs bulk checks on the harvested files. An additional check, XML Schema validation, is run on DDI 2.5 metadata files, and the validated files are saved to a Google Cloud Storage bucket. Files are validated in this order: XML Schema first, then CMV.
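The ordering of the validation stages can be sketched as below. This is only an illustration under stated assumptions: both validators are hypothetical placeholders (`schema_check_stub` merely tests well-formedness instead of validating against the DDI 2.5 XSD, and `cmv_check_stub` applies a single made-up profile rule instead of calling the CMV). The text does not say whether a schema failure stops the CMV check, so the sketch simply runs every stage and records each result.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

DDI25 = "{ddi:codebook:2_5}"


@dataclass
class StageResult:
    stage: str
    passed: bool
    messages: List[str] = field(default_factory=list)


def schema_check_stub(xml_bytes: bytes) -> List[str]:
    # Placeholder: only checks well-formedness, NOT the real DDI 2.5 XML Schema.
    try:
        ET.fromstring(xml_bytes)
        return []
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]


def cmv_check_stub(xml_bytes: bytes) -> List[str]:
    # Placeholder for the CMV: one hypothetical profile rule (a <titl> must exist).
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return ["profile rules cannot be applied to a non-well-formed document"]
    if root.find(f".//{DDI25}titl") is None:
        return ["missing mandatory element: titl"]
    return []


def validate_in_sequence(
    xml_bytes: bytes, stages: List[Tuple[str, Callable[[bytes], List[str]]]]
) -> List[StageResult]:
    """Run the validators in the given order and collect one result per stage."""
    results = []
    for name, check in stages:
        problems = check(xml_bytes)
        results.append(StageResult(name, not problems, problems))
    return results


# XML Schema first, then CMV, as described above.
STAGES = [("XML Schema", schema_check_stub), ("CMV", cmv_check_stub)]

good = (b'<codeBook xmlns="ddi:codebook:2_5"><stdyDscr><citation><titlStmt>'
        b'<titl>T</titl></titlStmt></citation></stdyDscr></codeBook>')
for r in validate_in_sequence(good, STAGES):
    print(r.stage, r.passed, r.messages)
```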
The results of the validation checks are sent to an Elasticsearch index that feeds a Kibana dashboard. The dashboard shows both summary and detailed information about violations (non-conformance with the CDC DDI profiles) and XML Schema validation results. The Aggregator component loads the validated files into its storage component and makes them available for harvesting by aggregators such as OpenAIRE, B2FIND and GoTriple.
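One common way to send many results to an Elasticsearch index is the `_bulk` API, which takes newline-delimited JSON: an action line followed by the document itself, with a trailing newline. The sketch below only builds such a payload. The index name `validation-results` and the document fields are hypothetical; this is not necessarily how the CDC pipeline writes its results.

```python
import json
from typing import Dict, Iterable


def to_bulk_ndjson(index: str, docs: Iterable[Dict]) -> str:
    """Build an Elasticsearch _bulk request body (NDJSON) that indexes each doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action/metadata line
        lines.append(json.dumps(doc))                            # document source line
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical validation-result documents; the field names are illustrative only.
results = [
    {"record": "EX-001", "stage": "XML Schema", "passed": True},
    {"record": "EX-001", "stage": "CMV", "passed": False, "violations": ["missing titl"]},
]
payload = to_bulk_ndjson("validation-results", results)
print(payload, end="")
```

The resulting body would be sent with `POST /_bulk` and the header `Content-Type: application/x-ndjson`; a Kibana dashboard can then aggregate the indexed documents, for example by `stage` and `passed`.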
The CDC product is made up of several components, which can be grouped as Data Gathering, User Facing, Public API and Management. There are also repositories concerned with Documentation & Issue Tracking, and with QA & Deployment.
The following Open Source code repositories are used to gather and index metadata:
The following Open Source code repository is used to provide the user facing components:
The following components are part of the Aggregator (an OAI-PMH endpoint for the CDC):
The following private source code repositories are used to build and deploy the management components:
The following public source code repository applies validation to the harvested metadata records:
The following private source code repositories are used to build the documentation components:
The following private source code repositories are used to test and deploy the product's components:
See CDC Developer documentation for details.
See CDC Operations documentation for details.
See CDC User guide for details.
The Jenkinsfile in each of the Data Gathering and User Facing component repositories defines the build pipeline for that component. See also the 'README.md' file in each of those repositories.
See the 'QA and Deployment' section, above.
See the 'QA and Deployment' section, above.
The Jenkinsfile in each of the Data Gathering and User Facing component repositories defines the build pipeline for that component. See also the 'README.md' file in each of those repositories.
Please read the CESSDA Software Development Guidelines for details on our code of conduct, and the process for submitting pull requests to us.
See Semantic Versioning for guidance.
You can find the list of contributors in the CONTRIBUTORS.md file in each component repository.
See the LICENSE file for each component repository.
See the FAQ file.
None at present.