cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
Apache License 2.0

CESSDA Data Catalogue

The CESSDA Data Catalogue (CDC) can harvest any XML content provided by an OAI-PMH endpoint. It uses different sets of XPath mappings to adapt the different flavours of the XML payloads to a standard format, namely the CESSDA Metadata Model.
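The idea of one set of XPath mappings per XML flavour can be illustrated with a minimal Python sketch. The flavour names, field names and paths below are hypothetical and are not taken from the CDC configuration. Only the DDI 2.5 namespace URI (`ddi:codebook:2_5`) is a real value.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used in the XPath expressions below.
NAMESPACES = {"ddi": "ddi:codebook:2_5"}

# Hypothetical mappings: one set of XPath expressions per XML flavour.
# Field names and paths are illustrative, not the CDC's real configuration.
XPATH_MAPPINGS = {
    "ddi25": {
        "title": "./ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl",
        "abstract": "./ddi:stdyDscr/ddi:stdyInfo/ddi:abstract",
    },
    "ddi_no_namespace": {
        "title": "./stdyDscr/citation/titlStmt/titl",
        "abstract": "./stdyDscr/stdyInfo/abstract",
    },
}


def map_record(xml_text: str, flavour: str) -> dict:
    """Extract a flat record from one XML payload using the flavour's mappings."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, xpath in XPATH_MAPPINGS[flavour].items():
        node = root.find(xpath, NAMESPACES)
        # Missing elements map to None rather than raising an error.
        record[field] = node.text.strip() if node is not None and node.text else None
    return record
```

Each flavour yields the same set of output keys, so downstream code can treat every harvested record alike regardless of its source format.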

The CESSDA Metadata Validator (CMV) is part of the pipeline and performs bulk checks on the harvested files. DDI 2.5 metadata files are additionally validated against the XML Schema, and the validated files are saved to a Google Cloud Storage bucket. Files are validated in sequence: XML Schema validation first, then CMV.
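The ordering of the two checks can be sketched as a list of named validators run one after another. This is an illustration only: both check functions are simplified stand-ins (a well-formedness test and a string test), not the real XML Schema validation or the CMV, and every name here is hypothetical. The sketch also assumes that every check runs even if an earlier one fails, which the pipeline may not do.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, NamedTuple, Tuple


class CheckResult(NamedTuple):
    check: str            # name of the check that produced this result
    passed: bool          # True when the check reported no messages
    messages: List[str]   # human-readable problems found by the check


Check = Callable[[str], List[str]]


def validate(xml_text: str, checks: List[Tuple[str, Check]]) -> List[CheckResult]:
    """Run each (name, check) pair in the order given and collect the results."""
    results = []
    for name, check in checks:
        messages = check(xml_text)
        results.append(CheckResult(name, not messages, messages))
    return results


def schema_check_stub(xml_text: str) -> List[str]:
    """Stand-in for XML Schema validation: only tests well-formedness."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    return []


def cmv_check_stub(xml_text: str) -> List[str]:
    """Stand-in for a CMV profile check: requires a <titl> element."""
    return [] if "<titl>" in xml_text else ["missing titl element"]


# Order matters: XML Schema first, then CMV.
PIPELINE = [("XML Schema", schema_check_stub), ("CMV", cmv_check_stub)]
```

Calling `validate(xml_text, PIPELINE)` returns one result per check, in pipeline order, which is the kind of per-file summary that could then be forwarded for reporting.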

The results of the validation checks are sent to an Elasticsearch index that feeds a Kibana dashboard. The dashboard shows both summary and detailed information about profile violations (non-conformance with the CDC DDI profiles) and XML Schema validation results. The Aggregator component loads the validated files into its storage component and makes them available for aggregators such as OpenAIRE, B2Find and GoTriple to harvest.
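For readers unfamiliar with how an aggregator harvests from an OAI-PMH endpoint, the following sketch builds `ListRecords` request URLs as defined by the OAI-PMH protocol. The base URL is a placeholder, not the address of the CDC Aggregator. The `oai_dc` prefix is used because every OAI-PMH repository must support it, and the prefixes actually offered by the Aggregator are not stated here.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH: when it is
    present, no other arguments (such as metadataPrefix) may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; substitute the real OAI-PMH base URL.
first_page = list_records_url("https://example.org/oai")
next_page = list_records_url("https://example.org/oai", resumption_token="abc")
```

A harvester would request the first page, read the `resumptionToken` element from the response, and repeat with that token until the repository returns an empty token.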

Project Structure

The CDC product is made up of several components, which can be grouped as Data Gathering, User Facing, Public API and Management. Further repositories cover Documentation & Issue Tracking and QA & Deployment.

Data Gathering components

The following Open Source code repositories are used to gather and index metadata:

User Facing components

The following Open Source code repository is used to provide the user-facing components:

Public API components

The following components are part of the Aggregator (an OAI-PMH endpoint for the CDC):

Management components

The following private source code repositories are used to build and deploy the management components:

The following public source code repository applies validation to the harvested metadata records:

Documentation & Issue Tracking

The following private source code repositories are used to build the documentation components:

QA & Deployment

The following private source code repositories are used to test and deploy the product's components:

Developer documentation

See CDC Developer documentation for details.

Operations documentation

See CDC Operations documentation for details.

User documentation

See CDC User guide for details.


The Jenkinsfile in each of the Data Gathering and User Facing component repositories defines the build pipeline for that component. See also the 'README.md' file in each of those repositories.

Running the tests

See the 'QA & Deployment' section above.


Built With

The Jenkinsfile in each of the Data Gathering and User Facing component repositories defines the build pipeline for that component. See also the 'README.md' file in each of those repositories.


Please read the CESSDA Software Development Guidelines for details on our code of conduct, and the process for submitting pull requests to us.


See Semantic Versioning for guidance.


You can find the list of contributors in the CONTRIBUTORS.md file for each component repository.


See the LICENSE file for each component repository.


See the FAQ file.


None at present.