cessda / cessda.cdc.aggregator.shared-library

Python library containing shared code for the CESSDA CDC Aggregator
European Union Public License 1.2

Alter schema to get easier access to direct provenance #39

Closed by toni-sissala 10 months ago

toni-sissala commented 10 months ago

The OAI-PMH /metrics endpoint counts metrics using the direct provenance of records. This is a slow query in MongoDB, since it needs the $elemMatch query operator. Alter the schema so that the direct provenance object is more easily accessible.

old schema:

{
  "_provenance": [
    {"base_url": "some.url" ...}, 
    {"base_url": "another.url" ...}, ...]
}

new schema:

{
  "_provenance": [
    {"base_url": "some.url" ...}, 
    {"base_url": "another.url" ...}, ...],
  "_direct_provenance": {
    "base_url": "some.url" ...}
}

The schema changes must also be reflected in the mapping.
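
To illustrate the difference between the two schemas, here is a minimal sketch of the filter documents a metrics query might use. It assumes each entry in `_provenance` carries a boolean `direct` flag that marks the direct provenance (the flag is not shown in the schemas above), and the function names are hypothetical, not part of the library.

```python
def old_schema_filter(base_url):
    """Filter for the old schema.

    The direct provenance has to be located inside the _provenance array,
    so both conditions must match the same array element ($elemMatch).
    Assumption: array entries have a boolean "direct" flag.
    """
    return {
        "_provenance": {
            "$elemMatch": {"base_url": base_url, "direct": True}
        }
    }


def new_schema_filter(base_url):
    """Filter for the new schema.

    The direct provenance is a top-level object, so the query is a plain
    equality match on a dotted field path.
    """
    return {"_direct_provenance.base_url": base_url}
```

Either dictionary could be passed as the filter argument to a MongoDB driver call such as pymongo's `Collection.count_documents`. The second form targets a single scalar path, which is also simpler to cover with an ordinary index.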

toni-sissala commented 10 months ago

Actually, only the base_url of the direct provenance is required. The new schema can look like this:

{
  "_provenance": [
    {"base_url": "some.url" ...}, 
    {"base_url": "another.url" ...}, ...],
  "_direct_base_url": "some.url"
}
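
A rough sketch of how existing records could be brought to this schema, and how the metrics could then be counted. It assumes that the direct provenance entry in `_provenance` is marked with a boolean `direct` flag (not shown in the schemas above). The helper names are hypothetical, and the counting is done in plain Python here only to show the idea; the real endpoint would query MongoDB.

```python
from collections import Counter


def add_direct_base_url(record):
    """Return a copy of the record with _direct_base_url filled in.

    Assumption: the direct provenance entry has "direct": True.
    Records without such an entry are returned without the new field.
    """
    direct = next(
        (prov for prov in record.get("_provenance", []) if prov.get("direct")),
        None,
    )
    updated = dict(record)
    if direct is not None:
        updated["_direct_base_url"] = direct["base_url"]
    return updated


def count_by_direct_base_url(records):
    """Count records per direct base URL using only the top-level field."""
    return Counter(
        rec["_direct_base_url"] for rec in records if "_direct_base_url" in rec
    )


def direct_base_url_filter(base_url):
    """Equality filter on the new top-level field; no $elemMatch needed."""
    return {"_direct_base_url": base_url}
```

Usage, with made-up records:

```python
records = [
    {"_provenance": [{"base_url": "some.url", "direct": True},
                     {"base_url": "another.url", "direct": False}]},
    {"_provenance": [{"base_url": "another.url", "direct": True}]},
]
migrated = [add_direct_base_url(rec) for rec in records]
counts = count_by_direct_base_url(migrated)
```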