cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
Apache License 2.0

Add Hummingbird endpoint #649

Open john-shepherdson opened 5 months ago

john-shepherdson commented 5 months ago


matthew-morris-cessda commented 5 months ago

What friendly name should be used?

john-shepherdson commented 4 months ago

I cannot see HMB as a Publisher in staging (English or Greek). I am guessing that the (currently) 10 records in the HumMingBird_Data set are also included in the main catalogue and are therefore discarded as duplicates. I will check with EKKE.

john-shepherdson commented 4 months ago

From: John Shepherdson Date: Wed, 8 May 2024 at 17:09 Subject: Re: [Hummingbird] Follow-up from CESSDA team call on 22 January 2024 To: Nikos Klironomos Cc: Dimitra Kondyli, Panagiota Starida


Thanks for this. We have attempted to harvest these datasets and tag them as being published by Hummingbird, but they don't show up, as they are regarded as duplicates of the records in your main catalogue. The problem is that the record identifiers are based on the canonical OAI-PMH URL of the endpoint and OAI-PMH record identifier (see https://datacatalogue.cessda.eu/documentation/oai-pmh-identifiers.html), which are the same in this case, regardless of the set spec.

So the only way for this to work is to either:

I suspect the first option is preferable, as it means other clients can choose to either harvest all your records, or one/both subsets from the same endpoint.
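The duplicate problem described above can be sketched roughly as follows. This is only an illustration, not the CDC harvester's actual code: the `record_key` and `deduplicate` helpers, the SHA-256 hashing, and the `oai:example:1` record identifier are all assumptions. It shows only that a key built from the endpoint URL and the OAI-PMH record identifier cannot tell two set specs apart.

```python
import hashlib


def record_key(endpoint_url: str, oai_identifier: str) -> str:
    """Hypothetical key: endpoint URL plus OAI-PMH record identifier.

    The set spec is deliberately not part of the key, mirroring the
    behaviour described in the thread. The hashing scheme is an assumption.
    """
    raw = f"{endpoint_url}|{oai_identifier}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def deduplicate(harvested: list[dict]) -> list[dict]:
    """Keep the first record seen for each key and drop later ones."""
    seen: dict[str, dict] = {}
    for record in harvested:
        key = record_key(record["endpoint"], record["identifier"])
        seen.setdefault(key, record)
    return list(seen.values())


# The same record harvested twice from one endpoint, under two set specs.
# "oai:example:1" is a made-up identifier used for illustration only.
records = [
    {"endpoint": "https://datacatalogue.sodanet.gr/oai",
     "identifier": "oai:example:1", "set": None},
    {"endpoint": "https://datacatalogue.sodanet.gr/oai",
     "identifier": "oai:example:1", "set": "HumMingBird_Data"},
]
unique = deduplicate(records)
```

Under these assumptions the second record is discarded even though it was harvested with a different set spec, which is why the HumMingBird records never appear under their own publisher.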



john-shepherdson commented 3 months ago

From: Nikos Klironomos Date: Wed, 8 May 2024 at 17:36 Subject: Re: [Hummingbird] Follow-up from CESSDA team call on 22 January 2024 To: John Shepherdson Cc: Dimitra Kondyli, Panagiota Starida

Dear John,

Thank you very much for your reply.

Yes, it seems that the first solution is preferable for me too.

I will work it out and we will be in touch.



john-shepherdson commented 2 months ago

Dear John,

I hope this message finds you well.

I am writing to inform you about the current status of our OAI-PMH endpoints and to seek clarification on the management of the HumMingBird data within the CDC.

We now have two OAI-PMH endpoints available:

1. An endpoint dedicated exclusively to the HumMingBird data. ( https://datacatalogue.sodanet.gr/oai?verb=ListIdentifiers&set=SoDaNet_Minus_HumMingBird_Data&metadataPrefix=oai_ddi )
2. An endpoint that encompasses the sum of the data projects in our repository, excluding the HumMingBird data. ( https://datacatalogue.sodanet.gr/oai?verb=ListIdentifiers&set=HumMingBird_Data&metadataPrefix=oai_ddi )

However, we have encountered a challenge with the second endpoint. We are currently unsure if it will automatically refresh when a new data project is published. Unfortunately, we cannot test this functionality at the moment due to the absence of any new data projects to publish.

Additionally, we would like to understand how the HumMingBird data will be managed within the CDC. Specifically:

- Will the HumMingBird data be available as a separate collection apart from the main catalogue?
- How will these data be visible to end-users?
- At present, the HumMingBird data are accessible through the main CDC. Will this remain the same, or are there plans for any changes?

As we are approaching the deadline for the D11.5 deliverable, it is essential that we inform the HumMingBird team about these details. Your insights on the above matters will be invaluable for our report.

Thank you very much for your assistance.

Best regards,


john-shepherdson commented 2 months ago

Dear all,

Correct repetition

1. An endpoint that encompasses the sum of the data projects in our repository, excluding the HumMingBird data. ( https://datacatalogue.sodanet.gr/oai?verb=ListIdentifiers&set=SoDaNet_Minus_HumMingBird_Data&metadataPrefix=oai_ddi )
2. An endpoint dedicated exclusively to the HumMingBird data. ( https://datacatalogue.sodanet.gr/oai?verb=ListIdentifiers&set=HumMingBird_Data&metadataPrefix=oai_ddi )

Best,
Kostas
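For reference, the two requests above differ only in the `set` argument of the OAI-PMH `ListIdentifiers` verb. A minimal sketch of building them in Python is shown here; the `list_identifiers_url` helper is hypothetical, and only the base URL, set names and metadata prefix come from the message.

```python
from urllib.parse import urlencode

# Base URL taken from the links in the message above.
BASE_URL = "https://datacatalogue.sodanet.gr/oai"


def list_identifiers_url(set_spec: str, metadata_prefix: str = "oai_ddi") -> str:
    """Build a ListIdentifiers request restricted to one OAI-PMH set."""
    params = {
        "verb": "ListIdentifiers",
        "set": set_spec,
        "metadataPrefix": metadata_prefix,
    }
    return f"{BASE_URL}?{urlencode(params)}"


main_url = list_identifiers_url("SoDaNet_Minus_HumMingBird_Data")
hmb_url = list_identifiers_url("HumMingBird_Data")
```

Both sets are served from the same base URL, so a harvester can choose either subset just by changing the set spec.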

john-shepherdson commented 2 months ago


Thanks for getting in touch with the latest details. I have CC'd Kristina (the CDC Service Owner), as she needs to be in the loop regarding what happens next. For testing purposes, we have added a new endpoint for the HumMingBird data in the staging environment. The HumMingBird metadata appears along with all the other metadata and can be filtered so that only the HumMingBird data is shown, like this: https://datacatalogue-staging.cessda.eu/?publisher.publisher[0]=HumMingBird

If Kristina is in agreement, this could be done in the production environment too. You would then be able to use this URL https://datacatalogue.cessda.eu/?publisher.publisher[0]=HumMingBird in D11.5 to show that the metadata is published and available to researchers.
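The publisher filter in these links is an ordinary query parameter, so the same URL can be generated for either environment. The sketch below uses a hypothetical `publisher_filter_url` helper; only the parameter name `publisher.publisher[0]` and the two base URLs come from this thread.

```python
from urllib.parse import urlencode


def publisher_filter_url(base_url: str, publisher: str) -> str:
    """Build a catalogue URL that filters results to one publisher.

    safe="[]" keeps the square brackets unescaped, so the output matches
    the form of the links quoted in the thread.
    """
    query = urlencode({"publisher.publisher[0]": publisher}, safe="[]")
    return f"{base_url}?{query}"


staging_url = publisher_filter_url("https://datacatalogue-staging.cessda.eu/", "HumMingBird")
production_url = publisher_filter_url("https://datacatalogue.cessda.eu/", "HumMingBird")
```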

(We would also need to update the SoDaNet endpoint specification to exclude the HumMingBird metadata.)

In the medium term, we are planning to turn CDC into a thematic portal, so that various data collections have their own URL and their own look and feel (though similar to the current CDC one). Users would be able to easily switch between the themes to explore different collections. HumMingBird could be one of them.

I hope this helps.



john-shepherdson commented 2 months ago


Are you OK with this?



john-shepherdson commented 2 months ago


Yes, I think this is ok!


john-shepherdson commented 2 months ago

@matthew-morris-cessda can you push this change to production soon? Thanks