Open john-shepherdson opened 5 months ago
We could use the existing sets:
https://ssh.datastations.nl/oai?verb=ListRecords&metadataPrefix=oai_dc&set=CESSDA-EN
https://ssh.datastations.nl/oai?verb=ListRecords&metadataPrefix=oai_dc&set=CESSDA-NL
but we would have to treat them as two different endpoints with different names and different default languages. It might be confusing for users to see publishers called (for example) 'DANS-KNAW (English)' and 'DANS-KNAW (Dutch)'. The names would also not comply with the Publisher names CV (https://vocabularies.cessda.eu/vocabulary/CdcPublisherNames?lang=en)
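To make the two-endpoint option concrete, here is a minimal sketch of how the two sets could be described as separate harvest sources. The `HarvestSource` class, its field names and the `en`/`nl` default-language codes are hypothetical and only for illustration; they are not the CDC harvester's actual configuration format.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Base URL taken from the links above.
BASE_URL = "https://ssh.datastations.nl/oai"


@dataclass(frozen=True)
class HarvestSource:
    """Hypothetical description of one OAI-PMH set treated as its own endpoint."""

    name: str              # publisher name that would be shown to users
    set_spec: str          # OAI-PMH set to harvest
    default_language: str  # language assumed when records carry no language tag

    def list_records_url(self, metadata_prefix: str = "oai_dc") -> str:
        # Build a standard OAI-PMH ListRecords request for this set.
        query = urlencode(
            {
                "verb": "ListRecords",
                "metadataPrefix": metadata_prefix,
                "set": self.set_spec,
            }
        )
        return f"{BASE_URL}?{query}"


# Two sources for one organisation: this is what leads to two publisher names.
SOURCES = [
    HarvestSource("DANS-KNAW (English)", "CESSDA-EN", "en"),
    HarvestSource("DANS-KNAW (Dutch)", "CESSDA-NL", "nl"),
]

for source in SOURCES:
    print(source.name, source.default_language, source.list_records_url())
```

The sketch shows where the naming problem comes from: each set needs its own name and default language, so a single publisher ends up listed twice.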
Ricarda Braukmann wrote: "Thanks @john-shepherdson for looking into it. We would prefer to be included in some way as soon as possible, so if what you describe is possible, that would be great. Alternatively, you could harvest only the English records for now, as I believe those will be most relevant for CDC users, and that set is also the larger of the two.
Of course we want to be included in full as soon as possible, so it would be great if we could discuss how that can be achieved.
Can you specify what we need to do in order to be harvested through our regular endpoint?
We record the language of the metadata in a custom block, so the information is available for most datasets. I am not sure how you harvest the Dataverse instances (i.e. which metadata schema you use), or what adjustments we would need to make to comply with the requirements. I am also happy to connect you with our technical team, as they know better how things are currently implemented."
@KristinaS4 @MortenSikt Your thoughts please.
Added quotes from @RicardaBraukmann in the issue description
@john-shepherdson Do you know the status of the language tags for their Dataverse endpoint? It seems they are willing to adjust their exports to comply with CDC's requirements. This would of course be the optimal solution. Can we do more to support this?
I agree that it will be confusing for users if there are two publishers called DANS-KNAW and in that case I think we should only include the English records as suggested by Ricarda to limit it to one publisher.
I am not aware that the language tags have been added (but maybe Matthew could have a look at some recently harvested DANS XML files to confirm) in which case the short term fix is only to include the records from the English endpoint.
Dear all, At DANS we do not have the language tags added to our DDI export via OAI-PMH. There is a Dataverse setting for it, but we have not enabled it. I will discuss with Ricarda what is the best way to solve the issue.
Dear @LauraHuisintveld @RicardaBraukmann
Following up on the e-mail exchange: the CESSDA Metadata Validator is available at https://cmv.cessda.eu/#!validation
Please use it to validate your outputs, and get back to us when validation succeeds or if you run into any issues in the meantime.
Please check the following example for a valid language specification.
Language can be specified at the document level within the element, or at the individual element level within a specific tag.
More examples can be found here
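A minimal sketch of the two placements, assuming a DDI Codebook 2.5 export (namespace `ddi:codebook:2_5`) in which the language is carried by the standard `xml:lang` attribute. The study titles and the helper `effective_languages` are made up for illustration. The helper resolves the language that applies to each element: an element's own `xml:lang` wins, otherwise it inherits from the nearest ancestor.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DDI = "{ddi:codebook:2_5}"  # assumed DDI Codebook 2.5 namespace

# Illustrative fragment: document-level language on the root element,
# overridden on one individual element.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5" xml:lang="en">
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example study</titl>
        <parTitl xml:lang="nl">Voorbeeldstudie</parTitl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>"""


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs, applying xml:lang inheritance."""
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


root = ET.fromstring(SAMPLE)
langs = {el.tag.replace(DDI, ""): lang for el, lang in effective_languages(root)}

for tag, lang in langs.items():
    print(f"{tag}: {lang}")
```

Here `titl` has no tag of its own and takes `en` from the document level, while `parTitl` carries its own `nl` tag.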
@alen-vodopijevec-cessda Yes, we will test our output with the validator, a very useful tool. One question, should we test with DDI Profile 2.0, or with 1.04 which is also still available within the validator?
You should use 1.04; passing this validation will guarantee compliance with the CDC. You can also try the 2.0 profile. I am curious about the results, as it is stricter.
Thanks, we can let you know our results once we are ready.
I already played around with the validator a bit, and I have another question. Sometimes our users have used html-code within the
The Data Catalogue will accept these records
Dear all, We have tested our new OAI-PMH output, and we think it is ready now. Could you please test this link and let us know if it works without problems? https://oai-service.labs.dans.knaw.nl/ss/oai?verb=ListRecords&set=social_sciences&metadataPrefix=oai_ddi
I can see in staging that 6749 records are found from DANS-KNAW. Some metadata is presented in the results list, but clicking on any record does not lead to a valid page (it's just blank for me).
Not sure if this is before or after configuration @matthew-morris-cessda ?
I've only just implemented this configuration change so this isn't available yet. I'll update the issue when it is.
@matthew-morris-cessda I was wondering if there is any news? Are there still some problems we need to solve at our side?
@LauraHuisintveld Dutch language content is still tagged as English on this endpoint
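One way to spot this kind of mismatch is to group study titles by the language they are tagged with and review each group by eye. The sketch below assumes an OAI-PMH `ListRecords` response wrapping DDI Codebook 2.5 records (namespace `ddi:codebook:2_5`). The sample response, identifiers, titles and the helper `titles_by_language` are all hypothetical, and only a tag on `titl` itself or on the root `codeBook` is considered, not on intermediate elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "ddi": "ddi:codebook:2_5",  # assumed DDI Codebook 2.5 namespace
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical response: the first record has a Dutch title tagged as English.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>example:1</identifier></header>
      <metadata>
        <codeBook xmlns="ddi:codebook:2_5" xml:lang="en">
          <stdyDscr><citation><titlStmt>
            <titl>Onderzoek naar woonwensen</titl>
          </titlStmt></citation></stdyDscr>
        </codeBook>
      </metadata>
    </record>
    <record>
      <header><identifier>example:2</identifier></header>
      <metadata>
        <codeBook xmlns="ddi:codebook:2_5" xml:lang="nl">
          <stdyDscr><citation><titlStmt>
            <titl>Arbeidsmarktonderzoek</titl>
          </titlStmt></citation></stdyDscr>
        </codeBook>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles_by_language(xml_text):
    """Group (identifier, title) pairs by the xml:lang that applies to the title."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        code_book = record.find("oai:metadata/ddi:codeBook", NS)
        if code_book is None:
            continue  # deleted record or non-DDI payload
        doc_lang = code_book.get(XML_LANG)
        for titl in code_book.iterfind(".//ddi:titl", NS):
            # Simplification: use the element's own tag, else the document-level tag.
            lang = titl.get(XML_LANG, doc_lang) or "(untagged)"
            grouped[lang].append((identifier, (titl.text or "").strip()))
    return dict(grouped)


result = titles_by_language(SAMPLE_RESPONSE)
for lang, titles in result.items():
    print(lang, titles)
```

In the sample, the Dutch title of `example:1` is listed under `en`, which is the pattern reported above.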
@matthew-morris-cessda Hmmm, I can see it too now. Maybe something went wrong, I will ask my colleague to take a look and will let you know when to try again.
Hi @matthew-morris-cessda We have found the problem and did a new deploy. Could you please try again? The URL remains the same: https://oai-service.labs.dans.knaw.nl/ss/oai?verb=ListRecords&set=social_sciences&metadataPrefix=oai_ddi
The problem still persists
Originally posted by @RicardaBraukmann in https://github.com/cessda/cessda.cdc.versions/issues/662#issuecomment-2180919866
Originally posted by @john-shepherdson in https://github.com/cessda/cessda.cdc.versions/issues/662#issuecomment-2181069028
Originally posted by @RicardaBraukmann in https://github.com/cessda/cessda.cdc.versions/issues/662#issuecomment-2185835141 See also #662