IQSS / dataverse

Open source research data repository software
http://dataverse.org

Add import for oai_datacite ("OpenAire") format (this will allow Dataverse to harvest this format too) #7727

Open tjouneau opened 3 years ago

tjouneau commented 3 years ago

Hi,

Feature: offer the ability to harvest in DataCite format(s). Currently a list of formats is displayed, but only oai_dc seems to work.

This is a follow-up to an off-topic discussion with @landreev in issue #7638, which I'm reproducing below.

As for the other formats, I don't really know what "oai_datacite4" is. It is safe to say that of all the (10?) formats they are offering (https://www.zenodo.org/oai2d?verb=ListMetadataFormats) oai_dc is the only one Dataverse understands. I am surprised that we are allowing a user to select an unsupported metadata format in the Harvesting Client config. (I thought we were dropping any unsupported formats from the list). Looking at an example of oai_datacite4 (https://www.zenodo.org/oai2d?verb=GetRecord&identifier=oai:zenodo.org:204063&metadataPrefix=oai_datacite4), it appears to be simple enough. So it should be very doable to add support for it. But yes, that would definitely need to be handled in a separate issue.
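For anyone who wants to reproduce the two requests quoted above, here is a minimal sketch of how those OAI-PMH URLs are put together. The `oai_url` helper is a hypothetical name used only for illustration; the endpoint, record identifier and metadata prefix are the ones from the links above.

```python
from urllib.parse import urlencode


def oai_url(base_url, verb, **params):
    """Build an OAI-PMH request URL (hypothetical helper, not part of Dataverse)."""
    query = {"verb": verb}
    query.update(params)
    # Keep ":" unescaped so the identifier reads the same as in the links above.
    return f"{base_url}?{urlencode(query, safe=':')}"


BASE = "https://www.zenodo.org/oai2d"

# Ask the server which metadata formats it offers.
list_formats = oai_url(BASE, "ListMetadataFormats")

# Fetch one record in the oai_datacite4 format.
get_record = oai_url(
    BASE,
    "GetRecord",
    identifier="oai:zenodo.org:204063",
    metadataPrefix="oai_datacite4",
)

print(list_formats)
print(get_record)
```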

Relevant use cases: I think this is relevant to the ongoing discussion in the MD WG about DV as a registry (cf. MIT).

Possible problem :

Thanks in advance for your time. Thomas

landreev commented 3 years ago

This could be a useful/popular format to support, I agree. Let's investigate this. From looking at the examples briefly, the format may be a bit richer/have a few more fields than plain DC. But I still think all their fields can be mapped to something in the DDI that we already know how to import. So rather than write new import code, it may be easier to provide an XSLT transform from this format to DDI, and then import that. The OAI code library we are using has mechanisms for doing that in real time.
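To make the mapping idea concrete, here is a rough sketch of the kind of field mapping such a transform would encode. It is written in Python rather than XSLT and is not Dataverse code. The DataCite element names come from the kernel-4 schema; the DDI target paths in `FIELD_MAP` are my assumption about plausible targets, not the mapping Dataverse actually uses, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite kernel-4 <resource> element.
DATACITE_NS = {"d": "http://datacite.org/schema/kernel-4"}

# DataCite path -> assumed DDI Codebook target path (illustrative only).
FIELD_MAP = {
    "d:identifier": "stdyDscr/citation/titlStmt/IDNo",
    "d:titles/d:title": "stdyDscr/citation/titlStmt/titl",
    "d:creators/d:creator/d:creatorName": "stdyDscr/citation/rspStmt/AuthEnty",
    "d:descriptions/d:description": "stdyDscr/stdyInfo/abstract",
}


def map_datacite(resource_xml):
    """Return {assumed DDI path: [values]} for the fields present in the record."""
    root = ET.fromstring(resource_xml)
    mapped = {}
    for source_path, target_path in FIELD_MAP.items():
        values = [
            el.text.strip()
            for el in root.findall(source_path, DATACITE_NS)
            if el.text and el.text.strip()
        ]
        if values:  # skip fields the record does not provide
            mapped[target_path] = values
    return mapped


# Made-up record, for illustration only.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.1234/example</identifier>
  <creators>
    <creator><creatorName>Doe, Jane</creatorName></creator>
    <creator><creatorName>Roe, Richard</creatorName></creator>
  </creators>
  <titles><title>Example dataset</title></titles>
  <publisher>Example Repository</publisher>
  <publicationYear>2021</publicationYear>
</resource>"""

result = map_datacite(SAMPLE)
for ddi_path, values in result.items():
    print(ddi_path, values)
```

A real transform would also have to decide what to do with fields that have no obvious DDI counterpart; this sketch simply ignores anything not listed in `FIELD_MAP`.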

(We should also add a simple filter to strip the unsupported formats from the list in the harvesting configuration menu; allowing users to select something that's not going to work is bad).
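The filter could look something like the sketch below: parse a `ListMetadataFormats` response and keep only the prefixes the importer understands. This is Python for illustration, not the actual Dataverse (Java) code. The `SUPPORTED_PREFIXES` set and the `supported_formats` name are assumptions, and the sample response is a trimmed-down, made-up example.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Assumed set of formats the importer can handle; adjust as support is added.
SUPPORTED_PREFIXES = {"oai_dc", "oai_ddi", "dataverse_json"}


def supported_formats(list_formats_xml, supported=SUPPORTED_PREFIXES):
    """Return the offered metadata prefixes that are in the supported set."""
    root = ET.fromstring(list_formats_xml)
    offered = [
        el.text.strip()
        for el in root.iterfind(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
        if el.text
    ]
    # Preserve the server's ordering, drop everything we cannot import.
    return [prefix for prefix in offered if prefix in supported]


# Trimmed, made-up response for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_datacite</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_datacite4</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

menu_options = supported_formats(SAMPLE)
print(menu_options)
```

Only the filtered list would then be shown in the harvesting configuration menu.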

mreekie commented 2 years ago

Notes - discussion today:

pdurbin commented 2 weeks ago

I was just in a CAFE/RAPID/DesignSafe meeting and we talked about how we know that the "oai_dc" format works for harvesting from DataCite into Dataverse. There's a related issue about that here:

However, "oai_dc" is somewhat limited in the number of fields it supports (only 15 or so).

This issue (#7727) seems to be about adding the ability to import or harvest the "oai_datacite" format as well. It already appears in the dropdown, along with a format called "Datacite":

[Screenshot, 2024-10-29: harvesting format dropdown]

The harvesting formats that I've heard work fine are "oai_dc", "oai_ddi" and "dataverse_json".

It sounds like "oai_datacite" doesn't work. I don't know if "Datacite" format works or not.

cmbz commented 1 week ago

2024/11/04: Adding to GREI harvesting improvements list: https://github.com/IQSS/dataverse-pm/issues/171