Open philippconzett opened 11 months ago
Thanks for the issue, Philipp. We created a Project field for some of these needs, and we are currently planning to merge it back into funding, so a discussion on the topic would be interesting. Here is the current state of our project block (in citation): We've very recently also been asked to add a field for the deliverable, in addition to the WP and Task fields we have.
Thanks, Dimitri. It would be good to add these fields to the main distribution of Dataverse.
Dedicated and separated metadata field to indicate "Horizon Europe" (see survey question 4.2)
Here's 4.2:
Does Horizon Europe have a PID? The PID for the NIH in the US, for example, is http://dx.doi.org/10.13039/100000002
Interesting question. Horizon Europe is the funding programme and not the agency, but maybe funding programmes also have identifiers, in which case we would need an additional field for a programme identifier.
Another question: should we take the work on these fields as an opportunity to change the current block name "grantNumber"?
Hey all. I've been researching how people describe who funds the research data they deposit in order to help improve how Dataverse collects and distributes that metadata, and updating the GitHub issue at https://github.com/IQSS/dataverse/issues/4859, whose scope has broadened, beyond what the issue title suggests, to account for what we've learned so far about how folks are using certain metadata fields.
So I'm very interested in this issue, too, and have questions.
@DS-INRA, why'd you mention that Horizon Europe is the funding programme and not the agency? Is it because NIH, which @pdurbin mentioned, is an agency? Or because there's a field that ships with Dataverse called Funding Information Agency?
I'm wondering if the distinction between an agency and funding programme is important and why. Although the label and tooltip text for the Funding Information Agency field that ships with Dataverse, in the Citation metadata block, has the word "Agency", we don't mean to limit the types of funders to "agencies", and I don't think the DataCite and DDI metadata standards that informed the design of the fields mean to limit funder types to "agencies" either.
@philippconzett, when you wrote that Dataverse should add these two metadata fields that you mentioned:
... are you saying that the fields that ship with Dataverse don't include dedicated and separated metadata fields for this metadata?
Why couldn't people use the current Funding Information fields for this? Such as:
Lastly, on Demo Dataverse, when people use the Funding Information Agency field, they're able to choose organization names suggested from the Crossref Funder Registry, and several things show up when I enter "Horizon Europe". I haven't looked too closely at what appears, but I wonder if Horizon Europe does have an entry in the Crossref Funder Registry.
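To check what the registry itself returns for a name like "Horizon Europe", here is a minimal Python sketch against Crossref's public REST API. The function names are my own, and the response shape (a `message` object holding `items`, each with a `name` and a `uri`) is an assumption based on how that API usually responds, so treat this as a starting point rather than a tested client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Crossref REST API endpoint for the Funder Registry.
CROSSREF_FUNDERS_API = "https://api.crossref.org/funders"


def build_funder_query_url(name, rows=5):
    """Return the search URL for a funder name (helper name is hypothetical)."""
    return f"{CROSSREF_FUNDERS_API}?{urlencode({'query': name, 'rows': rows})}"


def extract_funders(payload):
    """Pull (name, uri) pairs out of a funder search response.

    Assumes the shape {"message": {"items": [{"name": ..., "uri": ...}]}}.
    """
    items = payload.get("message", {}).get("items", [])
    return [(item.get("name"), item.get("uri")) for item in items]


def search_funders(name, rows=5):
    """Query the live API (needs network access) and return (name, uri) pairs."""
    with urlopen(build_funder_query_url(name, rows), timeout=10) as response:
        return extract_funders(json.load(response))
```

Calling `search_funders("Horizon Europe")` would list whatever entries the registry suggests, which could help settle whether Horizon Europe has its own entry.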
Yes, what @jggautier said. I'm also wondering if people can just type "Horizon Europe" under the existing field.
On the other hand, the study's final report ("ERC Study on repositories - final report.pdf" at https://zenodo.org/records/7728016) makes me wonder if another field is needed for repositories that need to be able to comply with these requirements.
Before I read the study, I thought "Horizon Europe" was the name of the funder. And I thought that the terms "Funding Stream", "grant or funding number", and "Grant or funding PID" were all describing the same concept.
But on page 32 of the study, they write more about "Funding Streams":
"We considered the Funding Stream per definition of OpenAIRE, where information regarding the Funding programme (FP7, H2020, Horizon Europe) is provided. Also, we included the information about the Funder, as repositories typically need to report funders other than the European Commission."
And on the table on that page, they describe "Horizon Europe" as a Funding Stream:
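So it seems like it's a separate concept. That is:
- A Funder, like the European Union, can have one or more Funding Streams
- A Funding Stream, like Horizon Europe, can have one or more Grants
- And of course each Grant can have a Grant Number and a persistent identifier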
And a dedicated field for Funding Stream makes more sense. Neither of Dataverse's fields for funding metadata, "Funding Information" and "Contributor Name", includes a dedicated field for "Funding Streams". So people have entered "European Union" and "Horizon Europe" in Dataverse's Funding Information Agency field (such as https://doi.org/10.34810/data686), and others have entered "European Union’s Horizon 2020 research and innovation programme" or "European Union‘s Horizon 2020" in either of Dataverse's funder fields: Funding Information Agency or Contributor Name (where they chose Funder as the Contributor Type).
I'm confused about the Project Name concept, which the study's authors mention earlier in their report (page 21) as a requirement of Horizon Europe MGA. At https://openscience.cuni.cz/OSCIEN-90.html about the metadata that Horizon Europe beneficiaries should include, there's no mention of Project Names, but they do mention "Grant project name, acronym and number".
It's also interesting that the report's authors write that "OpenAIRE compliance for the repositories included in the study was derived from the OpenAIRE website" and that their definition of Funding Stream comes from OpenAIRE. But OpenAIRE's metadata guidelines don't include a way to record "Funding Streams" as distinct from Funder Names. Even later versions of the DataCite standard, which remove the "Funder" type from the list of Contributor types and add a Funding References field and child fields, don't include a field (or property) for a Funding Stream.
When the community was designing Dataverse's OpenAIRE metadata export, we wrote that we need to be "able to share metadata about data in the way OpenAIRE is requiring, by using OAI-PMH to harvest OpenAIRE-compliant metadata", and that we'd make design decisions based on a more recent version of DataCite, version 4.1, believing that later versions of OpenAIRE's recommendations would include changes that DataCite made to its standard. More discussion about this is at https://groups.google.com/g/dataverse-community/c/OALTzINxkX0/m/v_WwJ4cvAwAJ, https://github.com/IQSS/dataverse/issues/4257#issuecomment-368829767, and https://github.com/IQSS/dataverse/issues/5889.
That change in the DataCite standard included how DataCite would like funding metadata to be recorded. So Dataverse has been adding funding metadata to the funderName child field (or subproperty) of the fundingReference field (or property) in its OpenAIRE export, which is also available for OAI-PMH harvesting. For example, here's what some funding metadata looks like when included in Dataverse's OpenAIRE export:
This doesn't seem to comply with OpenAIRE's guidelines, which expect the funder metadata in the Contributor field, although that seems even more inadequate for the type of metadata requirements being written about in the "ERC Study on repositories" report.
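To make the fundingReference structure concrete, here is a rough Python sketch that builds DataCite-style funding metadata. It is not the actual output of Dataverse's export. The element names follow the DataCite fundingReference property; the helper name is hypothetical, the NIH identifier is the one quoted earlier in this thread, and pairing "European Union" with the grant number 857650 is my assumption based on the example project mentioned elsewhere in this issue.

```python
import xml.etree.ElementTree as ET


def build_funding_reference(funder_name, funder_id=None, funder_id_type=None, award_number=None):
    """Build one DataCite-style <fundingReference> element (hypothetical helper)."""
    ref = ET.Element("fundingReference")
    ET.SubElement(ref, "funderName").text = funder_name
    if funder_id:
        ident = ET.SubElement(ref, "funderIdentifier")
        if funder_id_type:
            ident.set("funderIdentifierType", funder_id_type)
        ident.text = funder_id
    if award_number:
        ET.SubElement(ref, "awardNumber").text = award_number
    return ref


refs = ET.Element("fundingReferences")
# Assumed pairing of funder and grant number, for illustration only.
refs.append(build_funding_reference("European Union", award_number="857650"))
# Funder with a persistent identifier, using the NIH PID quoted in this thread.
refs.append(
    build_funding_reference(
        "National Institutes of Health",
        funder_id="http://dx.doi.org/10.13039/100000002",
        funder_id_type="Crossref Funder ID",
    )
)

xml_text = ET.tostring(refs, encoding="unicode")
print(xml_text)
```

Nothing in this structure holds a funding stream such as "Horizon Europe", which is why depositors end up typing it into funderName.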
So I think my questions are:
- In the survey results, how accurate is the data about the repositories that collect "Funding Stream" metadata? Among the 220 repositories they reviewed, we can see survey results for at least 9 repositories that use the Dataverse software, but the study authors wrote that the survey results might be inaccurate, either because the repositories that self-reported might have interpreted the survey questions differently or because the study's authors might have interpreted things differently when they needed to get the information themselves. The survey results are in "ANNEX 3 - Study curated data.xlsx" in the dataset at https://doi.org/10.5281/zenodo.7728016, and those 9 Dataverse repositories I could see are (1) Australian Data Archive, (2) Data Station Archaeology, (3) DataRepositóriUM, (4) DataverseNO, (5) Harvard Dataverse, (6) KU Leuven RDR, (7) Qualitative Data Repository, (8) Tilburg University Dataverse, and (9) UNC Dataverse. Should we contact the study's authors to ask? @philippconzett, would you be able to, or would you mind if I did?
- Is how I've described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs right?
- How do the concepts "Project Name" and "Project Acronym" fit into how I described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs? And how do the "Project Information" fields that @DS-INRA mentioned relate to all of this?
- Are we recommending that funding metadata be organized as part of a Project Information field that would be added to the metadata fields that ship with the Dataverse software? And if so, do all datasets have Project Names? Would this work for most repositories using Dataverse?
- How are 3 of the 9 repositories in compliance with OpenAIRE, according to the survey results, despite how those repositories export funder metadata in their OpenAIRE exports in a way that doesn't follow OpenAIRE's current metadata requirements? The three repositories are DataRepositóriUM, DataverseNO, and KU Leuven RDR. I'm thinking of asking the folks who work on OpenAIRE's metadata requirements.
So it seems like it's a separate concept. That is:
- A Funder, like the European Union, can have one or more Funding Streams
- A Funding Stream, like Horizon Europe, can have one or more Grants
- And of course each Grant can have a Grant Number and a persistent identifier
That is correct. For European projects, the EU is typically the funder, with several distinct streams (e.g. Horizon 2020, Horizon Europe, ...), each of which has a set number of grants for which projects are set up.
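The hierarchy described here can be sketched as a small data model. This is only an illustration in Python: the class and field names are hypothetical and are not existing Dataverse fields, and the grant number and grant PID below are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Grant:
    number: str
    pid: Optional[str] = None  # persistent identifier of the grant, if any


@dataclass
class FundingStream:
    name: str
    grants: List[Grant] = field(default_factory=list)


@dataclass
class Funder:
    name: str
    streams: List[FundingStream] = field(default_factory=list)


def flatten(funder):
    """Yield one flat (funder, stream, grant number, grant PID) row per grant."""
    for stream in funder.streams:
        for grant in stream.grants:
            yield (funder.name, stream.name, grant.number, grant.pid)


# Placeholder grant values, for illustration only.
eu = Funder(
    "European Union",
    [
        FundingStream(
            "Horizon Europe",
            [Grant("EXAMPLE-123", "https://example.org/grant/EXAMPLE-123")],
        ),
        FundingStream("Horizon 2020"),
    ],
)
rows = list(flatten(eu))
print(rows)
```

Each flattened row corresponds to what a single compound funding field would need to hold if Funding Stream became its own child field.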
How do the concepts "Project Name" and "Project Acronym" fit into how I described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs? And how do the "Project Information" fields that @DS-INRA mentioned relate to all of this?
They are additional pieces of information. Only the identifier (here the Grant Number) is the same for the grant, the funding, and the project. Here is a concrete example for a European project (see https://doi.org/10.3030/857650):
Are we recommending that funding metadata be organized as part of a Project Information field that would be added to the metadata fields that ship with the Dataverse software?
It would be best to have funding and project merged since, as seen previously, they overlap. We should discuss the labels, though, as not all funding might be called "projects".
And if so, do all datasets have Project Names?
In our case, no, even for some datasets that may have funding from other sources (e.g. from administrative regions).
How are three of the nine repositories in compliance with OpenAIRE, according to the survey results, despite how funder metadata is currently included in Dataverse's OpenAIRE export in a way that doesn't follow OpenAIRE's requirements? The three repositories are DataRepositóriUM, DataverseNO, and KU Leuven RDR. I'm thinking of asking the folks who work on OpenAIRE's metadata requirements.
You can contact Pedro Principe in the community (I don't know his GH account), who is involved in both DataRepositóriUM and OpenAIRE PROVIDE :)
@jggautier Thanks for your thorough and informative follow-up on this issue!
Should we contact the study's authors to ask? @philippconzett, would you be able to, or would you mind if I did?
That's exactly what came to my mind when I was reading your comment. Please go ahead and contact them. I'm sure they'll be more than happy to discuss these issues with you.
Is how I've described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs right?
To my understanding, yes.
As for your other questions, I think they could be discussed in a common meeting with the authors of the report mentioned above, members of the OpenAIRE team, and members of the Dataverse community.
Thanks @DS-INRA and @philippconzett
@philippconzett, about your second point about "a dedicated and separated metadata field for PID(s) for the author(s)’ organisation/affiliation (eg. ROR ID)", the Author Affiliation field on Demo Dataverse has been changed so that we could evaluate how well the design of this implementation of the "external controlled vocabulary" functionality helps people add their affiliation and helps repositories collect and distribute persistent IDs of author affiliations. That work is being described at https://github.com/IQSS/dataverse/issues/9151, although it's also related to how we collect funding metadata, since we also want to collect and distribute persistent IDs of funders, and the thinking so far has been to use the same "external controlled vocabulary" functionality for this, using the persistent IDs and other metadata from the Crossref Funder Registry, although that's being deprecated and the folks at ROR are working to make sure they can be a good replacement.
The requirement for a "dedicated and separated metadata field" is interesting, too, and maybe worth clarifying. The way the Author Affiliation field is designed on the deposit form on Demo Dataverse, there is no separate field for the persistent ID of the author's affiliation. But if the depositor chooses an organization that the field suggests, Dataverse records that persistent ID. So by "dedicated and separated metadata field", I'm assuming they mean to discourage repositories from letting depositors put this information in something like a "catch-all" field, like Description or Notes, which right now would make the metadata less machine-readable and less interoperable.
I'll email the study's authors and @pedroprincipe to ask about the differences between what the study is evaluating and OpenAIRE's metadata guidelines.
I'll need to think more about how to help organize a common meeting that's effective and timely 🤔
Here are my notes from our meeting earlier today (thank you to Anna Pelagotti, Dagmar Meyer, and Emma Lazzeri for feedback):
Requirements from European Commission for Horizon Europe projects (including European Research Council)
How to comply
How to reduce errors
How to proceed with the ERC survey
[1] https://cordis.europa.eu/projects/en
[2] PIDs for Horizon Europe etc.: https://data.crossref.org/fundingdata/funder/10.13039/100018693; https://data.crossref.org/fundingdata/funder/10.13039/100019188; https://data.crossref.org/fundingdata/funder/10.13039/100019180; https://data.crossref.org/fundingdata/funder/10.13039/100010663; https://data.crossref.org/fundingdata/funder/10.13039/100011199; https://data.crossref.org/fundingdata/funder/10.13039/501100000781; https://ror.org/0472cxd90
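Funder PIDs show up in this thread in more than one URL form (dx.doi.org and data.crossref.org), alongside ROR IDs. If a repository wanted to compare or deduplicate them, a small normalizer along these lines might help. This is a sketch in Python: the function names are hypothetical, it relies on Crossref Funder Registry DOIs using the 10.13039 prefix, and choosing https://doi.org/ as the canonical form is my assumption.

```python
import re

# Crossref Funder Registry DOIs use the 10.13039 prefix.
FUNDER_DOI_PATTERN = re.compile(r"10\.13039/\d+")


def funder_doi(url):
    """Return the bare funder DOI found in a URL, or None if there isn't one."""
    match = FUNDER_DOI_PATTERN.search(url)
    return match.group(0) if match else None


def canonical_funder_url(url):
    """Rewrite any recognized funder PID URL to a single https://doi.org/ form."""
    doi = funder_doi(url)
    return f"https://doi.org/{doi}" if doi else None


examples = [
    "http://dx.doi.org/10.13039/100000002",
    "https://data.crossref.org/fundingdata/funder/10.13039/100018693",
    "https://ror.org/0472cxd90",  # a ROR ID, not a funder DOI
]
for url in examples:
    print(url, "->", canonical_funder_url(url))
```

ROR IDs fall through as `None` here, so a real implementation would need a separate branch for them.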
Hi, just to give an update: we are still planning to contribute to this. We should start on the first steps (imo adding the fields) in our April sprint.
Hi all. After the January 18 meeting that @philippconzett mentioned, I emailed the folks who joined the meeting and some others with follow up questions, next steps and other things we need to consider, and thought I'd include those in this GitHub issue.
My take was that we all agreed that a major goal here is to make it easier for the folks from the European Commission to be able to track outputs of the research they fund, particularly by using OpenAIRE's infrastructure.
So it's important that we're able to connect with folks from the European Commission who track research outputs and with grantees who need to make sure that their funders are aware of the data and code they publish, so that our understanding of their experiences can inform the changes we make to Dataverse and so that we're able to evaluate how their experiences are changed (and hopefully improved!) by those changes to Dataverse.
As @philippconzett mentioned, we need to learn from folks at OpenAIRE about their metadata guidelines, such as how they hope the existing guidelines or their changes to the guidelines will ensure that their systems can make it easy for funders and grantees to report and track research outputs.
It would be helpful to understand how Zenodo sends to OpenAIRE the funder metadata they collect, given Zenodo's close association with the European Commission and OpenAIRE.
And we need to make sure we're aware of similar efforts being discussed to help stakeholders with similar goals, like folks from NIH funding groups and folks who manage other Dataverse repositories. This includes discussions in the GitHub issue at https://github.com/IQSS/dataverse/issues/4859 and several GitHub issues about using ROR, like https://github.com/IQSS/dataverse/issues/6640.
So @DS-INRA before any metadata fields are added, I'm recommending these next steps:
I also plan to update the GitHub issue at https://github.com/IQSS/dataverse/issues/4859 with next steps, and I'm connecting with folks from the NIH so that we can get a better understanding of how they track research outputs and so that we can rely on those connections later to evaluate how their experiences are changed by the changes we make to Dataverse.
It would be really helpful if we could continue discussing these steps and how we might collaborate on them. Being able to scale our UX research is the goal of that UX working group I've been exploring, and this effort seems like a good way to see how we might leverage more of the Dataverse community's resources.
To fulfil the HE MGA requirements, we have now also integrated new metadata fields in our Dataverse installation, e.g. https://data.aussda.at/dataset.xhtml?persistentId=doi:10.11587/D3PZEA
Nonetheless, an official solution would be a good idea. In addition, the Organization PID issue is still open: "Feature Request/Idea: Allow ORCID and ROR to be used together in author field" and "Support Research Organization Registry (ROR) IDs" (#6640). I also added a new issue for the 2024 requirement “separate embargo field”: "Feature Request: Metadata field for embargoed datasets" (#10833).
Requirements HE MGA survey 2024: In order to reach the “Exemplary Readiness Level”
Interesting. #4859 was opened because there are two ways to enter funding information. Now there are three, in the solution above. 😄
Yes, I agree that some sort of official solution would be nice. 🤔 Meanwhile, it looks like it's working for you! It looks like that field is under the citation block, though. As it's a custom solution it might be better under a custom metadata block.
These are just some idle thoughts as I catch up on GitHub comments. Thanks for pushing the envelope!
I didn't do it as a custom block, because Funding Information and Grant Project Information together make up the HE MGA requirements. I know it's not a beautiful solution. What's more, I also put a ROR PID field into the author block. This is all temporary until there is an official solution. At the moment we have only 11 datasets with these requirements. Our funders demanded a solution for the HE MGA requirements.
Overview of the Feature Request: The European Research Council (ERC) is currently running a survey to collect data for updating the Study on the Readiness of Research Data and Literature Repositories to Facilitate compliance with the Open Science Horizon Europe Model Grant Agreement (OS HE MGA) Requirements. A PDF version of the survey, which was commissioned from a group of independent experts by the European Research Council Executive Agency (ERCEA), is attached to this issue.
This is an umbrella issue to cover the following features needed to achieve full Dataverse support for compliance with the Open Science Horizon Europe Model Grant Agreement (OS HE MGA) Requirements:
What kind of user is the feature intended for? API User, Depositor, Guest
What inspired the request? The ERC survey mentioned above.
What existing behavior do you want changed? Add the metadata fields mentioned above.
Any brand new behavior do you want to add to Dataverse? No, not brand new, but extending metadata capturing and exposing for harvesting.
Any open or closed issues related to this feature request? #4859