IQSS / dataverse

Open source research data repository software
http://dataverse.org

Spike: Investigate how Dataverse stakeholders and users need to collect and use funder metadata #4859

Closed jggautier closed 1 week ago

jggautier commented 6 years ago

In the Dataverse 4.x Citation metadata block, the Contributor metadata field has a "Funder" contributor type:

[Screenshot (2018-07-17): the "Funder" contributor type in the Contributor field]

The "Funder" type comes from DataCite's list of contributor types, added in their 3.x schema. I think we should remove the contributor type "Funder" because:

  1. It duplicates Dataverse's "Funding Information Agency" field...

    [Screenshot (2023-01-27): the "Funding Information Agency" field]

    ...and it's probably confusing depositors.

    • The "Funding Information Agency" fields are used more often across the known Dataverse installations. But in some datasets depositors entered the same funder names in both fields; in others they entered a name in the Contributor field and not in the "Funding Information Agency" field; and in others they entered a name in the Contributor field, nothing in the "Funding Information Agency" field, and something in the "Funding Information Identifier" field:

    [Screenshot (2023-01-27): examples of these cases]

  2. This complicates metadata exporting and makes it harder to find data based on who funded the research. For example, if we send funding metadata to DataCite, it won't accept metadata that includes "Funder" as a Contributor Type. Newer versions of the DataCite standard don't include a "Funder" contributor type. (It was deprecated when the FundingReference property was added, so that more information about funding could be included in subproperties of FundingReference.)
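To make point 2 concrete, below is a minimal Python sketch of how an exporter might route funder metadata when building DataCite XML. It is an illustration of the idea, not how Dataverse's exporter actually works: it assumes hypothetical input dictionaries (`name`/`type` for contributors, `agency`/`grant_number` for Funding Information), uses only the `fundingReference`, `funderName`, and `awardNumber` element names from the DataCite 4.x schema, and omits the schema namespace and the `funderIdentifier` subproperty for brevity.

```python
import xml.etree.ElementTree as ET


def to_datacite_funding(contributors, funding):
    """Sketch: route funder metadata into a DataCite <fundingReferences> element.

    contributors: list of {"name": str, "type": str}  (hypothetical shape)
    funding: list of {"agency": str, "grant_number": str or None}  (hypothetical shape)

    Returns (fundingReferences element, contributors that are not funders).
    """
    refs = ET.Element("fundingReferences")
    seen = set()

    # Funding Information entries map directly to fundingReference.
    for entry in funding:
        ref = ET.SubElement(refs, "fundingReference")
        ET.SubElement(ref, "funderName").text = entry["agency"]
        if entry.get("grant_number"):
            ET.SubElement(ref, "awardNumber").text = entry["grant_number"]
        seen.add(entry["agency"].strip())

    # DataCite 4.x has no "Funder" contributorType, so these contributors
    # become fundingReferences too, unless the exact name is already present.
    for c in contributors:
        if c["type"] == "Funder" and c["name"].strip() not in seen:
            ref = ET.SubElement(refs, "fundingReference")
            ET.SubElement(ref, "funderName").text = c["name"]
            seen.add(c["name"].strip())

    others = [c for c in contributors if c["type"] != "Funder"]
    return refs, others


refs, others = to_datacite_funding(
    [{"name": "X", "type": "Funder"}, {"name": "Jane Doe", "type": "Data Collector"}],
    [{"agency": "Y", "grant_number": "123"}],
)
print([e.text for e in refs.iter("funderName")])  # ['Y', 'X']
```

The duplicate check only catches identical strings, so a dataset with "ARC" in one field and "Australian Research Council" in the other would still produce two funding references; that is one reason cleaning up existing values matters before or alongside any export change.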

Definition of done:

jggautier commented 2 years ago

While updating the crosswalk, I saw that when you use an Atom entry (XML) to create a dataset, funder metadata is mapped to the Contributor field, with Contributor Type set to "Funder". So if we removed Contributor Type "Funder," the mapping done when an Atom entry is used to create a dataset would need to change so that funding info is mapped to the "Funding Information" field.

mreekie commented 1 year ago

@pdurbin @jggautier There seems to be a question of where this should go. This issue is in the deliverable backlog, but under 1.5.1. I'm not an expert here. Is it better addressed here or under NIH OTA 1.2.1?

jggautier commented 1 year ago

I think this should be worked on as part of any effort to improve how Dataverse collects and exports funding metadata about datasets. That seems to be a goal of NIH OTA 1.2.1, so I think it should be addressed in 1.2.1.

pdurbin commented 1 year ago

No strong opinion. It could be worked on under either.

pdurbin commented 1 year ago

mapped to the "Grant Information" field (which is being renamed "Funding Information")

Just a note that yes, this has been renamed. From Harvard Dataverse running 5.12.1:

[Screenshot (2023-01-18): the renamed "Funding Information" field on Harvard Dataverse]

jggautier commented 1 year ago

Thanks @pdurbin. I updated the original comment.

jggautier commented 1 year ago

I'm talking with the depositors in the Harvard repository who've most often used both fields in the same dataset, to learn why. I spoke with a manager from the WorldFish repository. They've used both fields only because, when they create datasets that include funder names, they often use a different platform instead of the Dataverse deposit form. On that platform's deposit page, the metadata field for funder names, called "Donor", is mapped to Dataverse's "Contributor" field and given the "Funder" contributor type. We talked about how they should update their platform to use the Funding Information field instead, and we'll schedule another meeting, hopefully including the platform's developers, to review the changes being made to the Funding Information field (https://github.com/IQSS/dataverse/issues/9150).

They said it's fine if we move the funder names in their datasets' Contributor fields to the Funding Information field.

This week I'll be meeting with the manager of another collection that most often adds funding metadata to their datasets to learn why they've used both fields.

jggautier commented 1 year ago

I reviewed the metadata I collected from most known Dataverse installations in October 2022 (https://doi.org/10.7910/DVN/DCDKZQ, version 12) to learn which datasets have values in both fields. While Harvard Dataverse had the most of these kinds of datasets at the time (250), other Dataverse installations also have datasets with funder names in both fields. Here's a CSV file listing the datasets, which installation they're published in, and the funder names entered in both fields: duplicateFundingFieldsInAllInstallations.csv

I've emailed the Dataverse Google Group to try to learn, from as many Dataverse installations as possible, why both fields were used and what we should consider when moving the funder names in the Contributor fields to the Funding Information fields. See this GitHub issue's original post, which I've been updating with a more detailed proposal.

shlake commented 1 year ago

All datasets in UVa Dataverse have "Funder" information in the grantNumber block. Examples: https://doi.org/10.18130/V3/FRZYXV https://doi.org/10.18130/V3/VJUZSH https://doi.org/10.18130/V3/YWTLHC

BUT we have made this block "displayoncreate" = TRUE. With this block showing on dataset creation, I hoped it would prevent the information from going into some other field.

AND, hoping to make what goes in that field clearer (to US), I have changed the title and the description:

name                 title                        description
grantNumber          Grant/Funding Information    Grant or Funding Information
grantNumberAgency    Grant/Funding Agency         Funding Agency
grantNumberValue     Grant Number                 The grant or contract number of the project that sponsored the effort.

jggautier commented 1 year ago

I just realized that this was removed from a Dataverse_Funded_Deliverables list last month, but I'm not sure what that means. @mreekie, could you explain what that means?

I'm wondering if this will be addressed as part of efforts, such as https://github.com/IQSS/dataverse/issues/9150, to improve the quality of funding metadata in Dataverse repositories. I think it should; I'm just trying not to let it fall through the cracks.

cmbz commented 1 year ago

I will follow up with @mreekie to determine where this issue should be moved (e.g., back into the Global Backlog related to the NIH deliverable).

pdurbin commented 1 year ago

A couple of other brand-new funding-related issues, spawned from #9150 (just closed), I believe:

mreekie commented 1 year ago

sizing:

cmbz commented 1 year ago
jggautier commented 1 year ago

Here are datasets from different Dataverse installations that show different cases:

There are funder names in the Contributor Name fields and none in the Funding Information Agency fields. For example, Contributor Type is "Funder", Contributor Name is "X", and nothing is entered in the Funding Information Agency field.

There are different funder names in the Contributor Name fields and in the Funding Information Agency fields. For example, Contributor Type is "Funder", Contributor Name is "X", and Funding Information Agency is "Y".

The same funder names are in the Contributor Name fields and the Funding Information Agency fields (same strings). For example, Contributor Type is "Funder", Contributor Name is "X", and Funding Information Agency is also "X".

The same funder names are in the Contributor Name fields and the Funding Information Agency fields (but as different strings). For example, Contributor Type is "Funder", Contributor Name is "Australian Research Council", and Funding Information Agency is "ARC".
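The four cases above could be told apart per dataset with a check like the following minimal Python sketch. It assumes the Contributor Names typed "Funder" and the Funding Information Agency values have already been extracted into lists of strings, and it compares whole sets of names, so a partial overlap is reported as "different funders". Because a pure string comparison cannot tell that "ARC" and "Australian Research Council" name the same funder, the sketch relies on a hypothetical, hand-maintained alias table for the fourth case.

```python
def classify_funding_case(contributor_funders, agencies, aliases=None):
    """Return which of the four cases a dataset falls into, or None.

    contributor_funders: Contributor Names whose Contributor Type is "Funder"
    agencies: Funding Information Agency values
    aliases: hypothetical map from variant strings to a canonical name,
             e.g. {"ARC": "Australian Research Council"}
    """
    aliases = aliases or {}

    def canonical(name):
        name = name.strip()
        return aliases.get(name, name)

    # Ignore surrounding whitespace and empty values.
    contrib = {n.strip() for n in contributor_funders if n.strip()}
    agency = {n.strip() for n in agencies if n.strip()}

    if not contrib:
        return None  # no "Funder" contributors: none of the four cases applies
    if not agency:
        return "funder names only in Contributor"
    if contrib == agency:
        return "same funders, same strings"
    if {canonical(n) for n in contrib} == {canonical(n) for n in agency}:
        return "same funders, different strings"
    return "different funders"


print(classify_funding_case(["Australian Research Council"], ["ARC"],
                            {"ARC": "Australian Research Council"}))
# same funders, different strings
```

Only the fourth case needs human judgment (building the alias table); the first three can be found mechanically.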

scolapasta commented 1 year ago

We discussed this at tech hours and decided that our preferred approach would be to add an attribute to metadata fields so they can be marked as deprecated. This would generally be more flexible, and it would maintain the principle of not changing old versions.

A deprecated dataset field would still allow old values to be viewed, but if you tried to edit the dataset, you would be required to stop using that field (we would want some sort of instructions in the error message advising you what to use instead).

In this way, we won't have to write any scripts to change old values in the database. Rather, we would break this current issue into three:

• one for the infrastructure component: adding the new deprecated attribute and the logic for handling it on create and edit
• a new API that would let you look up whether a dataset's latest published version uses any deprecated fields (and possibly an API to get a list of all such datasets)
• a script that could be used to change some of the old values we know about to the new values we would like to use, written so that it can be reused for multiple datasets
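A minimal Python sketch of the create/edit check described above, assuming a hypothetical `FieldSpec` record that carries the proposed deprecated attribute plus a hint telling depositors what to use instead. Because "Funder" is a value of the Contributor Type field rather than a field of its own, the sketch also lets individual controlled-vocabulary values be deprecated; that extension is an assumption, not part of what was decided at tech hours. Old published versions are never passed through the check, so their values stay viewable.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Hypothetical metadata field definition with the proposed attribute."""
    name: str
    deprecated: bool = False
    hint: str = ""  # instructions shown in the error message
    # Assumed extension: deprecate single controlled-vocabulary values,
    # each mapped to its own hint.
    deprecated_values: dict = field(default_factory=dict)


def deprecation_errors(entries, specs):
    """Return error messages for a draft being created or edited.

    entries: (field name, value) pairs submitted for the new version
    specs: field name -> FieldSpec
    An empty list means the draft may be saved.
    """
    errors = []
    for name, value in entries:
        spec = specs.get(name)
        if spec is None:
            continue
        if spec.deprecated:
            errors.append(f"Field '{name}' is deprecated. {spec.hint}".strip())
        elif value in spec.deprecated_values:
            hint = spec.deprecated_values[value]
            errors.append(f"'{value}' is no longer allowed in '{name}'. {hint}".strip())
    return errors


specs = {
    "contributorType": FieldSpec(
        "contributorType",
        deprecated_values={"Funder": "Use the Funding Information fields instead."},
    )
}
print(deprecation_errors([("contributorType", "Funder"),
                          ("contributorType", "Data Collector")], specs))
# ["'Funder' is no longer allowed in 'contributorType'. Use the Funding Information fields instead."]
```

Returning a list of messages rather than raising on the first problem lets the deposit form show every deprecated field or value at once.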

Hopefully I was able to summarize everything we discussed about this in the tech hours; if anyone noticed anything I missed, please feel free to add on.

cmbz commented 1 year ago

Moved this issue back into SPRINT NEEDS SIZING given outcome of Tech Hours meeting where it was decided to split this issue into three separate issues.

stevenmce commented 1 year ago

Hi everyone,

I've just noticed this discussion, and wanted to add ADA's perspective (noting we have examples listed above in Julian's posting).

We made a decision several years back to make more active use of the Contributor > Funder field, rather than the Funder Information field, as we could see increasing demand for outlining a range of Contributors and their roles. Our preference is to maintain Funder as a Contributor type, as it allows us to make assessments and queries against all different Contributions, rather than having to query multiple distinct fields (Data Collector is a similar case here).

There is certainly a need to include Funding information (particularly grant IDs), but ensuring we can query all types of Contribution - including Funder - is an important attribute for us that we would like to see maintained.

Cheers, Steve McEachern, ADA

jggautier commented 1 year ago

Thanks @stevenmce for finding this GitHub issue and letting us know about this requirement! We'll keep it front of mind as we work out the details of @scolapasta's proposal.

When I collected information from other Dataverse installations, I was planning on reaching out to each of them to learn more, and your comment is a great example of the need for more proactive UX research. Before any changes are made, I'll try to get time to reach out to other installations, too.

cmbz commented 1 year ago

2023/06/21: Follow up with a design meeting to discuss options

cmbz commented 12 months ago

2023/07/25: Schedule meeting to discuss and identify an approach to address the problem. Will also need to confirm with community members (e.g., Steve) that the solution will meet their needs. (More than just a Tech Hours talk)

jggautier commented 9 months ago

Just an update about the meeting @cmbz mentioned. This Friday @cmbz, @scolapasta and I will be talking about this while planning for how Dataverse should follow a set of metadata recommendations from NIH GREI.

cmbz commented 9 months ago

2023/10/16

jggautier commented 7 months ago

Just an update:

jggautier commented 7 months ago

More updates:

jggautier commented 7 months ago

More updates:

While the CGIAR and Borealis folks are discussing, I emailed Steve McEachern to share what I learned from a review of funder metadata in ADA's repository and wrote that I'd share in this GitHub issue.

To reiterate and expand on the great points Steve made in June:

When looking at ADA's and other installations' funder metadata, I also noticed these things and want to at least acknowledge them:

jggautier commented 7 months ago

This GitHub issue is in Dataverse SODHA's "Santa's watching" list, so I emailed those folks to learn about their interests in this issue. Our regular contacts for this installation, Benjamin Peuch and Youssef Ouahalou, are no longer working on this installation, so I emailed the installation's general email address as Youssef suggested.

As with Borealis and CGIAR, I'm waiting to hear back from them, and I'll follow up in the first week of January after the winter break.

jggautier commented 6 months ago

Just an update on progress so far. Most of the discussion happening in the GitHub issue at https://github.com/IQSS/dataverse/issues/10196 involves user goals we'll need to consider when we're thinking about a redesign of how Dataverse collects and distributes funding metadata.

I'm also trying to find time to chat with @stevenmce about what he wrote about ADA's needs and goals. And I've been emailing with @amberleahey about Borealis's use of funding metadata fields and with folks from CGIAR.

cmbz commented 1 week ago

2024/07/10

jggautier commented 1 week ago

Yes definitely! I'm going to close this issue, since the discovery research that the community was doing last winter effectively ended in April when the UX WG started planning and executing the design sprint (being tracked in https://github.com/IQSS/dataverse-pm/issues/127 and GitHub issues listed in that issue).

We've been using what we've learned in this spike GitHub issue and in https://github.com/IQSS/dataverse/issues/10196 in two ways: as we plan how to evaluate the success of a redesign of the Citation metadata block and of the use of the external controlled vocabulary functionality, and as we consider different design ideas for addressing the goals driving the work recorded in this GitHub issue. Those goals are improving the experience of adding funding metadata and making it easier to find datasets by funder, in part by resolving the issues caused by having two places on the dataset deposit form where people can record who funded the deposit. Throughout, we're keeping in mind what we learned from @stevenmce about the value of thinking of a funder as a kind of contributor.

More broadly, I hope that the design sprint idea we're working on can help us do research like the research done for this spike issue more effectively, by timeboxing it and by setting other expectations for how many resources we'll need from stakeholders who are vital to our shared understanding of goals.