cznethub / dsp

CZNet Hub Data Submission Portal
BSD 3-Clause "New" or "Revised" License

Get funding agency and award information from Zenodo Funder element #126

Status: Closed (horsburgh closed 6 days ago)

horsburgh commented 5 months ago

Describe the feature you'd like and what it will do

When we built the DSP, Zenodo did not have any of the CZNet funding awards in their database of funded projects from which you could choose to specify funding agency information. The only option was to choose from the list, so we couldn't use Zenodo's "Funding" element to specify award information. Because of that, we were encouraging people to use the "Notes" field in Zenodo's metadata schema to submit funding agency information. Zenodo has now updated their database of awards from NSF, and users can now choose the CZNet awards from the list. So, we need to do the following:

Why is this feature important?

We track submitted resources for reporting purposes and associate resources in the discovery system with their CZNet Cluster project using the NSF award number that the user provides when sharing a dataset. Users can now put that information in the Zenodo "Funding" element, so we need to look there when we write the metadata for the registered dataset to the Catalog database. We should probably do this in addition to, not instead of, looking in the "Notes" field, because some existing resources already have the information in the "Notes" field.
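A minimal sketch of the "in addition to, not instead of" lookup described above. It assumes the record metadata carries a `funding` list whose entries have an `award.number` key; that shape, the function names, and the placeholder award numbers are assumptions for illustration, not the DSP's actual code or schema.

```python
from typing import Any


def awards_from_funding(metadata: dict[str, Any]) -> list[str]:
    """Return award numbers from an assumed `funding` list (hypothetical shape)."""
    numbers = []
    for entry in metadata.get("funding") or []:
        award = entry.get("award") or {}
        number = str(award.get("number") or "").strip()
        if number:
            numbers.append(number)
    return numbers


def merge_awards(funding_awards: list[str], notes_awards: list[str]) -> list[str]:
    """Combine both sources, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*funding_awards, *notes_awards]))


# Made-up record; "0000000" and "1111111" are placeholder award numbers.
record = {
    "funding": [
        {
            "funder": {"name": "National Science Foundation"},
            "award": {"number": "0000000"},
        }
    ]
}

from_funding = awards_from_funding(record)
# Pretend a separate Notes scan returned these two numbers.
merged = merge_awards(from_funding, ["0000000", "1111111"])
```

Merging rather than replacing keeps older resources, whose award number only appears in "Notes", associated with their cluster.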

Is your feature request related to a problem? Please describe.

This is a new feature request based on updates to Zenodo.

Additional context

This Zenodo resource is a good example to test with. It was created by one of the CZNet Cluster projects and has funding award information in the "Funding" element:

https://zenodo.org/records/10081892

Maurier commented 1 month ago

@pkdash - As part of this work, we need to make sure that funding metadata is correctly extracted after incorporating Zenodo's "Funding" element (https://github.com/cznethub/dspback/blob/develop/dspback/schemas/zenodo/schema.json#L207). We also need to keep the logic that extracts funding metadata from the "Notes" field, in order to support resources registered before these schema changes, and improve that logic with a robust regular expression that can handle special characters (like colons) in the field's content.
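One way such a regular expression could look, as a sketch only. Instead of splitting the "Notes" text on delimiters such as colons, it searches for standalone seven-digit runs (the length of an NSF award number) and keeps only those found in a list of known awards. The function name, the `KNOWN_AWARDS` set, and the sample note are made up for illustration and are not taken from dspback.

```python
import re

# Seven digits with no digit on either side, so longer digit runs are rejected.
AWARD_PATTERN = re.compile(r"(?<!\d)\d{7}(?!\d)")


def awards_from_notes(notes, known_awards):
    """Return known award numbers found anywhere in free-text notes."""
    found = []
    for candidate in AWARD_PATTERN.findall(notes or ""):
        # Filtering against known awards drops other seven-digit numbers,
        # such as the numeric suffix of a Zenodo DOI.
        if candidate in known_awards and candidate not in found:
            found.append(candidate)
    return found


# Placeholder award numbers, not real CZNet awards.
KNOWN_AWARDS = {"0000000", "1111111"}

notes = (
    "Funding agency: NSF. Award title: Example: a made-up title. "
    "Award number: EAR-0000000. See also zenodo.7654321."
)
result = awards_from_notes(notes, KNOWN_AWARDS)
```

Because the pattern does not depend on field separators, extra colons inside an award title do not affect the match. The trade-off is that an award missing from the known list is silently skipped.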

horsburgh commented 1 month ago

@pkdash - just a little more background on this. I believe @sblack-usu developed code to associate datasets registered with the DSP with the different thematic cluster projects based on their NSF project numbers. Normally he would be looking in the "Funding" element, but initially we were not using Zenodo's Funding element because they force you to select an existing project and none of the CZNet projects were in their list. The CZNet projects have now been added to Zenodo's list of projects, but any resources submitted before that would have put their funding information in Zenodo's Notes element. I don't know if Scott's code searches the Notes element for NSF award numbers, but we need to, as it's obvious that there are some Zenodo resources with funding information in the Notes element.

Example: https://zenodo.org/records/8302139

If there is existing code that scans the Notes element for award numbers, it isn't working correctly (or we are missing the award number in the list of awards we are checking for), because this Zenodo resource has funding information with an award number in the Notes element and it is not associated with the Dust^2 cluster in the discovery app.

pkdash commented 1 month ago

@horsburgh and @Maurier I can start working on this issue later this week.

Maurier commented 6 days ago

The logic to catalog these datasets has been updated by @pkdash. Closing.