NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Add indexing support for new EML 2.2.0 `project/award` structure #1348

Open amoeba opened 5 years ago

amoeba commented 5 years ago

(Opening a separate issue from https://github.com/NCEAS/metacat/issues/1256#issuecomment-483803437 to continue this discussion)

EML 2.2.0 adds support for structured funding information beyond the old project/funding field, which is defined as a TextType. The new structure looks like:

<award>
  <funderName>National Science Foundation</funderName>
  <funderIdentifier>https://doi.org/10.13039/00000001</funderIdentifier>
  <awardNumber>1546024</awardNumber>
  <title>Scientia Arctica: A Knowledge Archive for Discovery and Reproducible Science in the Arctic</title>
  <awardUrl>https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546024</awardUrl>
</award>

Right now, no indexing is done for project info, but this has been requested for some time. See https://github.com/NCEAS/metacat/issues/1256. There are some considerations here we should discuss:

  1. How to index award numbers
  2. Which other fields to index

1. How to index award numbers

I can think of two ways to go with this:

1a. Index project/award/awardNumber as a new field 'awardNumber' that sits alongside funding which is populated by project/funding.

1b. Index project/award/awardNumber in the same 'funding' field as project/funding.

1a is nice because it allows queries to be much more specific, but it requires clients to do queries like ?q=funding:*12345*+OR+awardNumber:12345 when looking for that award. 1b is nice because it's backwards compatible with project/funding, i.e., you can search a single 'funding' field to find an award.
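
To make the trade-off concrete, here is a minimal sketch of the query string a client would build under each option. The field names `funding` and `awardNumber` come from the discussion above; the function names are hypothetical, and the wildcard form simply mirrors the example query.

```python
from urllib.parse import urlencode


def query_separate_fields(award_number: str) -> str:
    """Option 1a: the client has to check both fields to find one award."""
    q = f"funding:*{award_number}* OR awardNumber:{award_number}"
    # Keep ':' and '*' unescaped so the output matches the example query.
    return urlencode({"q": q}, safe=":*")


def query_single_field(award_number: str) -> str:
    """Option 1b: one 'funding' field covers old and new documents."""
    q = f"funding:*{award_number}*"
    return urlencode({"q": q}, safe=":*")


if __name__ == "__main__":
    print("?" + query_separate_fields("12345"))
    print("?" + query_single_field("12345"))
```

Under 1a the exact match on `awardNumber` is more precise, but the client still needs the wildcard clause on `funding` to catch older documents.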

I like 1b because I weight backwards compatibility heavily. @mbjones has a vote in for 1a, which makes a ton of sense and is workable. I'd love to hear others' thoughts on this, esp. @laurenwalker and/or @csjx.

2. Which other fields to index

@mbjones suggested we also index funderName and funderIdentifier and leave title and awardUrl out. I think that makes sense but am curious if there are other thoughts.

amoeba commented 5 years ago

We talked on Slack, and the mini-consensus was to keep a funding field around for all metadata standards/versions to store their funding info. EML 2.1.1 would store /funding there, EML 2.2.0 would store /award/awardNumber, and any other standards (e.g., https://schema.org/award or datacite:awardNumber) would store their funding info there too. The field would be stored as both a string and text to allow searching and faceting.
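
As a rough illustration of that consensus (not the actual Metacat indexer), the sketch below maps each EML version to the element whose text would feed the shared funding field. The version labels, the `FUNDING_PATHS` table, the `dataset/project/...` paths, and the `funding_values` helper are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: standard label -> path of the element that feeds 'funding'.
FUNDING_PATHS = {
    "eml-2.1.1": "dataset/project/funding",
    "eml-2.2.0": "dataset/project/award/awardNumber",
}


def funding_values(standard: str, xml_text: str) -> list[str]:
    """Return the strings that would be indexed into the shared 'funding' field."""
    root = ET.fromstring(xml_text)
    values = []
    for element in root.findall(FUNDING_PATHS[standard]):
        # itertext() flattens nested markup such as <para> inside a TextType,
        # and split()/join() collapses runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text:
            values.append(text)
    return values
```

Storing the resulting values as both a string and a text type is a schema-level decision and is not shown here.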

The changes aren't merged into a dev or the master branch so I'll leave this open for now.

mbjones commented 3 years ago

Even if we keep a free-text funding field in the index (which we should for compatibility), I think we should also be adding more controlled fields for the new structured fields in EML 2.2, and ensure we can handle repeating award elements:

<award>
  <funderName>National Science Foundation</funderName>
  <funderIdentifier>https://doi.org/10.13039/00000001</funderIdentifier>
  <awardNumber>1546024</awardNumber>
  <title>Scientia Arctica: A Knowledge Archive for Discovery and Reproducible Science in the Arctic</title>
  <awardUrl>https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546024</awardUrl>
</award>
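
One way to picture the repeating-element handling is the sketch below, which collects each structured child into a multi-valued list. Which children get indexed, the field names, and the `award_index_fields` helper are assumptions for illustration only, not the Metacat implementation.

```python
import xml.etree.ElementTree as ET

# Assumed set of award children to expose as controlled, multi-valued fields.
INDEXED_CHILDREN = ("funderName", "funderIdentifier", "awardNumber")


def award_index_fields(xml_text: str) -> dict[str, list[str]]:
    """Collect values from every <award> element, one list per child name."""
    root = ET.fromstring(xml_text)
    fields = {name: [] for name in INDEXED_CHILDREN}
    for award in root.iter("award"):
        for name in INDEXED_CHILDREN:
            value = award.findtext(name)
            # Skip children that are absent or contain only whitespace.
            if value and value.strip():
                fields[name].append(value.strip())
    return fields
```

Note that with flat multi-valued fields, the lists are not guaranteed to line up position by position when an award omits an optional child such as funderIdentifier.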

jeanetteclark commented 2 years ago

Giving this a bump as we need it for our quarterly reports. Previously, to get this information, I had Chris query the Metacat database directly.

mbjones commented 2 years ago

@taojing2002 Jing, can you add this to the next release please?

taojing2002 commented 2 years ago

Sure, Matt.