NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

Odd behavior of check.nsf.award.numbers.in.nsf.database.1 #112

Open amoeba opened 6 years ago

amoeba commented 6 years ago

I'm not sure whether this is a bug or what exactly is going on. Jesse found this.

For some reason, this page:

https://arcticdata.io/catalog/#quality/doi%3A10.18739%2FA2556Q

is reporting

The award number '0202076' was not found in the NSF award database.

even though the EML lists a different award number:

    <project>
      <title>Soil bacterial community and functional shifts in response to altered snow pack in moist acidic tundra of Northern Alaska</title>
      <personnel>
        <individualName>
          <givenName>Michael</givenName>
          <surName>Ricketts</surName>
        </individualName>
        <role>originator</role>
      </personnel>
      <funding>
        <para>0612534</para>
      </funding>
    </project>

It looks like this is the check that was run: https://github.com/NCEAS/metadig-engine/blob/master/src/main/resources/checks/nsf.award.number.in.db.xml

gothub commented 6 years ago

@amoeba The selector from the check looks correct, maybe it's some kind of caching issue. I'll have a look...
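
To rule the selector in or out, it can help to run an equivalent path against the EML fragment quoted above. This is a minimal sketch, not the check's actual code: the path `./funding/para` is an assumption based on the quoted XML, and `extract_award_numbers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# The <project> element quoted in this issue (personnel omitted for brevity).
PROJECT_XML = """
<project>
  <title>Soil bacterial community and functional shifts in response to altered snow pack in moist acidic tundra of Northern Alaska</title>
  <funding>
    <para>0612534</para>
  </funding>
</project>
"""


def extract_award_numbers(project_xml: str) -> List[str]:
    """Return the non-empty text of every funding/para under <project>.

    The path is an assumed stand-in for the check's selector.
    """
    root = ET.fromstring(project_xml)
    numbers = []
    for para in root.findall("./funding/para"):
        text = (para.text or "").strip()
        if text:
            numbers.append(text)
    return numbers


print(extract_award_numbers(PROJECT_XML))  # prints ['0612534']
```

If a path like this returns `0612534` while the report mentions `0202076`, the reported number is not coming from this document's funding element, which would be consistent with a caching problem.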

isteves commented 6 years ago

A PI just brought some weird quality report messages to our attention.
https://arcticdata.io/catalog/#view/urn:uuid:e3708b3c-2a47-42fd-894f-c1b63488bbef

In particular, he was worried about the funding number, which could not be found in the NSF database by the quality report.

[Screenshot: 2018-05-01 at 9:41:30 AM]

He was also concerned about the message asking for an email address: the EML includes one, but the quality report does not detect it.

[Screenshot: 2018-05-01 at 9:41:48 AM]

Finally, his geographic coverage correctly displays as a spot in Alaska, but it's unclear what the check is indicating. It's grouped in with other errors/warnings, so it makes it seem like a problem:

[Screenshot: 2018-05-01 at 9:41:59 AM]

mbjones commented 6 years ago

The award number is probably not being found because it has 'plr-' as a prefix. If you remove that, does it find it?

I think the email address issue is a poorly worded message, and it should be fixed. I think the creators and contacts have email addresses but are missing mailing addresses. This is just an informational check and is not considered an error.

The geographic check indicates one of his bounding boxes is in Alaska, which is good. Again, this is an informational check but can be helpful to locate packages that are not in the Arctic.

gothub commented 6 years ago

I don't get a hit for https://api.nsf.gov/services/v1/awards.json?id=plr-1304684, and https://api.nsf.gov/services/v1/awards.json?id=1304684 gives a project that looks very different:

    "title" : "Collaborative Research:  What Role Do Glaciers Play in Terrestrial Sub-Arctic Hydrology?"

isteves commented 6 years ago

The PI is correct on that. That should be the one!

isteves commented 6 years ago

@mbjones removing the 'plr-' prefix made it work fine.
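
One way the check could tolerate such prefixes is to normalize the award number before building the query. The following is only a sketch under assumptions: the rule that any leading run of letters (such as 'plr-') can be dropped is a guess at what is safe, and `normalize_award_number` and `award_query_url` are hypothetical names, not part of metadig-engine. The endpoint is the one used earlier in this thread.

```python
import re
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted earlier in this thread.
NSF_AWARDS_URL = "https://api.nsf.gov/services/v1/awards.json"


def normalize_award_number(raw: str) -> str:
    """Drop surrounding whitespace and a leading alphabetic prefix like 'plr-'.

    Assumption: the prefix is letters optionally followed by hyphens or spaces.
    """
    return re.sub(r"^[A-Za-z]+[-\s]*", "", raw.strip())


def award_query_url(raw: str) -> str:
    """Build the lookup URL for a (possibly prefixed) award number."""
    return NSF_AWARDS_URL + "?" + urlencode({"id": normalize_award_number(raw)})


print(award_query_url("plr-1304684"))
# prints https://api.nsf.gov/services/v1/awards.json?id=1304684
```

Numbers without a prefix, such as `0612534`, pass through unchanged, including their leading zero.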