Open gothub opened 2 years ago
@mbjones @vchendrix @JEDamerow
what are your thoughts on this?
If the identifier is resolvable via the DataONE resolve service, but is not publicly readable, the message is printed:
The metadata identifier was found and is resolvable using the DataONE resolve service, but is not publicly readable.
This change was made in commit f1ac27e8ce280fb221503d224de0ac20279483ee
With the ess-dive-1.1.0 suite release in production:
It appears that this fix is not working in production; see https://data.ess-dive.lbl.gov/quality/ess-dive-7fc993dad587390-20220324T214014280. This pid does exist but is not publicly readable, as the following URL produces the message shown below:
https://cn.dataone.org/cn/v2/meta/ess-dive-7fc993dad587390-20220324T214014280
<error detailCode="1040" errorCode="401" name="NotAuthorized">
<description>READ not allowed on ess-dive-7fc993dad587390-20220324T214014280 for subject[s]: public; </description>
</error>
So this check should pass, but it is marked as failed in the assessment report.
Also, as this is the only 'Assessment' type report (from FAIR categories), the display of the assessment report does not show a 'progress bar' at the top of the report for the 'Assessment' category. (File a separate metacatui issue for this).
The URL https://doi.org/10.1002/2017WR020471 included in the metadata of data package https://data.ess-dive.lbl.gov/view/ess-dive-c4d31960b81d845-20220406T213905525 is causing a failed metadata.URLs.resolvable check. The DOI is resolvable and correctly points to a dataset landing page, so this check should pass.
Private dataset submitted for publication https://data.ess-dive.lbl.gov/quality/ess-dive-838a0b1a47f1695-20220414T225815875 is incorrectly failing the metadata.identifier.resolvable check. This dataset utilizes the "sameAs" external linking relationship. The Accessible check category is not showing on the assessment report.
@emilyarobles @val the URL mentioned above resolves to https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2017WR020471. It appears that this website does not allow user agents such as Python and R to send HTTP HEAD requests to see if a web page exists. This is what I think is happening when the assessment check tries to check the DOI:
I'm open to suggestions on how to proceed with resolving this issue.
@emilyarobles @Val At the 2022-04-19 tech meeting, it was recommended that we modify the check so that if the metadata id is a DOI URL and a 503 status is returned, the check falls back to verifying that the URL is a valid registered DOI. I will check if doi.org supports this type of query.
The text returned by the check would then be:
Note that the check that is actually being run in the ESS-DIVE suite is named 'resource.URLs.resolvable'. The initial issue for this check is here.
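The DOI fallback recommended at the tech meeting could be sketched roughly as follows. This is only an illustration, not the check's actual code: it assumes the doi.org proxy exposes a handle lookup at `https://doi.org/api/handles/<doi>` that returns JSON with `responseCode` equal to 1 for a registered DOI, and the function names (`extract_doi`, `is_registered_doi`, `doi_fallback_passes`) are hypothetical.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Matches DOI URLs such as https://doi.org/10.1002/2017WR020471
DOI_URL_PATTERN = re.compile(
    r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$", re.IGNORECASE
)


def extract_doi(url):
    """Return the bare DOI from a doi.org URL, or None if it is not a DOI URL."""
    match = DOI_URL_PATTERN.match(url.strip())
    return match.group(1) if match else None


def handle_api_url(doi):
    """Build the lookup URL (assumed endpoint of the doi.org proxy REST API)."""
    return "https://doi.org/api/handles/" + urllib.parse.quote(doi, safe="/")


def fetch_json(url, timeout=10):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_registered_doi(doi, fetch=fetch_json):
    """Return True if the lookup reports the DOI as registered.

    Assumption: responseCode == 1 means the handle was found.
    Network errors and unparseable responses are treated as "not registered".
    """
    try:
        payload = fetch(handle_api_url(doi))
    except (urllib.error.URLError, ValueError):
        return False
    return payload.get("responseCode") == 1


def doi_fallback_passes(url, status_code, fetch=fetch_json):
    """Apply the fallback only when a DOI URL returned HTTP 503."""
    doi = extract_doi(url)
    if status_code == 503 and doi is not None:
        return is_registered_doi(doi, fetch=fetch)
    return False
```

The `fetch` parameter is there so the lookup can be replaced with a stub when testing without network access.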
@gothub The following links are incorrectly being flagged as unresolvable:
Dataset 1: https://doi.org/10.1021/acsearthspacechem.2c00031
Dataset 2: https://earthdata.nasa.gov; https://daymet.ornl.gov
Dataset 3: https://doi.org/10.1890/12-1243.1; https://doi.org/10.1890/13-1313.1; https://doi.org/10.1038/ismej.2016.122
@emilyarobles @charuleka @JEDamerow We have discussed this issue in the ESS-DIVE/NCEAS tech meeting, and it is recommended that we move the get URL checks to be warnings when they fail, as we cannot control how publishers respond to programmatic access of their publication pages. Ping me if you would like to discuss in person.
Ok, sounds reasonable to me.
Circling back to this issue based on some feedback from @JEDamerow. Unfortunately there isn't an easy fix for the DOI URLs. Based on the comment above, I'll change the check to optional, as well as add a few more response codes to the passing list.
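As a rough sketch of what that change could look like: the function below treats any 2xx or 3xx status as passing, accepts an extra set of passing codes, and downgrades a failure to a warning when the check is optional. The specific extra codes (403 and 503) are placeholders chosen for illustration, since the thread does not say which codes will be added, and the names `Outcome` and `classify_url_status` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "SUCCESS"
    WARNING = "WARNING"
    FAILURE = "FAILURE"


# Placeholder values: the codes actually added to the passing list are not
# stated in this thread.
EXTRA_PASSING_CODES = frozenset({403, 503})


def classify_url_status(status_code, extra_passing=EXTRA_PASSING_CODES, optional=True):
    """Classify the HTTP status returned for a URL.

    status_code is None when no response was received at all
    (for example, on a connection error or timeout).
    """
    if status_code is not None and (
        200 <= status_code < 400 or status_code in extra_passing
    ):
        return Outcome.SUCCESS
    # An optional check reports a warning instead of a failure.
    return Outcome.WARNING if optional else Outcome.FAILURE
```

With `optional=True`, a publisher page that returns 404 to a script produces a warning rather than failing the whole report.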
This check verifies if the metadata identifier is a resolvable URL. In the case that the identifier is a bare string (not starting with http: or doi:), then the DataONE resolve service is called for the identifier. The check will pass if the identifier resolves with the DataONE service.
If the DataONE id is private, the check will fail. The metadig-engine has privilege to read the metadata and sysmeta for the pid, but the checks themselves are run by the engine without privilege. Therefore an HTTP 401 status is returned.
For DataONE ids only, should the check detect 401 status and return a message such as:
... or something else?
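One possible shape for this, as a minimal sketch only: it assumes the resolve endpoint is `https://cn.dataone.org/cn/v2/resolve/<pid>`, that a 200 or 303 status means the pid resolved, and that a 401 should count as a pass with an explanatory message. The function names are hypothetical, and treating `https:` the same as `http:` is an assumption.

```python
import urllib.parse

# Assumed DataONE coordinating node resolve endpoint.
RESOLVE_BASE = "https://cn.dataone.org/cn/v2/resolve/"

PRIVATE_MESSAGE = (
    "The metadata identifier was found and is resolvable using the "
    "DataONE resolve service, but is not publicly readable."
)


def is_bare_identifier(identifier):
    """True if the identifier is not a URL or doi: string (https: is assumed)."""
    return not identifier.lower().startswith(("http:", "https:", "doi:"))


def resolve_url(pid):
    """Build the resolve URL for a bare identifier, percent-encoding the pid."""
    return RESOLVE_BASE + urllib.parse.quote(pid, safe="")


def interpret_resolve_status(status_code):
    """Map the HTTP status from the resolve call to (passed, message)."""
    if status_code in (200, 303):
        return True, (
            "The metadata identifier is resolvable using the DataONE resolve service."
        )
    if status_code == 401:
        # The pid exists but the unprivileged check is not allowed to read it.
        return True, PRIVATE_MESSAGE
    return False, (
        f"The metadata identifier could not be resolved (HTTP status {status_code})."
    )
```

Keeping the status interpretation separate from the HTTP call means the 401 decision can be changed in one place if a different outcome is preferred.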