NCEAS / metadig-checks

MetaDIG suites and checks for data and metadata improvement and guidance.
Apache License 2.0

resource.URLs.resolvable fails to work for most DOIs #453

Closed jeanetteclark closed 6 months ago

jeanetteclark commented 6 months ago

Description

Taken from #437, here is a summary of what happens with this check:

- the assessment check sends an HTTP HEAD request
- the DOI resolve service correctly sends the HTTP 301 status code
- the redirect URL https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2017WR020471 is returned
- the library that the check is using sends a HEAD request to the redirect URL
- the wiley.com server sees that the User-Agent is not a web browser and returns an HTTP 503 status
- the check fails as the 503 status means "Service Unavailable"
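The sequence above can be reproduced offline with a minimal sketch. This is not the metadig-py code: `follow_head`, `is_resolvable`, and the injected `send` callable are hypothetical names, the canned responses stand in for real network calls, and the `https://doi.org/...` form of the DOI URL is an assumption based on the redirect target.

```python
from typing import Callable, List, Optional, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}

# send(url) -> (status code, Location header or None).
# Injected so the sketch runs without network access.
Sender = Callable[[str], Tuple[int, Optional[str]]]


def follow_head(url: str, send: Sender, max_hops: int = 5) -> List[Tuple[str, int]]:
    """Follow redirects hop by hop and record (url, status) for each hop."""
    chain = []
    for _ in range(max_hops):
        status, location = send(url)
        chain.append((url, status))
        if status in REDIRECT_CODES and location:
            url = location  # next hop is the redirect target
        else:
            break
    return chain


def is_resolvable(chain: List[Tuple[str, int]]) -> bool:
    """The check passes only if the final hop returned a 2xx status."""
    return bool(chain) and 200 <= chain[-1][1] < 300


# Canned responses mirroring the sequence described in this issue.
DOI_URL = "https://doi.org/10.1002/2017WR020471"  # assumed form of the DOI URL
WILEY_URL = "https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2017WR020471"
CANNED = {DOI_URL: (301, WILEY_URL), WILEY_URL: (503, None)}


def fake_send(url: str) -> Tuple[int, Optional[str]]:
    return CANNED[url]


chain = follow_head(DOI_URL, fake_send)
print(chain)
print(is_resolvable(chain))
```

The DOI resolver does its job (301 with a target), but the final hop's 503 is what decides the outcome, so the check reports a failure even though the DOI itself is valid.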

The code that runs the above is actually in metadig-py, here

Solution

@JEDamerow brought this to my attention a while back, and we decided a middle ground was to make the check optional. It sounds like now the preference is to remove it altogether.

The only other option that I can see is to alter the check procedure for DOI-type URLs so that it does not follow the redirect and instead assumes that the redirect target will resolve correctly. Thoughts on that @mbjones?
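A rough sketch of what that alternative could look like, under stated assumptions: `is_doi_url`, `doi_resolves_without_following`, the `DOI_HOSTS` set, and the injected `send` callable are all hypothetical and not part of metadig-py. The idea is to accept a redirect with a `Location` header from the DOI resolver as success and never contact the target.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlparse

REDIRECT_CODES = {301, 302, 303, 307, 308}
DOI_HOSTS = {"doi.org", "dx.doi.org"}  # assumption: hosts treated as DOI resolvers


def is_doi_url(url: str) -> bool:
    """True if the URL points at one of the assumed DOI resolver hosts."""
    return (urlparse(url).hostname or "").lower() in DOI_HOSTS


def doi_resolves_without_following(
    url: str, send: Callable[[str], Tuple[int, Optional[str]]]
) -> bool:
    """Accept a redirect with a target as success; do not request the target."""
    status, location = send(url)  # single HEAD-style request, redirects not followed
    return status in REDIRECT_CODES and bool(location)


# Offline stand-in for the resolver's response described in this issue.
def fake_send(url: str) -> Tuple[int, Optional[str]]:
    return 301, "https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2017WR020471"


url = "https://doi.org/10.1002/2017WR020471"  # assumed form of the DOI URL
print(is_doi_url(url), doi_resolves_without_following(url, fake_send))
```

The trade-off is that this only confirms the DOI is registered with a target; it says nothing about whether the target page is actually reachable.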

mbjones commented 6 months ago

I think not following the redirect would defeat the purpose of the check. However, have you tried masquerading as a browser user agent to see whether Wiley lets the redirect go through instead of returning a 503? You could check this with curl pretty quickly, especially if you use the Firefox "Copy as cURL" command, which captures all of the cookies and other paraphernalia sent by the browser to the server.

jeanetteclark commented 6 months ago

@mbjones I have already tested that and it works, but I thought we had decided a while back that it was not good practice to have metadig-engine impersonate a browser. Let me know if I am misremembering that.

jeanetteclark commented 6 months ago

after some discussion, I'll implement and test the following:

- Try first with a regular User-Agent that identifies us. If the request succeeds, PASS.
- If it fails with a 4xx code, FAIL.
- If it fails with a 5xx code, retry with a browser User-Agent. If the retry succeeds, PASS; if it fails, FAIL.
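As a minimal sketch of that decision procedure (not the actual metadig-py implementation): `check_resolvable`, the two User-Agent strings, and the injected `fetch(url, user_agent)` callable are hypothetical. In practice `fetch` might wrap something like `requests.head(url, allow_redirects=True, headers={"User-Agent": ua})` and return the final status code.

```python
from typing import Callable, Tuple

# Hypothetical User-Agent strings; the real values are not given in this thread.
HONEST_UA = "metadig-check (hypothetical identifying user agent)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"


def check_resolvable(url: str, fetch: Callable[[str, str], int]) -> Tuple[str, str]:
    """Return (result, message) following the plan described in this comment.

    fetch(url, user_agent) must return the final HTTP status code.
    """
    status = fetch(url, HONEST_UA)
    if status < 400:
        return "PASS", f"{url} resolved with status {status}."
    if status < 500:
        # Client errors (4xx) are treated as a genuine failure; no retry.
        return "FAIL", f"{url} returned client error {status}."

    # Server errors (5xx) may be bot blocking, so retry once as a browser.
    retry_status = fetch(url, BROWSER_UA)
    if retry_status < 400:
        return "PASS", (
            f"{url} returned {status} for the identifying user agent "
            f"but {retry_status} for a browser user agent."
        )
    return "FAIL", (
        f"{url} returned {status}, then {retry_status} on retry "
        "with a browser user agent."
    )


# Offline usage example: a server that rejects non-browser user agents.
def picky_server(url: str, user_agent: str) -> int:
    return 200 if user_agent == BROWSER_UA else 503


result, message = check_resolvable("https://example.org/doc", picky_server)
print(result, message)
```

Returning the message alongside the result makes it possible to report which path was taken, which matters if the check is only informational.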

I'll also add some text to the output message indicating what happened. Finally, we can change this check to informational to keep it from affecting the score

jeanetteclark commented 6 months ago

Unfortunately, after testing I wasn't able to find any way to impersonate a web browser well enough to trick the Wiley page into letting me through. After consulting with @JEDamerow, I removed the check from the ESS-DIVE suite.