NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

resolving URLs in checks seems to fail even for good URLs #391

Closed: jeanetteclark closed this issue 9 months ago

jeanetteclark commented 9 months ago

First reported by Joan at ESS-DIVE. Here is an example:

https://data.ess-dive.lbl.gov/quality/ess-dive-28c4750aab75568-20231108T174345678

Another example, from the ADC (the entity URLs): https://arcticdata.io/catalog/quality/s=FAIR-suite-0.3.1/doi%3A10.18739%2FA23F4KP8Z

We need to determine why these checks appear to be failing and fix the issue. The checks work fine for me locally. Running the Python code that checks whether the URLs are resolvable also seems to work fine from one of the pods on both dev and prod.

mbjones commented 9 months ago

Just a random idea, but maybe the HTTP 307 redirect response header is not being followed correctly?

❯ curl -I https://cn.dataone.org/cn/v2/resolve/urn:uuid:98013671-a16c-4037-85e1-704fbda7f15b
HTTP/1.1 307 307
Date: Fri, 17 Nov 2023 23:45:31 GMT
Server: Apache/2.4.52 (Ubuntu)
Vary: User-Agent,Origin
Location: https://arcticdata.io/metacat/d1/mn/v2/object/urn:uuid:98013671-a16c-4037-85e1-704fbda7f15b
Access-Control-Allow-Origin:
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Authorization, Content-Type, Location, Content-Length, x-annotator-auth-token, Cache-Control
Access-Control-Expose-Headers: Content-Length, Content-Type, Location
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE
Set-Cookie: JSESSIONID=B47BE210B329661E90E2BCF30F07A28E; Path=/cn; Secure; HttpOnly;SameSite=None;Secure
Content-Type: text/xml;charset=UTF-8

❯ curl -I https://arcticdata.io/metacat/d1/mn/v2/object/urn:uuid:98013671-a16c-4037-85e1-704fbda7f15b
HTTP/1.1 200 200
Date: Fri, 17 Nov 2023 23:46:06 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Set-Cookie: JSESSIONID=151ED0146C1CBDF90D99EF2CA7933F00; Path=/metacat; Secure
DataONE-Checksum: MD5,a8ac683d81cff24bb76d2f56e1b0a3f9
Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT
DataONE-ObjectFormat: text/csv
DataONE-SerialVersion: 0
Content-Length: 2109003
Access-Control-Allow-Origin:
Access-Control-Allow-Headers: Authorization, Content-Type, Origin, Cache-Control
Access-Control-Allow-Methods: GET, POST, PUT, OPTIONS
Access-Control-Allow-Credentials: true
Content-Type: text/xml
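
To illustrate the redirect idea, here is a minimal Python sketch of following `Location` headers by hand until a non-redirect status comes back. It is not the code used by the checks. The names `resolve_final_status`, `REDIRECT_CODES`, `fake_head` and `max_hops` are hypothetical, and the `head` callable stands in for a real HTTP HEAD request so the sketch runs offline, with canned responses copied from the two curl calls above.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_final_status(url, head, max_hops=5):
    """Follow Location headers until a non-redirect status is returned.

    `head` is any callable taking a URL and returning a
    (status_code, location_or_None) tuple, e.g. a thin wrapper around
    an HTTP client's HEAD request. Returns (final_status, final_url).
    """
    for _ in range(max_hops + 1):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return status, url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")


# Canned responses mirroring the curl output above (no network access).
CN_URL = "https://cn.dataone.org/cn/v2/resolve/urn:uuid:98013671-a16c-4037-85e1-704fbda7f15b"
MN_URL = "https://arcticdata.io/metacat/d1/mn/v2/object/urn:uuid:98013671-a16c-4037-85e1-704fbda7f15b"
CANNED = {CN_URL: (307, MN_URL), MN_URL: (200, None)}


def fake_head(url):
    # Unknown URLs are reported as 404 in this stand-in.
    return CANNED.get(url, (404, None))


final_status, final_url = resolve_final_status(CN_URL, fake_head)
```

With the canned responses, the 307 from the CN resolve URL is followed to the member node URL, which answers 200. A check that inspects only the first response would see the 307 instead.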

jeanetteclark commented 9 months ago

Oh, thanks for looking into that. The accepted response codes are listed as [200, 202, 203, 206, 301, 302, 303], so a 307 would be marked as a failure.

I think a more complete list might be: 200, 202, 203, 206, 301, 302, 303, 307, 308 (adding 307 and 308). Thoughts?
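
As a rough sketch of what that change would mean (hypothetical names, not the actual check code), the two lists can be compared as sets of accepted status codes:

```python
# Hypothetical constants and helper, for illustration only.
CURRENT_ACCEPTED = frozenset({200, 202, 203, 206, 301, 302, 303})
PROPOSED_ACCEPTED = CURRENT_ACCEPTED | {307, 308}


def is_accepted(status_code, accepted=PROPOSED_ACCEPTED):
    """Return True if the HTTP status code counts as 'resolvable'."""
    return status_code in accepted


# A 307 is rejected by the current list but accepted by the proposed one.
rejected_now = not is_accepted(307, CURRENT_ACCEPTED)
accepted_after = is_accepted(307)
```

Under the current list a 307 from the resolve endpoint fails the check even though the target object exists. The proposed list accepts it without following the redirect.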

The URLs that are failing for ESS-DIVE returned, I think, a 302 for me locally, so I'm still not sure why they would be failing.

jeanetteclark commented 9 months ago

Closing this issue; discussion continued in https://github.com/NCEAS/metadig-checks/issues/437