NCEAS / metadig-checks

MetaDIG suites and checks for data and metadata improvement and guidance.
Apache License 2.0

resource.URLs.resolvable #422

Closed gothub closed 2 years ago

gothub commented 3 years ago

Description

Any URL provided in the metadata (including abstract, location description, methods, and related references) resolves.

Priority

Issues

Procedure

Extract URLs from the abstract, location description, methods, and related references, and check that they are in a valid format and resolve. If unresolvable URLs are found, include up to 3 of them in the check output. The total number of URLs found and the total number that are not resolvable are also printed.
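A minimal Python sketch of this procedure follows. It assumes a simple regular expression is good enough to pull `http`/`https` URLs out of free text, and it takes a caller-supplied `resolves` function so the network step stays separate. All names here (`extract_urls`, `is_valid_format`, `summarize`) are hypothetical and are not the check's actual code.

```python
import re
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlparse

# Hypothetical pattern: an http(s) URL runs until whitespace, <, >, a quote, ')' or ']'.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(texts: Iterable[str]) -> List[str]:
    """Collect unique URLs, in first-seen order, from the given metadata text fields."""
    seen: Dict[str, None] = {}
    for text in texts:
        for match in URL_PATTERN.findall(text):
            url = match.rstrip(".,;:")  # drop trailing sentence punctuation
            seen.setdefault(url, None)
    return list(seen)


def is_valid_format(url: str) -> bool:
    """Basic syntactic check: an http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def summarize(texts: Iterable[str], resolves: Callable[[str], bool],
              max_reported: int = 3) -> dict:
    """Count URLs found and unresolvable ones; keep up to max_reported examples."""
    urls = extract_urls(texts)
    # `and` short-circuits, so malformed URLs are never sent to the resolver.
    bad = [u for u in urls if not (is_valid_format(u) and resolves(u))]
    return {
        "total_found": len(urls),
        "total_unresolvable": len(bad),
        "examples": bad[:max_reported],
    }
```

For example, with a stand-in resolver `lambda u: "bad" not in u`, the texts `"See https://example.org/data."` and `"Methods at http://bad.invalid/x and https://example.org/data"` yield 2 unique URLs, 1 of which is reported as unresolvable.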

The check will meter HTTP HEAD requests when many URLs are found, so as not to overwhelm DataONE Member Nodes (MNs) or other servers.
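One way the metering could work is to space HEAD requests a fixed minimum interval apart. The sketch below uses only the Python standard library; the class name `HeadChecker` and the default `min_interval` and `timeout` values are assumptions for illustration, not settings taken from the check.

```python
import http.client
import time
import urllib.request
from typing import Optional


class HeadChecker:
    """Checks URLs with HTTP HEAD requests, spaced at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 0.5, timeout: float = 10.0) -> None:
        self.min_interval = min_interval    # hypothetical default: at most ~2 requests/second
        self.timeout = timeout              # per-request timeout in seconds
        self._last: Optional[float] = None  # monotonic time of the previous request

    def _wait_turn(self) -> None:
        """Sleep just long enough to keep consecutive requests min_interval apart."""
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

    def resolves(self, url: str) -> bool:
        """True if a HEAD request succeeds (urlopen follows redirects, raises on 4xx/5xx)."""
        self._wait_turn()
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=self.timeout):
                return True
        except (OSError, ValueError, http.client.HTTPException):
            # URLError/HTTPError and socket timeouts are OSError subclasses;
            # malformed URLs raise ValueError.
            return False
```

A `HeadChecker(...).resolves` method can be passed wherever a `resolves(url) -> bool` callable is expected. One design caveat: some servers answer HEAD with 405 Method Not Allowed even when the resource exists, so a fallback GET may be worth considering.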

ESS-DIVE may provide a list of valid domains for which URLs should be checked. URLs in the metadata whose domains are not in this list would not be checked for resolvability. Note that implementing this feature depends on the list being provided.
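If such a list were provided, the filtering step might look like the sketch below. It assumes the list contains bare domain names and that a URL counts as "in the list" when its host is that domain or a subdomain of it. The function name and the example domains are hypothetical.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def split_by_allowlist(urls: Iterable[str],
                       allowed_domains: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (URLs to check, URLs to skip) based on a domain allowlist."""
    allowed = {d.lower().rstrip(".") for d in allowed_domains}
    checked: List[str] = []
    skipped: List[str] = []
    for url in urls:
        # urlparse lowercases the hostname; strip any trailing root dot.
        host = (urlparse(url).hostname or "").rstrip(".")
        # Match the domain itself or any subdomain, but not look-alikes such as
        # "notexample.org" for "example.org".
        if any(host == d or host.endswith("." + d) for d in allowed):
            checked.append(url)
        else:
            skipped.append(url)
    return checked, skipped
```

Only the first list would then be passed on to the resolvability check.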

Requested response for failed check

One or more links provided in the metadata do not resolve correctly.

JEDamerow commented 3 years ago

@gothub I updated this check with the requested information. Is this doable for implementation by March? @vchendrix

gothub commented 3 years ago

@JEDamerow yes, this can be delivered by March.

gothub commented 2 years ago

@JEDamerow @vchendrix I'd like to rename this to resource.URLs.resolvable to avoid confusion with metadata.identifier.resolvable. The URLs being resolved are not for locating the metadata object, but for the resource in general. How does that sound?

vchendrix commented 2 years ago

sounds good


JEDamerow commented 2 years ago

@gothub Can we get some more information about how long this check may take and options so that it does not slow down the overall assessment report? Here are some questions/options that we discussed:

Output in the failed assessment report: the number of URLs that were not valid, and up to 5 of the links that were not valid

JEDamerow commented 2 years ago

@gothub We still have the questions above about making sure the check does not slow down assessment reports. However, we will not include a list of accepted URLs at this time; the check will simply verify that the URLs provided resolve.

gothub commented 2 years ago

@JEDamerow In the past, when other checks have been developed, a representative dataset would be tested and timed to come up with a worst case scenario regarding processing time.

If such a dataset isn't known, then an alternative, just for development, is to include in the check output the number of unique URLs found and the elapsed processing time for the check. Once these have been analyzed, they can be removed from the check for production.
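The development-only instrumentation described here could be as small as the sketch below, which assumes a caller-supplied `resolves` function and uses `time.perf_counter` to time the loop. The function name `timed_check` and the message wording are hypothetical.

```python
import time
from typing import Callable, Iterable


def timed_check(urls: Iterable[str], resolves: Callable[[str], bool]) -> str:
    """Resolve each unique URL once and report the count and elapsed time."""
    start = time.perf_counter()
    unique = list(dict.fromkeys(urls))  # de-duplicate, keeping first-seen order
    failures = sum(1 for u in unique if not resolves(u))
    elapsed = time.perf_counter() - start
    # Development-only output; intended to be removed for production.
    return f"{len(unique)} unique URLs checked, {failures} unresolvable, elapsed {elapsed:.2f} s"
```

Running this over representative datasets would give the URL counts and timings needed to judge the worst case before the extra output is removed.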

gothub commented 2 years ago

Initial revision saved in commit https://github.com/NCEAS/metadig-checks/commit/950442733e16ed1595091bb47b7ef4ab230597d7

gothub commented 2 years ago

@JEDamerow Note that when no URLs are found in the designated metadata fields, the 'SUCCESS' message may be confusing - "No URLs were found in the metadata.". I'm not sure how to improve the wording here.

JEDamerow commented 2 years ago

What about "Not applicable, because no URLs were found in the abstract, location description, methods, or related references"? Is that too long?

gothub commented 2 years ago

That message is fine. Is it OK for the check to 'PASS', even if they don't have any URLs?

JEDamerow commented 2 years ago

Yes, I think so
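The outcome agreed on in this exchange (pass with a "Not applicable" message when no URLs are found) could be expressed as in the sketch below. It assumes status labels `SUCCESS` and `FAILURE` and reuses the wording proposed in this thread; the function and constant names are hypothetical.

```python
from typing import List, Tuple

# Wording proposed in this thread for the no-URL case.
NO_URLS_MESSAGE = ("Not applicable, because no URLs were found in the abstract, "
                   "location description, methods, or related references.")


def check_result(total_found: int, unresolvable: List[str],
                 max_reported: int = 3) -> Tuple[str, str]:
    """Map URL counts to a (status, message) pair for the check output."""
    if total_found == 0:
        return "SUCCESS", NO_URLS_MESSAGE  # no URLs: the check still passes
    if not unresolvable:
        return "SUCCESS", f"All {total_found} URLs found in the metadata resolve."
    examples = ", ".join(unresolvable[:max_reported])
    return ("FAILURE",
            "One or more links provided in the metadata do not resolve correctly. "
            f"{len(unresolvable)} of {total_found} URLs could not be resolved, e.g.: {examples}")
```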
