USEPA / EPA_Environmental_Dataset_Gateway

U.S. EPA’s Metadata Catalog
https://edg.epa.gov

Check metadata links, save status in database #16

Open torrin47 opened 7 years ago

torrin47 commented 7 years ago

From @torrin47 on March 15, 2017 2:55

This has been a priority wishlist item for quite some time, but it's a challenge to tackle. Metadata records usually contain a number of URLs pointing to associated resources (download URLs, APIs, websites with more information, etc.), and metadata stewards often don't stay on top of changes to those URLs. Stewards would really benefit if the EDG checked links on a periodic basis and stored their status in the EDG database, so that the metrics page could present them with a list of broken links they need to fix. A few specific, high-profile links that are displayed on the details page and submitted to data.gov are the top priority, but ideally the code would be flexible enough to check any embedded link. Another challenge is that links may be temporarily broken for any number of reasons, so perhaps link status shouldn't be a binary up/down but rather up / down / service interrupted at some point in the last X checks.
It's also possible that this will be less of an issue if Esri addresses FGDC service status checker integration, so we should hold off on this issue until we know more about that.

_Copied from original issue: Innovate-Inc/EDGmetadata#88_