cc-archive / cc-link-checker

Automated link checker for legalcode and license URLs
MIT License

Consider adding a 404 catcher / analyser (feature request) #69

Closed: mzeinstra closed this issue 4 years ago

mzeinstra commented 4 years ago

Consider making an additional tool that keeps track of 404s under the creativecommons.org/licenses/* path.

Make a dashboard that keeps a rolling count of 404s. Any spike that is localized in time would indicate a problem.
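A minimal sketch of the rolling counter, assuming 404s have already been bucketed per hour. The bucket size, the 24-bucket window and the 3-standard-deviation threshold are illustrative choices, and `RollingSpikeDetector` is a hypothetical name, not part of cc-link-checker:

```python
from collections import deque
from statistics import mean, pstdev


class RollingSpikeDetector:
    """Hypothetical rolling 404 counter over fixed-size time buckets.

    A bucket is flagged as a spike when its count exceeds the mean of the
    previous `window` buckets by more than `k` population standard deviations.
    """

    def __init__(self, window=24, k=3.0, min_history=6):
        self.history = deque(maxlen=window)  # counts of the most recent buckets
        self.k = k
        self.min_history = min_history  # don't judge until a baseline exists

    def add_bucket(self, count):
        """Record the 404 count of a finished bucket; return True if it is a spike."""
        is_spike = False
        if len(self.history) >= self.min_history:
            baseline = mean(self.history)
            spread = pstdev(self.history)
            # max(spread, 1.0) stops a perfectly flat baseline from flagging tiny wobbles
            is_spike = count > baseline + self.k * max(spread, 1.0)
        self.history.append(count)
        return is_spike


# Made-up hourly counts: a steady baseline followed by one sudden jump.
detector = RollingSpikeDetector()
hourly_404s = [40, 42, 38, 41, 39, 43, 40, 250]
flags = [detector.add_bucket(c) for c in hourly_404s]
# flags == [False, False, False, False, False, False, False, True]
```

In this sketch a flagged bucket still enters the history, which inflates the baseline for the following hours; a real dashboard might exclude flagged buckets from the baseline instead.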

If a particular URI receives a high volume of 404s, that indicates a third-party platform has an issue, and it would be good to reach out to them. A Google search for the URI string would reveal the platform that is linking to a license with a bad URI.
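A sketch of the per-URI tally plus the search step, assuming the 404 log can be reduced to a list of requested URLs. `top_broken_license_uris`, `search_url_for` and the `/licenses/` prefix filter are hypothetical; the search URL is an ordinary Google `search?q=` query with the URI wrapped in quotes to ask for an exact match:

```python
from collections import Counter
from urllib.parse import quote_plus, urlsplit

LICENSE_PREFIX = "/licenses/"  # assumption: only 404s under this path are of interest


def top_broken_license_uris(requested_urls, n=5):
    """Count 404s per requested URL under LICENSE_PREFIX; return the n most common."""
    counts = Counter(
        url for url in requested_urls
        if urlsplit(url).path.startswith(LICENSE_PREFIX)
    )
    return counts.most_common(n)


def search_url_for(uri):
    """Google search URL for the URI string, quoted to request an exact match."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{uri}"')


# Made-up log excerpt: two hits on a truncated legalcode URL, one unrelated 404.
log = [
    "https://creativecommons.org/licenses/by/4.0/legalcod",
    "https://creativecommons.org/licenses/by/4.0/legalcod",
    "https://creativecommons.org/about/missing-page",
]
for uri, hits in top_broken_license_uris(log):
    print(hits, uri, search_url_for(uri))
```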

TimidRobot commented 4 years ago

I really like this idea!

I'm not sure about the best place to implement it, though:

I have captured this idea in our internal documentation and am closing this ticket. Please comment or open new ticket(s) with any specific implementation ideas.

mzeinstra commented 4 years ago

This might be something to include in the development of the new license infrastructure.

  1. Add a catch for 404s.
  2. Add a Google Analytics (GA) tracker.
  3. Use the GA dashboard, filtering on the page title of the catch (e.g. 'Page not found'), to track the idea above. You could also send a GA event to make it easier to find in the data.
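A minimal server-side sketch of the catch in step 1, written as a Python WSGI catch-all handler. It serves a page titled 'Page not found' and emits a named event. `record_event`, the in-memory `EVENTS` list and the `page_not_found` event name are hypothetical placeholders for wherever a GA tracker or GA event would be wired in; no Google Analytics API is called here:

```python
from wsgiref.util import request_uri

NOT_FOUND_TITLE = "Page not found"  # the page title a dashboard could filter on
EVENTS = []  # hypothetical in-memory sink standing in for an analytics backend


def record_event(name, **params):
    """Hypothetical hook: forward a named event to the analytics backend."""
    EVENTS.append({"name": name, **params})


def not_found_app(environ, start_response):
    """Catch-all WSGI handler: every request routed here is answered with a 404."""
    record_event(
        "page_not_found",
        page=request_uri(environ, include_query=True),  # full requested URL
        referrer=environ.get("HTTP_REFERER", ""),  # linking page, when sent
    )
    body = (
        f"<!doctype html><html><head><title>{NOT_FOUND_TITLE}</title></head>"
        f"<body><h1>{NOT_FOUND_TITLE}</h1></body></html>"
    ).encode("utf-8")
    start_response(
        "404 Not Found",
        [("Content-Type", "text/html; charset=utf-8"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```

Because this event is recorded on the server, it is not affected by JavaScript or tracking blockers; the trade-off is that it needs access to the server or its logs rather than just a snippet on the 404 page.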

JavaScript and tracking blockers are an issue. However, we are talking about problems of scale here: given the number of visitors, if there is a localized problem, a proportionate share of the people visiting the problematic page is bound to have JavaScript enabled and no tracking blockers.

The same holds for problems arising from external platforms: if their visitor numbers are large enough to make you reach out to them, some of those visitors will likely have JavaScript enabled and no tracking blockers.
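The scale argument can be made concrete with a little arithmetic. The sketch below assumes each visitor independently has JavaScript enabled and no tracking blocker with probability `trackable_share`; `tracked_visits` is a hypothetical helper, and the 200 visitors and 30% share are illustrative numbers, not measurements:

```python
def tracked_visits(visitors, trackable_share):
    """Return (expected tracked 404 hits, probability that none are tracked).

    Assumes each visitor is independently trackable with probability
    `trackable_share` (a simplifying assumption).
    """
    expected = visitors * trackable_share
    p_none = (1.0 - trackable_share) ** visitors
    return expected, p_none


# Illustrative only: 200 visitors hit a broken URI, 30% of them are trackable.
expected, p_none = tracked_visits(200, 0.3)
# expected is about 60; p_none is vanishingly small
```

Under that assumption the tracked count scales by the same factor as the real traffic, so a tenfold spike in actual 404s should still show up as roughly a tenfold spike in the tracked data, even though the absolute numbers are smaller.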