EGI-Federation / documentation

Sources to build EGI documentation site.
https://docs.egi.eu/
MIT License

Periodic check of links #157

Open enolfc opened 3 years ago

enolfc commented 3 years ago

Short Description of the issue

Links should be checked periodically (especially those external to docs.egi.eu). See https://github.com/EGI-Foundation/fedcloud-catchall-operations/pull/24#discussion_r528575198

Summary of proposed changes

Add a periodic action for link checking

brucellino commented 4 months ago

Hi! I might be able to look at this as a quick exercise to get myself familiar. Are you thinking about a cron trigger here to run one of the actions included in the existing workflow, or an outside agent which monitors the pages for missing links and opens issues against them?

enolfc commented 4 months ago

There are already some regular checks of the links, but the result is an email that gets ignored in my inbox. See the last execution: https://github.com/EGI-Federation/documentation/actions/runs/7948069227. It would indeed be good to open an issue instead of just getting a notification about the broken execution.

brucellino commented 4 months ago

Hm. 14 minutes for a run. Sounds like something you want to do async. I don't suppose we deploy this to a staging environment, perhaps?

enolfc commented 4 months ago

The 15 minutes is for checking every link in the repo. For PRs, only links within the changed files are checked, so it goes rather fast. Fixing the broken links will also greatly reduce the run time (you avoid hitting the timeouts for the broken links).
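To illustrate why the PR path is quick, here is a minimal Python sketch that collects links only from the files touched by a change. It is not how the existing workflow does it; the names `extract_links` and `links_in_changed_files` and the simplified regexes are assumptions made for illustration.

```python
import re

# Assumed, simplified patterns: inline Markdown links [text](target)
# and bare http(s) URLs. Real link checkers handle many more cases.
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
BARE_URL = re.compile(r"https?://[^\s)>\]]+")


def extract_links(markdown_text):
    """Return the unique link targets found in a Markdown string, sorted."""
    targets = set(MD_LINK.findall(markdown_text))
    for url in BARE_URL.findall(markdown_text):
        # Drop sentence punctuation that trails a bare URL.
        targets.add(url.rstrip(".,;"))
    return sorted(targets)


def links_in_changed_files(changed_files):
    """Map each changed file path to the links it contains.

    `changed_files` is a hypothetical dict of {path: file content};
    how the list of changed files is obtained is left out here.
    """
    return {path: extract_links(text) for path, text in changed_files.items()}
```

Only the links returned for the changed files would then need to be requested, which is why the PR check stays short while the full crawl takes minutes.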

brucellino commented 4 months ago

As I understand it, there are links to targets outside of this repo, which we do not control and which can simply go dead. We need to monitor for those, right?

So we can have the fast checks run only on the page that has changed -- we can parametrise this and it will run in seconds. But we don't have a trigger for changes which happen outside of the repo, so we need to schedule regular crawls for that (once a week, e.g.).

graph LR
  subgraph PR
    direction TB
    page[page changes] --> check_page[Check links in page] 
  end

  subgraph Cron
    direction TB
    cron[Scheduled check] --> check_all[Check all links in all pages]
  end

Failures on the cron path can open issues. Failures on the PR path break the build.
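The split described above can be sketched as a small decision function in Python. This is only an illustration under assumptions: `plan_action` and its return values are hypothetical names, while `GITHUB_EVENT_NAME` is the environment variable GitHub Actions sets to the triggering event (`schedule` for cron runs, `pull_request` for PRs).

```python
import os


def plan_action(event_name, broken_links):
    """Decide what to do with link-check results for a given trigger.

    Returns one of the hypothetical action names:
    "pass", "open_issue" or "fail_build".
    """
    if not broken_links:
        return "pass"
    if event_name == "schedule":
        # Cron path: nobody is waiting on the run, so report via an issue.
        return "open_issue"
    # PR path (and any other trigger): break the build so the author sees it.
    return "fail_build"


def plan_for_current_run(broken_links):
    """Same decision, reading the trigger from the Actions environment."""
    event_name = os.environ.get("GITHUB_EVENT_NAME", "pull_request")
    return plan_action(event_name, broken_links)
```

Defaulting to the stricter `fail_build` for unknown triggers is a design choice of this sketch, not something decided in the thread.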

Sound right?

enolfc commented 4 months ago

Yes, and this is partially done (what is missing is opening an issue if the scheduled check fails). See https://github.com/EGI-Federation/documentation/blob/main/.github/workflows/check-links.yml

brucellino commented 4 months ago

Nice. So, the to-do is:

  1. Open an issue if the cron run fails
  2. Only check links in files changed in the PR (looks like that goes in .github/workflows/build_pr_preview.yml?). That sounds like a job for a pre-commit hook -- see #635
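For item 1, a minimal Python sketch of the issue that a failed cron run could open. It only builds the request body; the function name `build_issue_payload`, the `broken-links` label and the wording of the issue are assumptions. The fields `title`, `body` and `labels` are the ones accepted by GitHub's "create an issue" REST endpoint (`POST /repos/{owner}/{repo}/issues`).

```python
import json


def build_issue_payload(broken, run_url):
    """Build the JSON-serialisable body for an issue about broken links.

    `broken` is a list of (url, reason) pairs, e.g. ("https://...", "404").
    `run_url` points at the workflow run that found them.
    """
    lines = [f"- {url} ({reason})" for url, reason in broken]
    body = (
        "The scheduled link check found broken links:\n\n"
        + "\n".join(lines)
        + f"\n\nWorkflow run: {run_url}"
    )
    return {
        "title": f"Broken links found by scheduled check ({len(broken)})",
        "body": body,
        "labels": ["broken-links"],  # hypothetical label name
    }


def to_request_body(payload):
    """Serialise the payload as it would be sent in the POST request."""
    return json.dumps(payload)
```

Sending the request (with a token that has permission to write issues) is left out here; an existing action for creating issues could equally be used instead of a custom script.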