balena-io / docs

Documentation for the balenaCloud platform.
https://docs.balena.io/
Apache License 2.0

Automate docs website refresh when external repos are updated (CLI, OS, supervisor, etc) #1525

Closed pdcastro closed 2 years ago

pdcastro commented 3 years ago

When external repo sources are updated, for example the balena CLI cli.markdown file in the balena-cli repo, the docs website is not automatically refreshed. We need to either ping devops engineers (for them to log in to Heroku and click some kind of "redeploy app" button), or create a dummy commit in the docs repo that causes a Heroku app redeploy.

Example / steps:

A (naive) solution might be for the Heroku app to have a background task that automatically polls the external source repo for modifications (e.g. HEAD HTTP request), say once an hour. If any change is detected, it calls the existing fetch-external.sh script and reloads the Gatsby web server (maybe by sending a SIGHUP to the server process; not sure if this works).
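A minimal sketch of the polling idea, assuming the source host returns an ETag header on HEAD requests (not verified here for the GitHub raw URLs). The STATE_DIR path and the function names are hypothetical, and reloading the web server is left out.

```shell
#!/usr/bin/env bash
# Sketch: detect upstream changes by comparing the ETag from a HEAD request.
# Assumptions: the server sends an ETag header; STATE_DIR is a hypothetical path.

STATE_DIR="${STATE_DIR:-/tmp/docs-etags}"

# Extract the ETag value from raw HTTP response headers read on stdin.
# When redirects are followed there may be several header blocks; the last ETag wins.
extract_etag() {
  tr -d '\r' | awk 'tolower($1) == "etag:" { value = $2 } END { print value }'
}

# Return 0 if the remote ETag differs from the one stored for this URL.
has_changed() {
  local url="$1" key new old
  key="$(printf '%s' "$url" | cksum | cut -d' ' -f1)"
  new="$(curl -sIL "$url" | extract_etag)"
  old="$(cat "$STATE_DIR/$key" 2>/dev/null || true)"
  # Without an ETag we cannot tell, so err on the side of refetching.
  [ -n "$new" ] || return 0
  [ "$new" != "$old" ] || return 1
  mkdir -p "$STATE_DIR"
  printf '%s' "$new" > "$STATE_DIR/$key"
}
```

An hourly task could then run something like `has_changed "$url" && ./tools/fetch-external.sh` for each external source.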

Reference discussion (private thread): https://www.flowdock.com/app/rulemotion/i-docs/threads/bgTUcq9mXhHG1xTtmNZbwoiDYDz

20k-ultra commented 3 years ago

We could use https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#onpushpull_requestpaths to create a GitHub Action that only triggers on specific file changes such as /docs/*.md, and there are existing Deploy to Heroku actions already available. If some other script is required to run, I guess we'd set up a lambda that this action could trigger instead.

pdcastro commented 3 years ago

Another thought, for what it is worth, is a one-line cron job (perhaps set up using the existing Dockerfile) that runs ./tools/fetch-external.sh once an hour. Assuming that the web server automatically detects when files are changed on disk (needs confirmation), this would be a very easy fix.

Optionally, to avoid re-downloading files that have not changed, the -z option could be added to the existing curl command lines in fetch-external.sh:

curl -L -o balena-cli.md -z balena-cli.md https://github.com/balena-io/balena-cli/raw/master/doc/cli.markdown

Ref 1: https://superuser.com/a/908523 Ref 2: https://linux.die.net/man/1/curl
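A rough sketch of how the -z idea might be applied to every fetched file, not the actual contents of fetch-external.sh. The function names and the "local file, then URL" list format are assumptions for illustration, and whether the GitHub raw endpoint honors the If-Modified-Since header that -z sends still needs confirmation.

```shell
#!/usr/bin/env bash
# Sketch: conditionally re-download external files with curl's -z option.
# Function names and the input format are hypothetical.

# Download url to file. -z is passed only when a local copy already exists,
# so the first run downloads unconditionally.
fetch_if_newer() {
  local file="$1" url="$2"
  if [ -e "$file" ]; then
    curl -fsSL -o "$file" -z "$file" "$url"
  else
    curl -fsSL -o "$file" "$url"
  fi
}

# Read "<local file> <url>" pairs from stdin, one per line.
fetch_all() {
  local file url
  while read -r file url; do
    [ -n "$file" ] || continue
    fetch_if_newer "$file" "$url"
  done
}
```

For example, `echo "balena-cli.md https://github.com/balena-io/balena-cli/raw/master/doc/cli.markdown" | fetch_all` would cover the CLI file; an hourly cron entry could call a script built around this.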

pdcastro commented 3 years ago

and there are existing Deploy to Heroku actions already available

I wonder if there is user-noticeable downtime during Heroku app redeployment. The shorter the downtime the better, of course.

20k-ultra commented 3 years ago

Well I'd think the deployment this button makes is the same as how we deploy new docs now which I assume has zero downtime. I really doubt Heroku has downtime between deployments, they're all just instances of your app and Heroku probably just points your Heroku provided domain to different instances.

20k-ultra commented 3 years ago

Are you hesitant to use Github actions and Heroku deploy because we then rely on 2 external entities ? It seems like a really simple, low-risk approach compared to us doing it ourselves.

pdcastro commented 3 years ago

Are you hesitant to use Github actions and Heroku deploy [...]

No, I am not! :-) My only concern was any noticeable downtime whenever any of the 18 external markdown files are updated.

because we then rely on 2 external entities ?

I hadn't even considered this, but now that you mention it, :-) is a GitHub action created for each markdown file, or for each external repo?... (I don't remember how they work.) If I counted correctly, fetch-external.sh pulls 18 markdown files across 13 external repos. Setting up and keeping track of that many GitHub actions in multiple repos could be less convenient (from a maintenance point of view) than having all the logic in a script in the docs repo -- especially if we had to ask temporary permissions from the ops team for each of the external repos in order to set up or modify the actions.

20k-ultra commented 3 years ago

Good point about setting up and maintaining actions for so many repos. The 3 scenarios I can think of are:

Moving all the project docs into this single repo has one immediate downside. Imagine we modify the CLI, Supervisor, SDK, etc docs and have unit tests that check if the docs match what happens given an action. If we moved the docs of these repos into this master doc repo then it would be harder to do that. I like the ability to run unit tests against docs because keeping docs up to date is important.

This also must be a problem other orgs have solved so I'll see what they have done.

vipulgupta2048 commented 3 years ago

@20k-ultra Moving all project docs will probably move us away from the goal that we are setting for ourselves of "Merge means deploy", since a contributor would then need to make 2 PRs in 2 different repos whenever they make a new customer-facing feature. That is already the case with our API documentation, which trails behind the current API implementation.

I am actually curious if the new CI implementation that we are building might help trigger an update in the docs repo when something changes over on external repos. Will ask around as well.

pdcastro commented 3 years ago

Poll repos for changes and run fetch-external.sh if there's an update

@20k-ultra, the variation I mentioned in https://github.com/balena-io/docs/issues/1525#issuecomment-734473254 is an improvement over that. It is down to adding fetch-external.sh as a one-line cron job entry (which is probably one or two lines in the Dockerfile). Efficiency is also improved by adding the -z option to the existing curl command lines, so files are only re-downloaded if they have changed, and curl does the checking for us. It needs testing, but I'd argue that this competes with the GitHub actions in being "easiest and should be reliable".

This is turning into a best solution competition. 😁

I think the GitHub actions have other advantages and disadvantages too:

20k-ultra commented 3 years ago

I've learned that you can set a GitHub action (workflow) to trigger like a cron job! That should be sufficient to make a curl request to Heroku to trigger the deploy. Super easy and will ensure docs are updated at least once a day or whatever frequency we choose. It's not instantaneous as on PR merge but that might be excessive tbh.

https://docs.github.com/en/actions/reference/events-that-trigger-workflows#scheduled-events

example heroku deploy action that I'm not saying to use but showing they exist https://github.com/marketplace/actions/deploy-to-heroku
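For illustration, the curl request that such a scheduled workflow might run could look roughly like the sketch below. It assumes the Heroku Platform API build-creation endpoint and a publicly downloadable source tarball; the app name, the HEROKU_API_KEY variable, the tarball URL and the function names are all placeholders, not values taken from this repo.

```shell
#!/usr/bin/env bash
# Sketch: ask Heroku to build and deploy a source tarball via the Platform API.
# The app name, HEROKU_API_KEY and the tarball URL are placeholders (assumptions).

# Build the JSON request body that points Heroku at the source tarball.
build_payload() {
  printf '{"source_blob":{"url":"%s"}}' "$1"
}

# POST a new build for the given app; curl exits non-zero on HTTP errors (-f).
trigger_deploy() {
  local app="$1" tarball_url="$2"
  curl -fsS -X POST "https://api.heroku.com/apps/${app}/builds" \
    -H "Accept: application/vnd.heroku+json; version=3" \
    -H "Authorization: Bearer ${HEROKU_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$tarball_url")"
}
```

A scheduled workflow step would then call `trigger_deploy <app-name> <tarball-url>` with the API key supplied as a secret.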

vipulgupta2048 commented 3 years ago

@20k-ultra Let's do it. Our link checker works in exactly the same way and I think once a day would be good enough. Refer to: https://github.com/balena-io/docs/pull/1690/files

klutchell commented 3 years ago

Just found this thread now, and I want to throw an idea out there. The daily heroku deploy would work but it's pretty clunky, and as a new contributor I wouldn't understand how/when my changes to the external repos would be reflected in the docs.

What if we change the workflow a bit and instead of calling fetch-external.sh on deploy, it's done by a GH action that commits the fetched files and opens a PR for us? This way we can automatically get a new PR whenever an external project has changes to the markdown files/content.

  1. hourly GH action calls fetch-external.sh
  2. commits changes to a branch, usually no changes
  3. if any changes detected, open a PR for us to review and merge

This has the following benefits:

It's a bit more work on the docs maintainers to approve the PRs but if we can handle dependabot and its nonsense we can handle this?
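The three steps above could be sketched as follows. This assumes the fetched files are tracked by git in the docs repo and that the GitHub CLI (gh) is installed and authenticated in the job; the branch name, commit message and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the proposed hourly job: fetch, commit to a branch, open a PR.
# Assumes the fetched files are tracked by git and `gh` is authenticated.
# BRANCH is a hypothetical name.

BRANCH="${BRANCH:-auto/external-docs}"

# Return 0 if the working tree has uncommitted changes (tracked or untracked).
has_changes() {
  [ -n "$(git status --porcelain)" ]
}

sync_external() {
  git checkout -B "$BRANCH"
  ./tools/fetch-external.sh
  if ! has_changes; then
    echo "No external changes detected."
    return 0
  fi
  git add -A
  git commit -m "Update external docs"
  git push --force origin "$BRANCH"
  gh pr create --head "$BRANCH" \
    --title "Update external docs" \
    --body "Automated update from fetch-external.sh"
}
```

Most runs would stop at the "no changes" branch, so a PR only appears when an external file actually differs.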

20k-ultra commented 3 years ago

the external files are source controlled in this repo

seems like the same as them being version controlled in their own repo but I get this allows for the 2nd benefit: the external files can be tied to a specific version of docs, not just whatever was latest at deploy time

no additional PR creation required by the external repo contributor

what's this one ? if I make a change to a doc in the Supervisor then we just have to redeploy these docs to get them published.

I think https://github.com/balena-io/docs/pull/1993 will suffice to ensure the docs are up to date when external resources are updated.

I want to mention I think the ultimate goal is each repository (external resource) implements a contract with documentation and then the docs repo reads this contract to populate a template file (html with contract values referenced). This way the external resources don't have to think about styling or how the docs reference other links etc. We could do this 1 external resource at a time too. I'm mentioning this because we'll get all the benefits just described by Kyle. Files are VCed here and follow a docs final version. If we make a contracts-watcher action to check a list of contracts for changes then it can trigger the build.

klutchell commented 3 years ago

no additional PR creation required by the external repo contributor

What I mean by this is if I'm on the labs team and I make a change to a balenalabs project, a corresponding PR will automatically be opened in the docs to update the corresponding labs tables, without me even knowing it. Then it's up to the docs team to approve the PR and the deploy will happen.

So if you make a change to a doc in the supervisor, a PR will be opened in docs for you. Thus no additional PR creation required by the external contributor.

Though now that I think about it, the process of rejecting a docs PR would get kinda wonky cause a new one would be opened every hour as the external docs differences would persist. In fact ignoring a PR would cause new ones to be opened unless we had a way to check for that.
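One possible way to check for that, sketched with the GitHub CLI: skip PR creation when an open PR already exists for the automation branch. The function name is hypothetical, and this only covers the "ignored PR" case; a rejected (closed) PR would still need some extra state to avoid being reopened.

```shell
#!/usr/bin/env bash
# Sketch: detect an already-open PR for a given head branch using `gh`.
# Relies on `gh pr list --json number` printing "[]" when nothing matches.

# Return 0 if an open PR already exists for the given head branch.
pr_is_open() {
  local branch="$1" found
  found="$(gh pr list --head "$branch" --state open --json number)"
  [ -n "$found" ] && [ "$found" != "[]" ]
}
```

The hourly job could then guard PR creation with `pr_is_open "$branch" || gh pr create ...`.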

vipulgupta2048 commented 2 years ago

@klutchell Thanks for your suggestion. The files fetched by fetch-external.sh are ignored by .gitignore, so they are never actually committed to the docs repository. The logic behind this was to protect ourselves from making changes in 2 places and forgetting to sync them. Example: if the CLI docs were committed to the docs repo and I wanted to fix something, I might fix it in docs/balena-cli.md instead of balena-cli/docs/balena-cli.md, which leads to all kinds of trouble.

The solution needs to be build-time, and polling the repos to check for changes doesn't seem like a viable option. Hence my solution to deploy every X days/week.

vipulgupta2048 commented 2 years ago

Concluding discussion: https://www.flowdock.com/app/rulemotion/i-docs/threads/uUg_dan9e0xokAKronLSnWkQFV8