adobe / helix-cli

Command-line tools for developing with AEM
Apache License 2.0

How to copy strains and refer to the current branch in Helix Config? #898

Closed trieloff closed 5 years ago

trieloff commented 5 years ago

Context: @filmaj is building a multi-stage verification workflow for continuous deployment with CircleCI, attempting to have one copy of each strain for each deployable build, which is leading him down the path of running sed on helix-config.yaml. The whole discussion is below:


Fil Maj [3:43 PM]

heh, ok, seems like the publish --only and --exclude feature is working as expected from the perspective of the helix team. unfortunately it does not fulfill the requirements of the devsite team

Fil Maj [3:44 PM]

i will go back to the original purpose / why i brought this up in the first place: how do we enable a classic multiple-environment setup such that there is a staging environment as well as a production environment?

Fil Maj [3:45 PM]

i dont feel confident putting more and more pages, and starting to advertise the site inside and outside the company, without this ability. we have pushed bugs to our ‘production’ domain twice now, which would have been caught if we had a staging environment. it is not sufficient to rely on the helix simulator for testing

Lars Trieloff [5:49 PM]

Hi @filmaj can we take a look at this? I was hoping you could build a CI flow that would publish only to stage from non-master branches and to all strains from master.

Fil Maj [5:51 PM]

sure, thats fine. the only difference from my desired flow on a merge to master is i would like to once again publish to stage FIRST, re-run end to end tests, then once successful, publish to prod. my paranoia around a git merge commit introducing unexpected changes, for example (has happened to me before)

Lars Trieloff [5:59 PM]

Sounds good. I’m looking at your config now and the first thing that I notice is the sed command in the staging flow. I’d handle this differently, by having all staging strains inherit from a staging-base, which points to the current branch. Then, running hlx deploy will only update the package of the staging strains, and the production strains will stay as they are. Running hlx publish --only "*-staging" will publish all strains, but the production strains won’t have any changes from your master, so there isn’t any change. Also, when you get rid of the sed command, you won’t have to do a deploy --dirty anymore.

Let’s think about the master merge scenario. Ah. I think I got it.

Scenario: we are on master. @maj has just merged a PR from @trieloff, which looked good, worked on the branch, but had a big issue that only becomes visible when merged together with some other change that happened to be merged into master first, but wasn’t part of @trieloff’s branch (and PR).

We can do a hlx deploy safely, because it will update the package for all strains, but the live site won’t see it anyway. We can’t do a hlx publish yet, because it would mess up our prod site. But we can do a hlx publish --only "*-verify", which takes a third set of strains (e.g. launch-docs-verify), which have a code property pointing to master and a url matching adobedevsiteverify.helix-demo.xyz. The -verify strains now behave like the -staging strains, except that they are only used for code in master.
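The branch-dependent flow described in this message can be sketched as a small planner. This is only an illustration, not the devsite team's CircleCI configuration: `plan_ci_steps` and the "run end-to-end tests" placeholder are hypothetical, and the only `hlx` invocations used are the ones quoted in the conversation.

```python
from typing import List


def plan_ci_steps(branch: str) -> List[str]:
    """Return the ordered steps a CI job would run for the given branch.

    Hypothetical helper: the strain suffixes (-staging, -verify) follow the
    naming used in the discussion; the test step is a placeholder.
    """
    # Deploying only updates the package; it does not change the live site.
    steps = ["hlx deploy"]
    if branch != "master":
        # Feature branches only ever publish the staging strains.
        steps.append('hlx publish --only "*-staging"')
        return steps
    # On master: publish the verify strains first, test, then publish everything.
    steps.append('hlx publish --only "*-verify"')
    steps.append("run end-to-end tests")  # placeholder, not an hlx command
    steps.append("hlx publish")
    return steps
```

For a feature branch the plan stops after the staging publish. For master, the full `hlx publish` is always the last step, after the verify strains have been published and tested.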

Fil Maj [6:17 PM]

"… by having all staging strains inherit from a staging-base, which points to the current branch."

so then PR authors would need to set this manually

Lars Trieloff [6:18 PM]

hlx publish --only "*-verify" will take the helix-config.yaml from HEAD for all other strains.

Fil Maj [6:18 PM]

(this is where i decided to use sed instead, but im fine with whatever approach)

Lars Trieloff [6:18 PM]

One more thought before I come back to the branches… (edited)

Lars Trieloff [6:19 PM]

I’d probably call the *-verify strains *-stage and the *-stage strains *-dev. Good thing is: strains are free, so we can have as many as we want.

Lars Trieloff [6:20 PM]

PR authors would need to set this manually, but we could as well have a step in the CI that seds it, and then commits it back. The only thing that worries me is the potential for merge conflicts that we get with this.

Fil Maj [6:22 PM]

i feel like, for devsite pull requests, that is an extreme edge case. the devsite devs and contributors are a select few, i review all pull requests and obsessively rebase every PR

Lars Trieloff [6:24 PM]

When the conflicts start to become annoying, we could build something like node-merge-driver that we run on CI to resolve conflicts to helix-config.yaml

Fil Maj [6:24 PM]

that said, its that same obsessive characteristic that wants to have push-to-stage-test-then-push-to-prod in place for continuous deployment off of master :slightly_smiling_face: alrighty. this conversation was helpful. thank you

Lars Trieloff [6:25 PM]

No, that’s absolutely perfect. We will have enterprise customers whose paranoia will make you look sloppy.

Fil Maj [6:26 PM]

i think i actually want PR authors to explicitly set the staging-base strain branch… i could see a conflict arising out of this if, for example, i open a new pull request AND a different pull request gets merged to master at the same time. then the two CI jobs may conflict, fighting over the staging environment

Lars Trieloff [6:28 PM]

You can decrease the parallelism of your builds.
Or you can create additional “environments”.

Fil Maj [6:28 PM]

yeah additional environments is the dream and is the seed we want to plant with the helix team leading up to the hackathon. wex is my messenger on that one :slightly_smiling_face:

Lars Trieloff [6:29 PM]

I think we can solve this right here.

Fil Maj [6:29 PM]

i tried to describe the end-goal for the devsite in the last comment in https://github.com/adobe/helix-cli/issues/821

like its not just about PRs to the website.. imagine PRs to content repos as well

Lars Trieloff [6:30 PM]

What do you need in an environment:

  1. a domain name
  2. a strain name

Fil Maj [6:31 PM]

gears are turning :gear: :spinner-spectrum: (edited)

Lars Trieloff [6:31 PM]

You can get a wildcard domain and then have ci-job-112.devsitetest.com

Fil Maj [6:32 PM]

does a wildcard domain count as one origin in fastly?

Lars Trieloff [6:32 PM]

No. If you want to proxy www.adobe.io and www.stage.adobe.io, that would count as two origins.

Fil Maj [6:33 PM]

but strains also need to apply to a domain that is listed as an origin in the fastly service, right? (whether they are proxy strains or not)

Lars Trieloff [6:34 PM]

No.

Fil Maj [6:34 PM]

oh

Lars Trieloff [6:34 PM]

Domains are how your site is reachable.

Fil Maj [6:34 PM]

oh right, they need to apply to a DOMAIN listed under the fastly service

Lars Trieloff [6:34 PM]

Origins is where your content is coming from.

Fil Maj [6:34 PM]

derp derp

Lars Trieloff [6:34 PM]

So, one env per CI job may be hard to do.

Fil Maj [6:35 PM]

are there limits on number of domains in the fastly service?

Lars Trieloff [6:35 PM]

(not impossible, but probably not worth trying)

Fil Maj [6:35 PM]

and/or do wildcard domains count as one domain entry?

Lars Trieloff [6:36 PM]

20 domains per service: https://docs.fastly.com/guides/debugging/resource-limits#service-domain-and-origin-limits

Wildcards count as one domain.

Fil Maj [6:36 PM]

nice! then this should be doable

Lars Trieloff [6:36 PM]

The bottleneck will be merges to master. For --only and --exclude to work, master needs to have a list of all strains in all branches. If every CI job would create new strains, you’d end up with lots of merges to master.

Fil Maj [6:37 PM]

as a consumer: thats a weird requirement

Lars Trieloff [6:38 PM]

Git is the only persistence we have. And I doubt that we would come up with a multi-versioned configuration database that beats git.

Fil Maj [6:39 PM]

so pull request automation would need to merge the additional pull-request-specific strains into master’s helix-config.yaml ?

Lars Trieloff [6:39 PM]

So when you do hlx publish --only you can trust that between your own branch and master, you have a full representation of the entire system. Yes.
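One way to picture this: strains matched by `--only` are taken from the branch's `helix-config.yaml`, and every other strain is taken from master's. The sketch below is an assumption about that behaviour, not helix-cli's implementation. The function name `effective_strains`, the dict-based config shape, and glob matching via `fnmatch` are all illustrative.

```python
from fnmatch import fnmatchcase


def effective_strains(branch_strains, master_strains, only):
    """Combine two strain maps the way `publish --only` is described above.

    Assumption: both arguments are plain dicts keyed by strain name, and
    `only` is a glob pattern. Strains that exist only on the branch and do
    not match the pattern are ignored.
    """
    # Start from master: this is the known-good state of every strain.
    result = dict(master_strains)
    for name, strain in branch_strains.items():
        # Only strains selected by the pattern are taken from the branch.
        if fnmatchcase(name, only):
            result[name] = strain
    return result
```

With a pattern such as `"*dev"`, a strain named `xd-docs-dev` would come from the branch, while `xd-docs-production` would keep master's definition.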

Fil Maj [6:40 PM]

now i understand why you said

If every CI job would create new strains, you’d end up with lots of merges to master.

Lars Trieloff [6:40 PM]

PR comes in, we have a script that creates strains for the branch/PR, commits the helix-config.yaml and merges into master. Then it does a git fetch and continues with the deployment. It might be more intuitive (and easier to merge) to have one env for each author instead of one for each PR.

Fil Maj [6:42 PM]

mmkay i think ill start with one, prove it, then look to extend to multiple :slightly_smiling_face:

Lars Trieloff [6:44 PM]

Sounds good.

Fil Maj [6:45 PM]

thanks for talking through it with me

Lars Trieloff [6:50 PM]

Thanks for helping us make it real. There’s a big difference between thinking “you should have an unlimited number of environments” and actually building it.

Fil Maj [9:29 PM]

re: the bottleneck to merges to master:

"For --only and --exclude to work, master needs to have a list of all strains in all branches."

if the PR/branch and master have the same number of and names of strains… do we still need to merge helix config changes from the PR into master? like, what if the automation seds an existing yaml reference that exists in both the branch and master from &stagingRepo https://github.com/adobe/developer.adobe.com.git#staging to &stagingRepo https://github.com/adobe/developer.adobe.com.git#$CIRCLE_BRANCH ? when hlx publish --only runs, which would take preference? the PR branch value of the yaml reference or the master value?

welp, no harm in messing with it, might as well give it a shot

hmmm.. well, this first PR that is introducing these config changes for the first time.. that’ll be challenging syncing with master haha
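The sed step in question, rewriting the ref after `#` on the `&stagingRepo` anchor, can be sketched in Python. This is a minimal illustration, not the command used in the devsite CI. The function name is hypothetical, and the regex assumes the anchor and URL sit on one line with no whitespace inside the URL.

```python
import re

# Capture everything up to and including '#', then replace the ref after it.
STAGING_REPO_REF = re.compile(r"(&stagingRepo\s+\S+?#)\S+")


def point_staging_repo_at(config_text: str, branch: str) -> str:
    """Return config_text with the &stagingRepo ref replaced by `branch`."""
    # A function replacement avoids backslash escapes in `branch`
    # being interpreted as group references.
    return STAGING_REPO_REF.sub(lambda m: m.group(1) + branch, config_text, count=1)
```

On CircleCI the branch name would typically be read from the `CIRCLE_BRANCH` environment variable, for example `point_staging_repo_at(text, os.environ["CIRCLE_BRANCH"])`.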

Fil Maj [10:11 PM]

hmm in the above case i dont think i need to merge back to master… doing the whole sed of the config dance in the PR, running a hlx deploy --dirty, then a hlx publish --only "*dev" yields the following header when i curl the dev site:

```
< x-strain: xd-docs-dev
```

pretty sure thats what i want!
and nice, the production URLs report a known-good backend url!

```
< x-backend-url: /api/v1/web/developer-adobe-com/7c34f33b5047a8f7ad5847148c49303c8e10514f/html?owner=AdobeXD&repo=plugin-docs&ref=master&path=/README.md&selector=&extension=&strain=xd-docs-production&rootPath=/xd/docs&params=
< x-strain: xd-docs-production
```

Following the "devops in a box" approach to helix-cli, I'd like to get rid of the sed step in this flow and replace it with a native hlx command.

What I think we need is a hlx clone command with the following synopsis:

hlx clone ( --only <pattern> | --exclude <pattern> )? ( --overwrite <jsonpath> <value> )* --name <expr> --merge?

  1. When running, hlx clone will take all strains from the helix-config.yaml in master that match the --only or --exclude pattern and create an exact clone of each strain (i.e. use of references, etc. should be kept in place).
  2. For each --overwrite, it will replace the value of the property matching <jsonpath> with the provided <value>.
  3. It will run a regex-replace on the name of the strain, so that an expression like /(.*)(-prod-)(.*)/$1-stage-$3/ would rename the clone of home-prod-returning to home-stage-returning.
  4. When the --merge flag is set, the configuration change should get merged and pushed into master immediately, to avoid issues with concurrent editing.
  5. If the merge in (4) fails because conflicting changes have been made to master in the meantime, the local changes should be discarded and the process restarted (but no more than two times).
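
The proposed behaviour can be sketched in Python. Since `hlx clone` is only a proposal here, everything below is hypothetical: the function names, the dict-based strain model, glob matching via `fnmatch`, and a dotted key path standing in for JSONPath. Operating on parsed dicts also means the sketch does not model keeping YAML references in place.

```python
import copy
import re
from fnmatch import fnmatchcase


def set_path(obj, dotted_path, value):
    """Set a nested key such as 'code.ref' (simplified stand-in for JSONPath)."""
    keys = dotted_path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def clone_strains(strains, only, overwrites, name_pattern, name_replacement):
    """Steps 1-3: select, copy, overwrite and rename strains."""
    clones = {}
    for name, strain in strains.items():
        if not fnmatchcase(name, only):
            continue
        clone = copy.deepcopy(strain)          # step 1: independent copy
        for path, value in overwrites:         # step 2: apply each overwrite
            set_path(clone, path, value)
        # step 3: regex-rename the strain, e.g. -prod- to -stage-
        new_name = re.sub(name_pattern, name_replacement, name, count=1)
        clones[new_name] = clone
    return clones


def run_with_restarts(attempt, max_restarts=2):
    """Steps 4-5: call attempt() until it succeeds, restarting at most twice."""
    for _ in range(max_restarts + 1):
        if attempt():
            return True
    return False
```

The rename expression from step 3 translates to `clone_strains(strains, "*-prod-*", [], r"(.*)(-prod-)(.*)", r"\1-stage-\3")`. In `run_with_restarts`, `attempt` would wrap the commit, merge and push, and return `False` when the merge conflicts.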
kptdobe commented 5 years ago

Closed during backlog grooming session during hackathon. Reopen if needed or open a new one.