gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

Setup Gitlab pipeline and provisioning on new Upstream project to enable two production infrastructures #1996

Open rija opened 3 months ago

rija commented 3 months ago

Pull request for issue:

This is a pull request for the following functionalities:

How to test?

Deployment to your AWS staging environment

After checking out this PR's branch and pushing it to your Gitlab pipeline, you can follow the "Setting up your Staging environment" section of the updated docs/SETUP_PROVISIONING.md document.

Note: the main change is the addition of a new playbook for database loading and CLI tools installation, which fixes the circular dependency problem.
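A minimal sketch of that post-pipeline step, assuming the playbook is run from the repository root; the inventory argument is a placeholder, and docs/SETUP_PROVISIONING.md has the authoritative invocation:

ansible-playbook -i <your-inventory> ops/infrastructure/data_cliapp_playbook.yml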

Deploying to hot standby from Upstream's alt-gigadb-website pipeline

The hot standby infrastructure is already built, but you can test re-provisioning and deploying to it:

Follow the instructions specific to the upstream/alt-gigadb-website project in docs/sop/PROVISIONING_PRODUCTION.md for the staging environment.

Then follow the instructions in docs/RELEASE_PROCESS.md to create a fake release (choose a tag label that's obviously fake, e.g. v00-rija-testing) that will be deployed live only to the upstream/alt-gigadb-website project, not to our current production.
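For illustration, assuming the release is cut by pushing a git tag to your Gitlab remote (the remote name and the tag-driven trigger are assumptions; docs/RELEASE_PROCESS.md is authoritative):

git tag -a v00-rija-testing -m "fake release for testing"   # obviously fake label
git push origin v00-rija-testing                            # assumed to trigger the release pipeline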

Once that release has been deployed to staging, you can resume the instructions in docs/sop/PROVISIONING_PRODUCTION.md from the "Provisioning live for upstream/alt-gigadb-website" section.

When guided to the blue/green deployment process, treat your fake release as a simple one (no infrastructure change and no database change): you can deploy to the hot standby (currently the upstream/alt-gigadb-website project) by following the instructions in the "Deployment to a specific live environment" section of docs/sop/PROVISIONING_PRODUCTION.md.

If everything goes well, you should be able to play with the new infrastructure:

  1. When both pipelines are successful, navigate to the staging URLs (check the version in the footer; it should match your fake release):

    1. https://staging.gigadb.org
    2. https://alt-staging.gigadb.org
  2. Test that you can connect to the bastion server for both staging environments as the centos user, using the SSH key from the first two steps:

    1. centos@bastion-stg.gigadb.host
    2. centos@bastion.alt-staging.gigadb.host
  3. Test the deployment to live on the hot standby:

    1. Navigate to https://alt-live.gigadb.org and check the version in the footer
    2. Connect with SSH to centos@bastion.alt-live.gigadb.host
    3. Run users_playbook.yml for the exact same username you already have on upstream/gigadb-website; you should then notice that you can ssh as that user using the same private key (because the public key is already in Gitlab variables). See the sketch after this list.
    4. While connected to bastion.alt-live.gigadb.host, you can access /share/dropbox, and its content should be the same as what's on upstream/gigadb-website's EFS.
    5. While connected as the centos user, run crontab -l and notice that:
      • the database is reset every day to stay in sync with upstream/gigadb-website
      • backups are not being made and uploaded to S3
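A condensed sketch of sub-steps 2 to 5 of item 3 above; the inventory argument and the username variable passed to users_playbook.yml are placeholders, not the project's actual interface:

ssh centos@bastion.alt-live.gigadb.host            # connect as the centos user
ansible-playbook -i <your-inventory> ops/infrastructure/users_playbook.yml -e "username=<your-username>"
ssh <your-username>@bastion.alt-live.gigadb.host   # same private key as on upstream/gigadb-website
ls /share/dropbox                                  # should mirror upstream/gigadb-website's EFS
crontab -l                                         # as centos: daily database reset, no S3 backup jobs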

Blue/green deployment switchover

The last part of docs/sop/DEPLOYING_TO_PRODUCTION.md describes the proposed plan for doing the blue/green deployment. Feel free to comment.

Changes to composer.json

composer.json is now a regular file that is versioned and manually editable, so that automated dependency security checks can be performed. After checking out this branch, you should be able to execute ./up.sh and everything should work as usual.
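As an example of what this enables, a dependency audit can now run straight off the versioned file (composer validate and composer audit are standard Composer commands, the latter since Composer 2.4; running them on the host rather than in a container is an assumption about your local setup):

./up.sh              # bring up the stack as usual
composer validate    # check composer.json is syntactically valid and complete
composer audit       # report known security advisories for the locked dependencies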

Addition of more basic tools

The new ops/infrastructure/data_cliapp_playbook.yml playbook will install:

On your AWS deployment of this branch (from the first section of "How to test?"), you can check that they work by executing the commands below, in order, on a bastion server:

tree .                                     # list the current directory as a tree
tmux                                       # start a terminal multiplexer session
wget https://gigadb.org                    # download the homepage (saved as index.html)
tar -cvjSf compressed.tar.bz2 index.html   # create a bzip2-compressed archive from index.html
rm index.html
tar -tf compressed.tar.bz2                 # list the archive's contents
tar -xf compressed.tar.bz2                 # extract index.html again
rm compressed.tar.bz2
emacs index.html                           # open the extracted file in emacs

C-x C-c to exit emacs

rm index.html

C-d to exit tmux, then C-d to log off SSH

How have functionalities been implemented?

Blue/green deployment:

See the "Upstream projects" section of docs/sop/DEPLOYING_TO_PRODUCTION.md

Fixing the circular dependency issue:

All the bastion playbook tasks that depend on Docker containers being built and pulled by the Gitlab pipeline were moved into a new playbook, ops/infrastructure/data_cliapp_playbook.yml, which, unlike the other host configuration playbooks, is to be executed after running the Gitlab pipeline.
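In outline, the ordering this enables (inventory arguments and flags omitted; the exact invocations are in the docs):

ansible-playbook ops/infrastructure/bastion_playbook.yml      # 1. host configuration, no pipeline artifacts needed
# 2. run the Gitlab pipeline, which builds and pulls the container images
ansible-playbook ops/infrastructure/data_cliapp_playbook.yml  # 3. database loading and CLI tools, needs those images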

Any issues with implementation?

N/a

Any changes to automated tests?

N/a

Any changes to documentation?

Any technical debt repayment?

N/a

Any improvements to CI/CD pipeline?

The ops/infrastructure/bastion_playbook.yml was broken up for two reasons:

  1. Bloat: all the curator tools installation steps were moved to ops/infrastructure/data_cliapp_playbook.yml.
  2. Circular dependency between pipeline and provisioning: with this change, the steps laid out in docs/SETUP_PROVISIONING.md can be performed in order, with clear boundaries and without ambiguity.
pli888 commented 2 months ago

The "How to test?" section in this PR says deployment to live on the hot standby should be tested by navigating to https://alt-live.gigadb.org and checking the version in the footer. However, the value of the REMOTE_HOME_URL Gitlab variable in upstream/alt-gigadb-website doesn't match:

Project                      Variable         Value                            Environment
upstream/gigadb-website      REMOTE_HOME_URL  https://gigadb.org               live
upstream/gigadb-website      REMOTE_HOME_URL  https://staging.gigadb.org       staging
upstream/alt-gigadb-website  REMOTE_HOME_URL  https://alt.gigadb.host          live
upstream/alt-gigadb-website  REMOTE_HOME_URL  https://alt-staging.gigadb.host  staging

https://alt.gigadb.host is not reachable in a browser. I can see my v00-pli88-testing deployment on https://alt-staging.gigadb.host, though.
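A quick way to compare the two candidate live URLs from a shell (plain curl; nothing here is project-specific):

curl -sI https://alt.gigadb.host | head -n 1       # the REMOTE_HOME_URL value for live
curl -sI https://alt-live.gigadb.org | head -n 1   # the URL the PR description expects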

rija commented 1 month ago

Hi @pli888,

question: It looks like basic tools (wget, emacs, etc) are still installed by bastion playbook on line 30?

That's deliberate.

The distinction between bastion_playbook.yml and data_cliapp_playbook.yml is mostly based on whether the thing we want to install or do needs a container image built by the Gitlab pipelines. If yes, it goes in the latter; if not, it goes in the former.

The basic tools don't need anything built by the pipeline; they are just "basic" Linux tools (like fail2ban, postgres-client, docker).

Another perspective is whether the thing we want to install is something that would be useful on any EC2 instance that we need to ssh into.

It feels to me like the basic tools would be useful on all our EC2 instances (and maybe we want to add the basic tools Ansible role to files_playbook.yml too, as we occasionally ssh to the files server).

Finally, the other reason to separate the bastion playbook into two playbooks is bloat, but since the basic tools are all defined within a single Ansible role, they can never cause bloat.

rija commented 1 month ago

Hi @pli888,

info: Need to create a github ticket to create a command to perform DNS records swap so that we don't have to do it manually.

Done. See #2041
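For context, the manual step that #2041 would wrap is a DNS record update; assuming the records live in AWS Route 53, it would look something like this (the hosted zone ID and the contents of swap.json are placeholders, not values from this project):

# swap.json: UPSERT the blue/green record sets so each points at the other's target
aws route53 change-resource-record-sets --hosted-zone-id ZXXXXXXXXXXX --change-batch file://swap.json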