Integrate with Travis or Jenkins for testing and deployment of production server

tschaffter commented 5 years ago

Notes from initial discussion with Kim

prod server:

accessible by about 10 persons, later add a load balancer
build system in
deploy the instance in 1-2 clicks
- CF template must have a stack
- run an EC2 instance that has alreadu Docker
- link github with dockerhub. and dockerfile into github, at time of check and commit, dockerhub will build a new image (integration has to be done manually only once)
we would like travis or jenkins to run the CF script
- travis: everything is public if not configured correctly (secret key!)
make sure that there is no race condition between DockerHub that builds the PHCCP image (triggered when new commit to GitHub) and Jenkis/Travis trying to deploy theis DockerHub image.

tschaffter commented 5 years ago

Hi @kimyen, it would be great if we could work together to enable CI for this project with automatic deployment of the current build. We could start working on this together as soon as we have finished identifying the tasks for the pilot phase (from now to mid-June) and received the green light from Bruce that you could continue supporting this project. I'll create a JIRA ticket at that time for this feature.

jaeddy commented 5 years ago

We're not quite at the point of 'optimizing' our CI build yet, but I came across this article and thought it might come in handy later: https://testdriven.io/blog/faster-ci-builds-with-docker-cache/

tschaffter commented 5 years ago

@lukasz-rakoczy Hi Lukasz, could you provide some guidelines regarding how you would enable CI for this project? We can start first with a high level description and then later decide how to implement it. Here are some information that may be useful:

The project has a .travis.yml that was created when the project was initially generated using generator-angular-fullstack. .travis.yml has not really been updated since the generation of the project, whose code has been deeply modified since. Travis currently reports that tests fail.
The project has Unit and E2E (end 2 end) tests that can be run with npm run test. Running npm run test completes successfully with the current version of the collaboration portal (I usually always run this command manually before making a git commit).
I'm looping in @kimyen who is a software engineer at Sage and has experience with CI too. She is currently not assigned to this project but I'm hoping she can provide guidance, in particular practical advice on how setup CI using Sage's Travis or Jenkins infrastructure.

Thanks!

lukasz-rakoczy commented 5 years ago

@tschaffter Hi Thomas,

I think we have a couple of options here and the approach should be driven by the requirements we want to satisfy. First of all we should choose a tool that we want to use. In my opinion we can consider continuing using Travis CI or switch to Jenkins which is already used in the other streams of the PHC initiative. The second thing to consider is a deployment model which can important when we want to integrate the tool into the deployment process. Deployment however can be tread as a further step and we can focus on starting CI for the portal.

Here are my thoughts on different aspects of the tools and processes:

Travis CI:

This might be the easiest option when want to keep everything (source code, future deployment artifacts) in publicly available repositories (Github, Docker Hub..). If there is a need to switch to Roche infrastructure then it might be tricky to make it working.
Do we have a paid version of the service (trial version allows 100 builds only)? If not we need to get get budget for that.
I've been looking at the errors we get for the builds now and it seems that we should start from using a Chrome addon instead of installing custom apt packages: https://docs.travis-ci.com/user/chrome (because this seems to be a blocker for now). I wanted to experiment with this on a new branch in the repo but I don't have edit access to it.

Jenkins:

We would need to either use the PHC instance of Jenkins (jenkins.science.roche.com) or run our own instance (which could be an overkill for the current project stage).
We need to make sure that Jenkins slave we would use have all required dependencies to run the tests and potentially build and deploy the application (mongo, Docker, Chromium, AWS CLI)

General:

If we decide to go with Jenkins I would recommend to use Jenkins Pipelines instead of configuring plans in Jenkins UI. Pipelines allow to store all CI/CD configuration (as Groovy code or native, declarative Jenkins format) in the same repo as the rest of the code.
We can consider to make building and testing less environment dependent by: mocking test dependencies, move some parts of the process into Docker containers
When the app gets closer to productive version we need to synchronize with Kumar and decide how the deployment process should look like.

In my opinion we should start with bring back the Travis build to live and then improve from there. If you could grant me edit rights to the repo I could try to create a pull request fixing the Travis build.

tschaffter commented 5 years ago

Thanks @lukasz-rakoczy for your comprehensive feedback!

It seems that continuing to use Travis is the most suitable option.

Do we have a paid version of the service

Yes

it seems that we should start from using a Chrome addon instead of installing custom apt packages

Correct. CI tests started failing when I decided to perform e2e test using ChromeHeadless instead of PhantomJS. Soon I found that the best way to use Chrome/ChromeHeadless in test was to use puppeteer.

I have fixed .travis.yml and CI tests run successfully now. I have also added Travis status image to the README.md.

I wanted to experiment with this on a new branch in the repo but I don't have edit access to it. I saw that you forked the repository. It's the way to go, then you can do pull requests that I will review before merging.

What would be the steps that we should take if we want to have the collaboration portal deployed somewhere automatically after the test successfully complete?

lukasz-rakoczy commented 5 years ago

Hi Thomas,

If we want to keep things simple (by still using Docker Compose to run the app) it would be:

After Travis completes its testing a command should be run to build docker image for the portal.
The image should be pushed to a Docker repository (public Docker hub?).

From here it gets a bit more complicated because there are different options:

Use custom SFTP/SSH deployment to transfer the Docker Compose file to the target EC2 instance and then to restart the app.
Use some more generic approach for instance AWS CodeDeploy. However with this tool I don't have any hands-on experience and I would need to investigate if it is a good fit for the app.

Of course things get more complicated when if we want take into account app versioning, multiple environments... Maybe it is better to start with smaller steps and try to create a pipeline that would would deploy code from the develop branch into a develop EC2 instance.

Please let me know if you want me to help you with that.

tschaffter commented 5 years ago

@lukasz-rakoczy Can you take ownership of this task?

We now have a working Travis script. What we need now is a way to automatically deploy the collaboration portal environment when the different services of the collaboration portal are updated? Maybe at the same occasion you could propose a protocol that we would follow to update the portal environment when there will be a production release (e.g. deployment of weekly updates while users are using the portal, backup data, etc.).

lukasz-rakoczy commented 5 years ago

@tschaffter Yes - you can assign this task to me.

Once #126 is fixed we can extend the Travis config to automatically build and push images after changes are pushed to the Github repo (it would be good to agree on some flow so we know what branches are used for different deployment stages; I think that gitflow works fine - https://datasift.github.io/gitflow/IntroducingGitFlow.html).

Then I would try to setup AWS Codedeploy so the new images can be pulled on EC2 instances and the app can run in the updated version.

According to the production release. I think that AWS Codedeply can also be used to automatize steps like backups. However this depends on how the production setup will look like (own mongo instance or managed AWS service?, simple EC2s or Kubernetes).

tschaffter commented 5 years ago

@lukasz-rakoczy @ychae Sage is still trying to re-enable Travis test when commit are submitted to this repos. Sage is a non-profit and it has been confirmed by Travis that we don't need to pay anything to use the paid features.

@lukasz-rakoczy Meanwhile, we can still work on this task. Can you put together a plan of actions to enable auto-deployment of the collaboration portal and its dependencies (e.g. prov-service, etc.)?

lukasz-rakoczy commented 5 years ago

@tschaffter @ychae I can confirm that Travis started to build the project so the licencing issue is fixed.

@tschaffter First of all it would be good to fix all failing tests so Travis can successfully finish the build.

According to what Kumar said on the meeting yesterday we should focus on publishing Docker images for the components so they can be used when deploying the system. Kumar suggested that the final infrastructure for the system is not fixed yet and going beyond publishing the images could be a waste of time.

I can't create branches in the original collaboration portal repo so I forked it here https://github.com/lukasz-rakoczy/PHCCollaborationPortal (you should have access). I modified the Travis file inside so it builds a Docker image with the development version of the component and publishes it to a Docker registry. This configuration requires 3 env. variables to be set in the Travis plan settings:

IMAGE_NAME - name of the image to be used for the component (can be prefixed with docker registry address) REGISTRY_PASS - Docker registry password REGISTRY_USER - Docker registry username

For testing purposes I use my own Docker hub credentials and repository. We need to find a proper place to store the component images. The registry should be private (so images can be only accessed by authorized people) and it should be accessible from Travis. Do you know if Sage has its Docker hub account (on a paid plan) or some other Docker registry accessible from the internet?. If yes then we could use for storing the images. If not we could also use AWS ECR or Google Container Registry for that but in this case we need to setup these services (on either Sage or Roche accounts).

After we have the registry configured we need decide on how the images should be versioned. With the development branch it should not be a problem but for "production" releases this can be more complicated (depending on the requirements).

The same approach can be used for the other system components (prov service...). Once all the images are in the registry they can be used for the system deployments but I think we need to discuss further steps with Kumar.

We also need to make sure that current Dockerfiles are correct. By this I mean that they contain everything required to run the component but do not contain unnecessary artifacts that make the images large.

ychae commented 5 years ago

@lukasz-rakoczy Sage does have a private Docker Hub account. @jaeddy can give you access to the registry that has the image currently so that you can continue to work on this.

tschaffter commented 5 years ago

@lukasz-rakoczy

First of all it would be good to fix all failing tests so Travis can successfully finish the build.

Please use the last commit that successfully passed travis test. In the near future, I'll start pushing update the the master branch, which we will configure to trigger the auto-deployment.

I'm currently swamped with other tasks for this project. Please let me know if you need me to perform a specific action.

lukasz-rakoczy commented 5 years ago

@tschaffter

I could set up the CI/CD pipeline but I need to have access to AWS Account from which the dev instance is running (to configure required roles and users for AWS Codedeploy and Travis), Github (to create pull request with changes to travis.yml) and Docker hub (to configure repo for storing the images).

Currently what is in the private branch I cloned is pushing a build image to my private repo but this can be easily adjusted to the Sage Docker account but to go further I need the permissions above or you need to follow the convention and configure it by your own.

Fixing test errors is not crucial to create the pipeline (for testing purposes test execution can be excluded from the pipeline).

tschaffter commented 5 years ago

@lukasz-rakoczy I did some clean up and fixed the unit tests. 68b2c991fa891c9df46e2ed994a09638771b4a44 passed the tests on Travis.

tschaffter commented 5 years ago

I could set up the CI/CD pipeline but I need to have access to AWS Account from which the dev instance is running (to configure required roles and users for AWS Codedeploy and Travis), Github (to create pull request with changes to travis.yml) and Docker hub (to configure repo for storing the images).

@lukasz-rakoczy Can you provide a description / diagram of the CI/CD pipeline first? I will then instantiate the resources required (EC2, ?).

Currently what is in the private branch I cloned is pushing a build image to my private repo but this can be easily adjusted to the Sage Docker account but to go further I need the permissions above or you need to follow the convention and configure it by your own.

What are the conventions?

Fixing test errors is not crucial to create the pipeline (for testing purposes test execution can be excluded from the pipeline).

Commit 68b2c991fa891c9df46e2ed994a09638771b4a44 is a relatively stable version. This version is currently deployed on http://test.phc.sagesandbox.org.

lukasz-rakoczy commented 5 years ago

@tschaffter

Can you provide a description / diagram of the CI/CD pipeline first? I will then instantiate the resources required (EC2, ?).

I think that we can try one of the following two approaches to set this up:

with Docker registry - https://www.lucidchart.com/documents/view/fe124033-996c-4a72-ba52-abba6504f44e/0
without Docker registry - https://www.lucidchart.com/documents/view/fe02358a-6bac-4ef6-bab4-e5b13c245c55/0

Number 2. is a bit simpler (it does not require Docker registry account) but number 2. is more flexible - when you have images in a registry you can reuse them for different deployments (automated but also manual).

To set this up we need:

private Docker registry (ideally separate for each of the components ) and username/password that the registry can be accessed from (this one is required only for 1.)
technical AWS account so Travis can trigger CodeDeploy plan
AWS Roles so EC2s can run CodeDeploy
AWS CodeDeploy plan configured
AWS CodeDeploy agent running on EC2s that will be used for deployments
Github credentials so the AWS CodeDeploy agents can pull the code from Github

What are the conventions?

Please look at this travis file: https://github.com/lukasz-rakoczy/PHCCollaborationPortal/blob/develop/.travis.yml

If you provide:

$REGISTRY_USER  - Docker registry username
$REGISTRY_PASS  - Docker registry password
$IMAGE_NAME - Docker image name (in my case it is "code4life/phc-cp" because my Docker hub account name is code4life and I created a repo named phc-cp)

environment variables to your Travis plan every commit to the develop branch will push a new version of Docker image to this repo and tag with two tags - :latest and :GIT-commit-hash.

tschaffter commented 5 years ago

@lukasz-rakoczy

Let's go with Number 1 because it's more flexible. We will be using Synapse as a docker repository.

Docker registry

Synapse project: synapse.org/phc_collaboration_portal Docker images pushed to this project must have the prefix docker.synapse.org/syn18489221/, for example docker.synapse.org/syn18489221/phc-collaboration-portal, docker.synapse.org/syn18489221/prov-service.

docker login -u <synapse username> -p <synapse password> docker.synapse.org
docker push docker.synapse.org/syn18489221/phc-collaboration-portal

I have created the Synapse user phccp-autodeploy and have given him write access to this Synapse project.

AWS resources

I have instantiated an EC2 instance to host the deployment agent (ec2-35-164-244-178.us-west-2.compute.amazonaws.com). There have created two accounts: phccp which should be used to setup the agent, and lukasz to connect to the EC2. I'll send you an SSH private key shortly. I have also installed docker. I have added lukasz to the groups sudo and docker. I have added the user phccp to the group docker.

When logged in as phccp, you can do docker login docker.synapse.org and push/pull images from there.

Can you give me instruction to create the AWS roles?

Also, I guess that we would like a CloudFormation script that automatically creates AWS Roles, instantiates the EC2 and setup it. Is this something you would be able to develop?

Thanks!

tschaffter commented 5 years ago

@lukasz-rakoczy Will the autodeploy agent also detect when other services such as prov-service are updated on GitHub and trigger the restart of the stack?

tschaffter commented 5 years ago

Also, I propose to deploy based on the develop branch and keep master for thoroughly tested build.

tschaffter commented 5 years ago

@lukasz-rakoczy I have created the GitHub user phccp-autodeploy and have giving it READ access to https://github.com/Sage-Bionetworks/PHCCollaborationPortal. This idea is to use it on the EC2 to pull the script from GitHub to deploy the full stack. Let me know when you need its password/API key.

lukasz-rakoczy commented 5 years ago

Hi @tschaffter

Unfortunately I was not able to shh into the EC2 machine with the key you provided me but anyway I think there is more to be done to make the pipeline working.

With my own accounts (AWS, Github, Travis) I created a pipeline that works and we can reuse its elements to automatically deploy PHC-CP. Everything I have created is here: https://github.com/lukasz-rakoczy/codedeploy

The idea is to:

We have a separate Github repo that is used for storing things related to deployment (AWS Codedeploy appspec file, Docker compose file for running the system, scripts for restarting the instance running on EC2) which is used by the AWS Codedepoly to execute deployments.
Every component of the system (phc-cp, prov-service) has a Travis configuration that builds a Docker image and pushes it to the Docker registry.
After the image is pushed a prepared AWS Codedeploy deployment is triggered.
The AWS Codedeploy pulls the latest images and restarts the services.

There a couple a things we need to set up:

AWS IAM roles (CloudFormation code for them is in cf/cf-stack.yml in the repo I mentioned)
- EC2 code deploy profile instance
- Codedeploy service role so Codedeploy can operate on EC2s
The EC2 we want to deploy to needs to run in the first created role.
We need to configure Application and Deployment Group in AWS Codedeploy to define what and where needs to be deployed (in this step we need to connect AWS to Github).
Travis file for each component needs to additional steps to build and push images and to trigger the deployment (this one is in the .travis.yml in that repo).
Travis plans for the components need to be fed with variables
- credentials to docker registry to publish the image
- AWS credentials to trigger the deployment

All these steps are not complicated but it would be much easier if we could setup a call go through them together because this ticket based communication takes too long. I think that in 1h we would have everything running. If you have some time today or tomorrow please let me know (by email or setup a meeting in my calendar). I can call in later (10-11AM your time) to have this done.

ychae commented 5 years ago

Hi @lukasz-rakoczy I'll set up a time for a call for all of us to get this sorted. Thanks so much for all the details!

tschaffter commented 5 years ago

@lukasz-rakoczy @ychae I met with Sage IT team and we made good progress:

Travis now successfully build the docker image of the portal and push it to the Docker repository https://www.synapse.org/#!Synapse:syn18489232.
The CloudFormation script written by @lukasz-rakoczy has been executed by Xa. The address of the EC2 that has been instantiated is 18.204.195.149. I can successfully log in into this ec2.
A request has been submitted to create the AWS user "thomas.schaffter+phccp@sagebase.org", which would belong to only one group: AWSCodeDeployFullAccess.

What remains to be done:

[x] Update the Travis script so that it effectively run the tests (currently disabled)
[x] Confirm that the AWS user to use to trigger CodeDeploy has been created
[x] Configure Travis to trigger AWS CodeDeploy
[ ] Configure the EC2 to automatically deploy the portal stack using the latest Docker images (portal, prov-service, etc.).
[ ] Update the auto-deploy pipeline so that the same Travis file 1) run the test independently of the branch used and 2) automatically deploy the stack if on the branch develop and master (we will have ultimately two machines, one serving the master build and another one serving the develop branch).

tschaffter commented 5 years ago

The AWS user with access to AWS CodeDeploy has been created.
Added ACCESS_KEY_ID and SECRET_ACCESS_KEY values to Travis settings

tschaffter commented 5 years ago

[x] fix unit test before moving forward

tschaffter commented 5 years ago

Khai requested changes to the creation of the AWS user, which I accepted. However, this is preventing me to create the CodeDeploy application in AWS. Meeting with Khai in ~30 min.

tschaffter commented 5 years ago

[x] Document the creation of the AWS CodeDeploy user (see below)

tschaffter commented 5 years ago

@lukasz-rakoczy We have created the following user and role to run the CodeDeploy application.

User:

  PhccpServiceUser:
    Type: 'AWS::IAM::User'
  PhccpServiceUserAccessKey:
    Type: 'AWS::IAM::AccessKey'
    Properties:
      UserName: !Ref PhccpServiceUser

Role assumed by PhccpServiceUser:

  CodeDeployServiceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              AWS:
                - !GetAtt PhccpServiceUser.Arn
                - !GetAtt AWSIAMThomasSchaffterUser.Arn
            Action:
              - "sts:AssumeRole"
      Path: "/"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AWSCodeDeployFullAccess

The AWS CodeDeploy application has not yet been created because we haven't identified yet all the permissions required to do so. I may ask Sage engineers to create the CodeDeploy app and deployment group using their elevated privileges as we did for the instantiations of resources using the CloudFormation script https://github.com/Sage-Bionetworks/phccp-autodeploy/blob/master/cf/cf-stack.yml.

@lukasz-rakoczy Would you be able to write a CF script does what you have shown me manually on Thursday, that is the creation of the CodeDeploy app and deployment group configured to work with the other resources that we have created?

tschaffter commented 5 years ago

Closed by mistake

tschaffter commented 5 years ago

Update: @lukasz-rakoczy and I just met.

I'll meet with Sage IT to get sufficient access to deploy manually a CodeDeploy app and deployment group
Lukasz is working on a CF template that creates a CodeDeploy app and deployment group

lukasz-rakoczy commented 5 years ago

Hi @tschaffter,

I've been able to make some progress with configuring the deployment stack.

Here https://github.com/lukasz-rakoczy/codedeploy/blob/master/cf/cf-deploy.yml you can find complete AWS stack (IAM resources, EC2, CodeDeploy resources, S3) which allows to automatically deploy collaboration portal (including all its components). I'm not sure if you will be able to create the stack with your Sage AWS privileges but on my account it is working fine.

I also updated: https://github.com/lukasz-rakoczy/codedeploy/blob/master/.travis.yml so deployment is divided into 2 steps:

First it is to put all artifacts which are required to deploy the entire PHC CP (docker-compose file, appspec.ym and deploy.sh) into S3 bucket
Second is to trigger CodeDeploy deployment The first step will only be required in the CodeDeploy repository (for instance if we change the way PHC CP should be run). And the second one will be used from all CP components after the images are pushed to Docker registry.

I also implemented a little workaround that will allow us to deliver Docker registry password to CodeDeploy agents so they can log into the private Docker registry. It is not perfect because the registry credentials are stored on S3 (private and accessible by only authorized AWS users). The better solution would be to store the secret in AWS SecretsManager but I'm afraid we would have problems with setting this up with your AWS privileges.

I'm not sure if you were able to make any progress with Sage engineers regarding you account privileges but we can set-up a meeting to move this issue forward.

tschaffter commented 4 years ago

We have recently achieved this. Further improvements will be tracked in separate tickets.

Sage-Bionetworks / sagebio-collaboration-portal