alpheios-project / documentation

Alpheios Developer Documentation

deployment architecture #19

Closed balmas closed 4 years ago

balmas commented 4 years ago

The deployable lexis-cs service is composed of an index.html page, a webpacked javascript library and a set of json data files.

The Alpheios infrastructure currently uses a wide variety of deployment strategies and technologies, but I would like to begin to converge on a set of guidelines by which we choose one strategy or another, as well as be able to take advantage of the latest technologies and models for deployment and continuous integration if and when appropriate.

Existing models in the Alpheios infrastructure include:

1) deploy under an Apache2 vhost on an existing EC2 instance, with code pulled from GitHub by Puppet (this is the model currently used by the grammars and lexical data files). Deployment of new code (CI) is managed by a combination of GitHub release tagging and Puppet config.

2) deploy as a docker container proxied by a vhost on an EC2 instance (where the docker container and vhost are both managed by Puppet). We use this for the lexicon service. Deployment of new code (CI) is managed by a combination of GitHub release tagging and Puppet config.

3) as a serverless application under AWS Lambda and API Gateway via the Serverless framework (used by the user word and settings APIs). No CI model is in place here currently; deployment is manual only.

Additional options we could consider include:

1) As static files hosted on S3 with an AWS EBS front-end

2) As a serverless application with static files hosted by S3 and served by a Lambda function and API Gateway

3) As a docker container encapsulating both the static files and a web server, hosted by AWS ECS with an AWS Load Balancer front-end.

Some of the requirements that should guide our decision:

1) Ability to set up a CI pipeline that results in direct deployment from GitHub (a combination of Terraform and GitHub Actions might be something to consider here)

2) Ability to easily set up dev, test, and prod environments

3) Ability to easily roll back deployments if needed

4) Ability to control client-side caching behavior with deployments

5) Limiting dependence upon a specific cloud provider for hosting

6) Ability to scale easily for increased concurrent requests if/when needed

balmas commented 4 years ago

@kirlat and @irina060981 I invite you both to chime in with thoughts on this. The immediate need is to decide how to deploy the lexis-cs service and I'd like to use it as a use-case for starting to make more intentional deployment decisions.

kirlat commented 4 years ago

I think we can start by trying to decide the simpler things first and then drill down to the other choices that are affected by those simpler decisions. This will help us eliminate variability step by step.

First, I think it is evident that we will be implementing a distributed architecture. It might be microservices, or some larger pieces of functionality, or both, but it definitely won't be monolithic do-all servers.

The most modern and, I think, most beneficial approaches to date are containers and serverless architecture. There is also platform as a service, but it is close to serverless, and with serverless having its own benefits I see no reason why we would want to use PaaS for our purposes. So, in my opinion, we can narrow it down to a choice between serverless and containers.

Both have their advantages and drawbacks. Going into details would be a huge topic, and many analyses of that have been done already. Making it short, and omitting a lot of details in the process, we can probably state the following.

Serverless is easier to deploy and manage. In most cases, all that is needed is to write the code. No need to worry about distribution, load balancing, scaling, failover, etc. Very cost effective for functions used rarely (we pay only for the time the code is actually executing). Very good for simple tasks (like sending an email), because it's much simpler than running a whole server for a single purpose.

On the other hand, there is a greater danger of vendor lock-in. AFAIK there are three major providers of serverless solutions: Amazon, Google, and Microsoft. Each has some specifics in its architecture, and if those are actively employed in serverless functions (which can actually be very beneficial) then we won't be able to simply take the code and move it to another provider; changes, sometimes pretty complex, may be required in the process.

The other problem is that serverless functions are stateless, and if we need to keep state between invocations it has to be stored somewhere. This can slow things down and make them more complex. And while we pay less for serverless functions when the workload is low, we pay more when the workload is greater. I also think serverless is better when traffic is irregular, while more traditional hosting solutions are more beneficial for predictable traffic patterns. Also, I believe there is no way to debug serverless functions in production, and we cannot run them locally.

Containers are more flexible and give us more control. We can do whatever we want within a containerized architecture. We can run containers either locally or remotely. Docker containers are well supported across different platforms, AFAIK: we can move them from one platform to another without changing a single line of code (I have not tried this in production, though). We pay for this, however, with more complexity: we have to take responsibility for configuring the applications within containers ourselves (which can be either a good or a bad thing depending on circumstances).

Containers are now becoming very close to serverless, in the sense that there are many solutions that take many management responsibilities off our shoulders (e.g. Amazon ECS and EKS, Google Container Engine).

A lot depends on the tooling, I think. What tools are available for deployment/CI management? In my opinion, it would often make sense to choose a technology depending on what tools are available for it, rather than on the technology itself (except in cases where the technology really matters).

It's really hard to make a choice. But I'm wondering if we need to. What if we could use both under the management of one tool? I have not done a detailed study of the possibilities yet, but I'm wondering if it would be possible to:

  1. Use serverless for simpler pieces of functionality. Use serverless for pieces of functionality that are either used rarely or may have irregular usage patterns. Use serverless for functions that can be stateless. Use serverless for functions that need to be scaled easily.
  2. Use containers (probably Docker as the most popular variant) for all other tasks.
  3. Use one tool that will manage both serverless and containers (can it be Terraform?)
  4. Try to avoid vendor lock-in. Do not use vendor-specific functionality in serverless unless absolutely needed. Try to replace that functionality with something that is not vendor-specific.
  5. It's probably better to use a provider-agnostic serverless framework.

What do you think about the strategy above? Would it make sense? Would it be possible to implement?

There is a huge overlap between serverless and containers, and maybe that's not a bad thing: we could have something implemented as a containerized application and then move it to a Lambda function if we find that more beneficial. Or we could move it in the other direction if we decide that's the right thing to do.

With this we would have three technologies in use: serverless, Docker containers, and the tool that manages both. This could be a simple yet flexible architecture. What do you think?

kirlat commented 4 years ago

For CI we can probably use a GitOps workflow. Terraform mentions supporting it. But even if Terraform's GitOps support cannot do all the things we need, we can probably use it via Kubernetes: https://www.weave.works/technologies/gitops/. Kubernetes is open source and it seems to play well with Terraform (the second screenshot on the front page of Terraform's website shows an example of Kubernetes pod code).

irina060981 commented 4 years ago

As far as I know, we have different types of tasks for deployment:

@kirlat and @irina060981 I invite you both to chime in with thoughts on this. The immediate need is to decide how to deploy the lexis-cs service and I'd like to use it as a use-case for starting to make more intentional deployment decisions.

For now we need to decide for the lexis-cs service: the CEDICT messaging service with our custom JS code. It is a module for embed-lib and the webextension; it is not a standalone service.

@balmas, maybe you meant the cedict repo, the Node.js service that we were reviewing this week?

It is a standalone service written in Node.js. I believe we could use AWS Lambda (https://aws.amazon.com/about-aws/whats-new/2019/11/aws-lambda-supports-node-js-12/), but it would require rewriting the code, and the result would be service-dependent. That's why it may not be a good choice (though I know very little about serverless technology).

So using a container-like technology seems more useful here. As far as I can see, we have several options: Docker, Puppet, Terraform, and combinations of them. I have very little experience with them too, and after trying to find some comparisons, it seems different developers have different opinions.

I have found an article that compares Terraform, Ansible, and Puppet (including their ability to be used with Docker and Kubernetes):

https://logz.io/blog/terraform-vs-ansible-vs-puppet/

As far as I can see from different sources, the choice is really developer-dependent :)

@balmas, as you have used all these tools, what is your opinion?

Do we need all of them: Terraform, Docker, Puppet? And AWS Lambda for common API tasks?

balmas commented 4 years ago

I agree with @kirlat that we can, and probably should, use both serverless and container approaches depending upon circumstances, and that the suggested guidelines are a good starting point:

  1. Use serverless for simpler pieces of functionality. Use serverless for pieces of functionality that are either used rarely or may have irregular usage patterns. Use serverless for functions that can be stateless. Use serverless for functions that need to be scaled easily.
  2. Use containers (probably Docker as the most popular variant) for all other tasks.
  3. Use one tool that will manage both serverless and containers (can it be Terraform?)
  4. Try to avoid vendor lock-in. Do not use vendor-specific functionality in serverless unless absolutely needed. Try to replace that functionality with something that is not vendor-specific.
  5. It's probably better to use a provider-agnostic serverless framework.

I think we can also distinguish between existing legacy services, which we can continue to manage as is for now, and new services, which we should try to be more intentional about.

I think the lexis-cs and cedict data (@irina060981 both are relevant here, as lexis-cs is what serves the cedict data) are a good pilot project to help us define the guidelines going forward.

Here are some of the characteristics that I see:

  1. requests to the service will have irregular and unpredictable usage patterns
  2. neither the service code nor the underlying data is likely to change frequently
  3. requests to the service are stateless
  4. service inputs vary depending upon context but the data itself is static
  5. both the service code and the data can be cached on the client and run locally (i.e. no server-side functionality)
  6. there is an npm-run build process for the service code (via webpack)
  7. there is an npm-run build process for the data files

@kirlat what am I missing in this list?

kirlat commented 4 years ago

@kirlat what am I missing in this list?

I think you did not miss anything here. In (6) and (7) you've listed that there are build tasks for the packages. Are we planning to run them during deployment? Since we run those build tasks before changes are committed, we could just take the pre-built versions of those files and distribute them without running a build process during deployment. Would that work?

service inputs vary depending upon context

Do you refer to CEDICT request parameters here? We cannot handle those requests on the server side; we need to create an iframe and load a script in there that will listen to messages posted to it. So the sole purpose of the LexisCS/CEDICT server is to serve an HTML file at a URL. This HTML file will be loaded within an iframe and will then pull other resources on its own. Thus the server side must be a simple web server for static files. The only requirement is that it must be able to provide gzip (or some other) compression.
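A rough sketch of that messaging pattern, for illustration only (the function name, message types, and data shapes below are assumptions, not the actual LexisCS API):

```javascript
// Runs inside the iframe: build a reply for an incoming lookup request.
// All identifiers here are hypothetical placeholders.
function handleLookupMessage (data, cedict) {
  if (!data || data.type !== 'CEDICT_LOOKUP') { return null }
  return {
    type: 'CEDICT_RESULT',
    requestId: data.requestId, // echo the id so the caller can match replies
    entries: cedict[data.word] || []
  }
}

// In the iframe page this would be wired up roughly as:
//   window.addEventListener('message', (event) => {
//     const reply = handleLookupMessage(event.data, cedictData)
//     if (reply) { event.source.postMessage(reply, event.origin) }
//   })
```

The embedding page would then `postMessage` a lookup request into the iframe and listen for the matching reply.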

From the LexisCS/CEDICT requirements that you've listed, it seems that serverless is probably the best choice for it. What do you think? Would any serverless platform be able to provide what we need, including compression of files?

kirlat commented 4 years ago

I think we can also distinguish between existing legacy services, which we can continue to manage as is for now, and new services, which we should try to be more intentional about.

If we decide to go down this road, I think it would probably make sense to repack the legacy servers into containers, if possible. With this we would have simpler management and a controlled environment (i.e. the exact versions of servers and dependencies that we need), along with some other benefits. And it probably won't be very hard to put them into containers (unless it requires architectural changes). What do you think?

balmas commented 4 years ago

In (6) and (7) you've listed that there are build tasks for the packages. Are we planning to run them during deployment? Since we run those build tasks before changes are committed, we could just take the pre-built versions of those files and distribute them without running a build process during deployment. Would that work?

Probably yes. I think there are some issues with our current local build processes for the various JavaScript libraries though, because they depend upon what we have installed locally. Eventually it would be nice to be able to make sure production code is built in a more controlled environment.

balmas commented 4 years ago

Do you refer to CEDICT request parameters here? We cannot handle those requests on the server side; we need to create an iframe and load a script in there that will listen to messages posted to it. So the sole purpose of the LexisCS/CEDICT server is to serve an HTML file at a URL. This HTML file will be loaded within an iframe and will then pull other resources on its own. Thus the server side must be a simple web server for static files. The only requirement is that it must be able to provide gzip (or some other) compression.

Ah yes, I was thinking about the input to the cedict request

balmas commented 4 years ago

This HTML file will be loaded within an iframe and will then pull other resources on its own. Thus the server side must be a simple web server for static files. The only requirement is that it must be able to provide gzip (or some other) compression....

From the LexisCS/CEDICT requirements that you've listed, it seems that serverless is probably the best choice for it. What do you think? Would any serverless platform be able to provide what we need, including compression of files?

Yes I think so too. And I don't think we even need an execution environment, right? So, really for this we're just talking about a CDN I think...

balmas commented 4 years ago

If we decide to go down this road, I think it would probably make sense to repack the legacy servers into containers, if possible. With this we would have simpler management and a controlled environment (i.e. the exact versions of servers and dependencies that we need), along with some other benefits. And it probably won't be very hard to put them into containers (unless it requires architectural changes). What do you think?

Yes, ideally. We probably need to do this to be able to handle scaling for increased concurrent usage effectively as well.

kirlat commented 4 years ago

And I don't think we even need an execution environment, right? So, really for this we're just talking about a CDN I think...

Yes, I think a CDN it is, then: one that supports compression.

balmas commented 4 years ago

So we actually have two separate CDN deployments here, correct?

Each will need to be versioned and cached separately, but they have interdependencies.

The lexis-cs service references the URLs for both: it has the target URL for the iframe in which it is running and the URL at which to retrieve the CEDICT data files.

I think we need to be able to define dev, staging and production urls for the iframe and any resources it serves, outside of the code itself.

And then for any calling code to be able to specify which of these configurations to use.
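As a sketch of that idea (the URLs and environment names below are placeholders, not real endpoints), the per-environment URLs could live in a small map outside the application code:

```javascript
// Hypothetical environment map; the real dev/qa/prod URLs would replace these.
const environments = {
  dev: {
    iframeURL: 'https://dev.example.org/lexis-cs/index.html',
    cedictDataURL: 'https://dev.example.org/lexis-cs/cedict/'
  },
  production: {
    iframeURL: 'https://example.org/lexis-cs/index.html',
    cedictDataURL: 'https://example.org/lexis-cs/cedict/'
  }
}

// Calling code selects a configuration by name instead of hardcoding URLs.
function getDeploymentConfig (envName) {
  const config = environments[envName]
  if (!config) { throw new Error(`Unknown environment: ${envName}`) }
  return config
}
```

The calling code (embed-lib or the webextension) would then only need to pass an environment name.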

kirlat commented 4 years ago

Right now CEDICT is located in a subdirectory within the path from which the index page is served. I think that is convenient and makes sense: the purpose of LexisCS is to serve data, and CEDICT is such data. So we can probably follow this pattern in the future. If LexisCS will be serving some lexical data other than CEDICT, we can put that data into another subdirectory.

CEDICT data and LexisCS code are parts of the same lexical service, and thus must be treated together. We should probably synchronize releases of CEDICT and LexisCS, as we need to be sure that if the CEDICT format changes then LexisCS is able to handle it; that's why we have CEDICT versions hardcoded within the LexisCS config file.

I think it is a great idea to be able to run applications in dev, staging, or production modes. We already have a switch that enables a dev mode of the embedded lib. We can add other switches that enable other modes as well. Those switches can also control what logs will be written: dev mode would produce maximum output while prod mode would write almost no messages at all.
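A minimal sketch of such a mode switch controlling log verbosity (the mode names and thresholds here are assumptions):

```javascript
// Higher level = more verbose; production keeps only errors.
const LOG_LEVELS = { dev: 3, staging: 2, production: 1 }

function makeLogger (mode) {
  const level = LOG_LEVELS[mode] ?? 1
  // emit() returns whether the message was actually written,
  // which also makes the behavior easy to test.
  const emit = (threshold, prefix, msg) => {
    if (level < threshold) { return false }
    console.log(`[${prefix}] ${msg}`)
    return true
  }
  return {
    debug: (msg) => emit(3, 'debug', msg),
    info: (msg) => emit(2, 'info', msg),
    error: (msg) => emit(1, 'error', msg)
  }
}
```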

What do you think about that?

balmas commented 4 years ago

So, would the deployment process be something like this?

  1. checkout lexis-cs from release tag
  2. examine lexis-cs config to look for included data file releases and checkout corresponding repos
  3. zip it all up and deploy to the CDN, specifying a tag (e.g. prod, dev, or qa)

kirlat commented 4 years ago

Right now LexisCS does not have a dependency on CEDICT, and cedict has to be checked out and copied manually. So right now it looks like:

  1. Checkout lexis-cs
  2. Install lexis-cs dependencies
  3. Build a script by running an npm task
  4. Checkout cedict
  5. Copy the newly built lexis-cs script into a destination folder
  6. Copy cedict to the subfolder within the destination folder.
  7. Zip the destination folder and deploy it to CDN.

But I think for automatic deployment it's better to simplify this. We can add cedict to the dependencies of lexis-cs and add an npm task that copies cedict to the destination folder. Then it will become:

  1. Checkout lexis-cs
  2. Install lexis-cs dependencies (it will install a cedict repo too)
  3. Run an npm build task that will build the script and will copy cedict data into a destination folder
  4. Zip the destination folder and deploy it to CDN.

What do you think?

balmas commented 4 years ago

Yes, I think that would be good. I think we will probably use AWS CloudFront for the CDN, and we could use a Travis deploy step to automatically deploy to the AWS S3 bucket if tests pass.

https://docs.travis-ci.com/user/deployment/codedeploy/

We could make the deployment contingent on tests passing.

With this setup we could have every commit to master deploy to a dev bucket (if tests pass).
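For example, the dev-bucket deployment could be described with a `.travis.yml` stanza roughly like the following; the bucket name, region, and directory are placeholders, not real Alpheios settings:

```yaml
# Hypothetical Travis dpl (v1) S3 deploy stanza; values are placeholders.
deploy:
  provider: s3
  access_key_id: $AWS_ACCESS_KEY_ID        # set in Travis settings, not in the repo
  secret_access_key: $AWS_SECRET_ACCESS_KEY
  bucket: alpheios-lexis-cs-dev
  region: us-east-1
  local_dir: dist
  skip_cleanup: true                       # keep build artifacts for upload
  on:
    branch: master                         # deploy each master commit after tests pass
```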

We would still need to investigate what method to use to deploy to qa and production (we use releases right now for qa and production; not sure if we can initiate a Travis deploy on a release tag or if we would have to switch to branches).

Thoughts?

kirlat commented 4 years ago

I think Travis dpl v1 (or v2) is a good option for deploying the Lexis code. If deployment of specific tags is not possible, we could probably create branches for production, staging (if needed), and QA and deploy via those. For staging we could probably deploy the code to a separate bucket and direct limited user traffic to it.

balmas commented 4 years ago

implemented.