bitwalker / distillery-aws-example

An example application to go with the AWS guide in the Distillery documentation
Apache License 2.0

Distributed Erlang #2

Open jfrolich opened 6 years ago

jfrolich commented 6 years ago

Nice work! I looked at all the scripts and configuration, but as far as I could see this doesn't support distributed erlang, or am I wrong?

Some other feedback: While it is nice to have a boilerplate for deployment in AWS, and automating stuff as much as possible, this is really a complex example, and I would say inaccessible to 99% of people.

Perhaps it would be better for the community to write a guide of the bare bones fundamental stuff that needs to be set up in order to deploy to AWS (it's not very complicated). And then have a special section that goes into advanced stuff, like automating the setup of infra and code-deploys, and the advanced secret management that this thing implements.

Not saying that this should be it, or putting out a request. Just that this didn't really help me and I already deployed Elixir in AWS before. Actually I am considering building this with a guide + docker image.

bitwalker commented 6 years ago

@jfrolich Thanks for the feedback!

It supports Distributed Erlang, as the hosts for the application run in the same subnet, so traffic between those hosts is unrestricted. That said, the example application is not built as a clustered application, so we don't really exercise that here, and we're only spinning up one host out of the box, rather than two or more.

> While it is nice to have a boilerplate for deployment in AWS, and automating stuff as much as possible, this is really a complex example, and I would say inaccessible to 99% of people.

The infrastructure itself is not very complex; it is really standard stuff. However, there is complexity in the automation provided by CloudFormation, and in the use of Lambda/Custom Resources to handle bootstrapping the infrastructure with secrets. There are good reasons for these choices though:

While most people new to AWS, or just unexposed to all of the components used in this guide, will not understand it at first contact, I don't believe that is important, nor the point. The first purpose of this guide is to let you follow along and see what is possible; given that 99% of the guide is automated, the actual time required to go from zero to a fully working example is very short.

The second is to see what a real-world, modern architecture in AWS looks like - many people don't even know where to begin when trying to set up something that matches their experience with providers like Heroku. This demonstrates what that looks like, as well as how easy it is to replicate once you've done the hard work of designing the architecture and translating that to CloudFormation.

Lastly, it is expected that readers who want to pursue using this architecture in production will work backwards to understand each of the components (e.g. CodePipeline, or CloudFormation) at a high level, in order to better understand how things are set up, and how they can go about modifying things to their own tastes.

It's my opinion that too many guides try to "keep it simple", and ultimately fail to show you anything of interest - they leave too much out of the picture. The architecture in this guide contains a lot of the hard stuff, but automates all of it, so you can choose to delve further as you want to tweak some part of the system (say by introducing a manual approval step into the build/deploy pipeline, or by spinning up a clustered application rather than a simple Phoenix app). It's not a "one size fits all" architecture of course, but there is no such thing - the point is that this example demonstrates enough of the hard parts that you could build your own thing from scratch and still have a useful reference for some of the parts in common.

> Perhaps it would be better for the community to write a guide of the bare bones fundamental stuff that needs to be set up in order to deploy to AWS

These guides do exist already; they just don't have an "official" place in the guides in Distillery's docs. I'm not fundamentally opposed to adding more AWS guides, particularly if they cover a different approach - but I do wonder if the promise of not being complicated really holds true in practice. Even a simple setup with a load balancer, an EC2 host, and a database involves spinning up a VPC, an internet gateway, subnets for those three pieces, plus the network rules. You get to omit the deployment automation part by just having readers scp the release to the EC2 host, but you have to cover Docker builds as well, since that is no longer implicitly handled by the build pipeline. There is also the issue of secrets: even if it is a manual step, you still have to demonstrate it in the guide. By the end, the guide is quite long, and the more steps there are, the more likely the reader is to make a mistake by missing a step, or entering the wrong value somewhere.

> ... and the advanced secret management that this thing implements.

Should I describe this in more detail in the guide? Or have a section at the end which introduces each of the components used in more detail?

> Just that this didn't really help me and I already deployed Elixir in AWS before. Actually I am considering building this with a guide + docker image.

I'm sorry about that :(. Can you tell me what would've made this guide more useful to you, despite the complexity? Perhaps more description of what things are used and why, etc.? I definitely want to make it accessible and useful, while maintaining the architecture as designed, at least for this guide; but it is a first draft (i.e. I wrote the project, then wrote the guide), so I'm sure there is a lot of room for improvement.

jfrolich commented 6 years ago

Thanks for your lengthy reaction! I think my main concern with this is:

> Should I describe this in more detail in the guide? Or have a section at the end which introduces each of the components used in more detail?

Yes, because I think some of these components such as secret management can be actually really helpful even outside the Elixir ecosystem.

> I'm sorry about that :(. Can you tell me what would've made this guide more useful to you, despite the complexity? Perhaps more description of what things are used and why, etc.? I definitely want to make it accessible and useful, while maintaining the architecture as designed, at least for this guide; but it is a first draft (i.e. I wrote the project, then wrote the guide), so I'm sure there is a lot of room for improvement.

I was looking to get some best practices out of this as somebody who already deploys on AWS. Going through the config files was actually quite interesting, but pretty hard to get insights in this way. So a detailed description instead of code (with why's answered) would benefit a lot of people who might not use this project directly but look for guidance in deploying to AWS using a slightly different approach.

Lastly, Elixir deployment is pretty straightforward in most cases, but because almost all Phoenix projects use channels (the strength of Phoenix), a tricky thing that is often not included in tutorials is distributed Erlang. So it would be great to have it in this project. Most projects would benefit from more than one node (if only for redundancy).

PS: I still think a good generic Dockerfile is probably the easiest starting point for deploying Elixir, so the Elixir stuff is contained in a single abstraction. After that works well, you can take any generic deploy Docker documentation from that point forward. The only tricky extra thing to set up is making sure the nodes discover each other and can communicate with distributed Erlang. But with a good sanctioned method, that shouldn't be too hard. DNS is probably the easiest way to tackle it. I understand that the direction/goal of this project is different.

Thanks for working on this. I really think deployment is the bottleneck for Elixir adoption now.

jfrolich commented 6 years ago

By the way, I completely missed the new Docker guide. Sorry! That resolves some of my feedback :)

I have some improvements (optimizing compilation, distributed erlang), that I can PR there. Thanks!

bitwalker commented 6 years ago

> Most of this is a generic AWS deployment strategy. Which is actually really helpful, but I guess a lot of people are coming to deploy Elixir. So it might create an information overload, only a small part is Elixir specific.

The more general focus is actually intentional. I was getting a lot of feedback that just talking about the Elixir component of deployment was not enough, many people felt like they still didn't know how to get started deploying a release, because they didn't have the full picture. As you've found, we've also added guides covering Docker and the like to address things at a smaller scale, but this guide in particular was meant to be a comprehensive look at one approach to deploying to AWS. The smaller guides are there to cover the more specific scopes of building Docker images, etc.

> to successfully deploy in production it actually is really important to know what is going on to fix potential problems.

Oh absolutely - the intent of this guide is not to send people off to production with zero idea of how their infrastructure works, though they could theoretically do that. That said, this is true of virtually any infrastructure we could base this guide on - if you don't understand it, when things go wrong you will be unable to fix it. I may need to more prominently state this to make sure it is understood, but I do warn already about just taking this infra as given and running with it.

> If people are looking for a really easy deploy I still think Heroku is a better option.

It depends on your needs of course, but sure. In that case, though, they probably aren't reading a "Deploying to AWS" guide, or even using Distillery ;)

> The problem with this is that you get everything immediately. Most companies/projects probably benefit from a more evolutionary approach, where they take things step by step. (Perhaps first deployment without a remote build pipeline, and automatic deploys from github).

I'm not sure I'd entirely agree with that. Most companies or projects I work with want those things, but have no in-house expertise to set them up, or don't have the bandwidth. The evolutionary approach is many times just a side effect of that - either you add stuff piecemeal as you get the time, or you get a much more haphazard evolution where people are learning this stuff in production, throwing things together until they find something that mostly works.

By having a reference which demonstrates all of these pieces tied together, the scope of things you have to spend time learning is much smaller, and much more resource or task-oriented (how do we add a Kinesis stream or S3 buckets, how do we introduce an approval process, how do we add staging and test environments, etc.). It is not necessary to understand the breadth of CloudFormation, or CodePipeline to be productive with this as a baseline - you pick up those things in the course of tweaking it to your own applications, or in the course of evolving it as needs change. It is always preferable to have someone experienced with AWS running the show, but that isn't always the case.

> I was looking to get some best practices out of this as somebody who already deploys on AWS. Going through the config files was actually quite interesting, but pretty hard to get insights in this way. So a detailed description instead of code (with why's answered) would benefit a lot of people who might not use this project directly but look for guidance in deploying to AWS using a slightly different approach.

Yeah, looking back I thought it might be distracting, but I think you're right that discussing the components in more detail, at least at the end, would be beneficial.

> Lastly, Elixir deployment is pretty straightforward in most cases, but because almost all Phoenix projects use channels (the strength of Phoenix), a tricky thing that is often not included in tutorials is distributed Erlang. So it would be great to have it in this project. Most projects would benefit from more than one node (if only for redundancy).

I think this would be a good addition to what's here already - and would better fit the purpose of the guide as a comprehensive demonstration of deployment. I'll add a TODO to my list to extend the guide to cover distribution.

> PS: I still think a good generic Dockerfile is probably the easiest starting point for deploying Elixir, so the Elixir stuff is contained in a single abstraction. After that works well, you can take any generic deploy Docker documentation from that point forward.

In general there has been a lot of pushback against relying on Docker heavily for everything. I chose EC2 over ECS in this guide for that reason - it is more generally applicable. As you've pointed out, if you can use Docker, once you know how to build the image, the rest is basically the same as deploying any other language, so you can follow any guide. The point of covering a non-Docker-based approach is to target those people who want to better understand how they can deploy without containers.

> Thanks for working on this. I really think deployment is the bottleneck for Elixir adoption now.

Thanks for the great feedback :) - I agree that deployment is a bottleneck, though I think a lot of that is in perception, so I'm hoping that with 2.x addressing some of the major complaints, and the introduction of some of these guides, we can start to change that perception.

> I have some improvements (optimizing compilation, distributed erlang), that I can PR there. Thanks!

That'd be great! PRs are always welcome :)

rlefevre commented 6 years ago

To offer another perspective, I have to deploy my first real-world Phoenix app on AWS with an infrastructure close to this example, and I want a 100% IaC infrastructure, so this guide and repo have been tremendously useful to me. It most likely saved me days or weeks of research and work to get a first fully automated deployment working. So thank you very much.

I agree with the OP though that it would have been perfect if the example used distributed Elixir, as this is a common reason to use Erlang/Elixir (maybe just a Phoenix channel broadcast between nodes). Also, I am starting to add support for it and a few things are not clear to me yet:

> It supports Distributed Erlang, as the hosts for the application run in the same subnet, so traffic between those hosts is unrestricted. That said, the example application is not built as a clustered application, so we don't really exercise that here, and we're only spinning up one host out of the box, rather than two or more.

As far as I can tell, the ALB distributes requests between two different AZs with one EC2 instance in each, and their subnets can only communicate with the RDS DB and the ALB, so I'm not sure I understand this quote.

Does this mean that communication should be allowed between the two AZ subnets to allow distributed Elixir?

Also, there are several options for node discovery, and I am not yet sure of the pros and cons of each one. I found libcluster_ec2 and peerage_ec2. Are there other solutions?

At least a few pointers/links to help find a solution would be amazing in the guide.

Thank you again for this work.

rlefevre commented 6 years ago

If it can help others, here is what I did to enable distributed elixir using libcluster with the libcluster_ec2 strategy after having deployed the application.

WARNING: I am new to this, so this may be unoptimized and/or insecure!

1. Limit erlang ports

In rel/vm.args, I added:

-kernel inet_dist_listen_min 4370
-kernel inet_dist_listen_max 4400

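Note that this range contains 31 ports (4370 through 4400). Each Erlang node listens on one port from it, so the range limits how many nodes can run on a single host rather than the size of the cluster. You can widen the range if needed, provided that you also update the Ingress rule below.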
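As a quick sanity check on the range, here is a minimal sketch (the PortRangeCheck module is hypothetical and only illustrates the arithmetic). The bounds are inclusive, and epmd's port 4369 sits directly below the range, which is why a single 4369-4400 rule can cover both:

```elixir
defmodule PortRangeCheck do
  # Hypothetical helper, for illustration only.
  @epmd_port 4369

  # Number of distribution ports between inet_dist_listen_min and
  # inet_dist_listen_max, both ends included.
  def dist_port_count(min, max) when min <= max, do: max - min + 1

  # Smallest single port span that covers epmd plus the
  # distribution range, as a {from_port, to_port} tuple.
  def ingress_span(min, max) when min > @epmd_port and min <= max do
    {@epmd_port, max}
  end
end
```

With the values used here, `PortRangeCheck.dist_port_count(4370, 4400)` gives 31 and `PortRangeCheck.ingress_span(4370, 4400)` gives `{4369, 4400}`.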

2. Allow erlang communication in the application security group

I added the following Ingress rule in templates/infra.yml:

  # Erlang cluster communication
  # 4369 for epmd
  # 4370-4400 for node connections
  SecurityGroupAppIngressErlang:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      # Description is a property of the ingress rule, so it belongs under Properties
      Description: "Allow erlang cluster nodes to communicate"
      SourceSecurityGroupId: !GetAtt SecurityGroupApp.GroupId
      SourceSecurityGroupOwnerId: !Ref "AWS::AccountId"
      GroupId: !GetAtt SecurityGroupApp.GroupId
      IpProtocol: tcp
      FromPort: '4369'
      ToPort: '4400'

3. Allow DescribeInstances API from EC2 instances for libcluster_ec2:

I added the following policy in the InstanceRole AWS::IAM::Role (in Properties after ManagedPolicyArns)

      Policies:
        - PolicyName: "EC2Policy"
          PolicyDocument:
              Version: '2012-10-17'
              Statement:
                - Effect: "Allow"
                  Action:
                    - "ec2:DescribeInstances"
                  Resource: "*"

4. Add libcluster_ec2 dependency

Add in mix.exs:

{:libcluster_ec2, "~> 0.4"}

5. Configure libcluster

I defined my topology in rel/etc/config.exs (so libcluster is only configured in releases):

config :libcluster,
  topologies: [
    example: [
      strategy: ClusterEC2.Strategy.Tags,
      config: [
        ec2_tagname: "source",
        ec2_tagvalue: "distillery-aws-example",
        app_prefix: "distillery_example"
      ],
    ]
  ]

Adapt these values to your application name ($APP_NAME passed to bin/cfn) and node name prefix (i.e. the part of -name before the @ in rel/vm.args) if you changed them.

6. Start the libcluster supervisor (only in releases)

I modified lib/example/application.ex to start the libcluster supervisor when a topology is configured (i.e. only in releases):

    children =
      [
        supervisor(Example.Database, []),
        supervisor(Example.Endpoint, [])
      ] ++
        case Application.get_env(:libcluster, :topologies) do
          nil -> []
          topologies -> [{Cluster.Supervisor, [topologies, [name: Example.ClusterSupervisor]]}]
        end

My nodes were then connected, which I was able to confirm by running Node.list inside a remote console.
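As a minimal sketch of that check, something like the following could be pasted into the remote console. The ClusterCheck module is hypothetical and not part of the repository; Node.list/0, Node.self/0 and Node.ping/1 are standard Elixir functions:

```elixir
defmodule ClusterCheck do
  # Hypothetical helper, not part of the example repository.

  # Returns {:connected, peers} when at least one other node is
  # visible, or {:alone, this_node} when no peer has joined yet.
  def status do
    case Node.list() do
      [] -> {:alone, Node.self()}
      peers -> {:connected, peers}
    end
  end

  # Pings a node by name; Node.ping/1 returns :pong when the node
  # is reachable over distributed Erlang and :pang when it is not.
  def reachable?(node) when is_atom(node), do: Node.ping(node) == :pong
end
```

With two instances up and clustered, `ClusterCheck.status()` should return `{:connected, [...]}` listing the other node. An `{:alone, ...}` result suggests checking the security group rule and the libcluster tags first.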

philihp commented 4 years ago

https://github.com/bitwalker/distillery-aws-example/blob/master/lib/example/application.ex#L8 does not seem to add the supervisor conditionally, as @rlefevre's version does. @bitwalker Is this intentional? Is topologies supposed to be there?

When deploying, the build fails because of this with:

==> distillery_example
Compiling 17 files (.ex)
warning: variable "topologies" does not exist and is being expanded to "topologies()", please use parentheses to remove the ambiguity or change the variable name
lib/example/application.ex:8: Example.Application.start/2

== Compilation error in file lib/example/application.ex ==
** (CompileError) lib/example/application.ex:8: undefined function topologies/0
(elixir) src/elixir_locals.erl:108: :elixir_locals."-ensure_no_undefined_local/3-lc$^0/1-0-"/2
(elixir) src/elixir_locals.erl:108: anonymous fn/3 in :elixir_locals.ensure_no_undefined_local/3
(stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
make: *** [release] Error 1

[Container] 2020/03/26 23:16:22 Command did not exit successfully bin/build build exit status 2
[Container] 2020/03/26 23:16:22 Phase complete: BUILD State: FAILED
[Container] 2020/03/26 23:16:22 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: bin/build build. Reason: exit status 2
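A minimal sketch of one way this could be resolved, assuming the intent matches the conditional version posted earlier in this thread: the error says `topologies` is used at line 8 without ever being bound, so the value has to be read from the application environment before it is used. The Example.ClusterChildren module and its function names are hypothetical, and the child spec shape is copied from that earlier comment rather than from the repository's current code:

```elixir
defmodule Example.ClusterChildren do
  # Hypothetical helper, for illustration only.

  # Builds the list of libcluster children from a topologies value.
  # nil means nothing is configured (e.g. outside a release), so no
  # cluster supervisor is started.
  def children(nil), do: []

  def children(topologies) when is_list(topologies) do
    [{Cluster.Supervisor, [topologies, [name: Example.ClusterSupervisor]]}]
  end

  # Reads the topologies from the application environment, so the
  # value is always bound before it is used.
  def from_env do
    :libcluster
    |> Application.get_env(:topologies)
    |> children()
  end
end
```

In `Example.Application.start/2`, the result of `Example.ClusterChildren.from_env()` could then be appended to the existing children list, which keeps the build working whether or not a topology is configured.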
976 | [Container] 2020/03/26 23:16:22 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: bin/build build. Reason: exit status 2