aws / aws-app-mesh-roadmap

AWS App Mesh is a service mesh that you can use with your microservices to manage service to service communication
Apache License 2.0

[request]: Provide support for Envoy's zone aware routing #94

Open rlafferty opened 5 years ago

rlafferty commented 5 years ago

Tell us about your request

Support the ability to use Envoy's zone aware routing.

Which integration(s) is this request for?

App Mesh and potentially Cloud Map

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

It would be beneficial to be able to use Envoy's zone aware routing to "preference" backends in the same AZ. This would reduce cross-AZ latency where possible. I believe the expectation is that the zone information would be provided by the service discovery tool, so in this case I'm unsure whether Cloud Map already provides or retains that information.

Are you currently working around this issue?

Not currently able to, so going "without it"

lavignes commented 5 years ago

Hi @rlafferty. Thanks for opening this issue. This is definitely a worthwhile feature to add. I know that today ECS will publish Cloud Map attributes such as AVAILABILITY_ZONE: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html

So hypothetically today, you could create a virtual node for each AZ that matches on the AVAILABILITY_ZONE Cloud Map attribute as a workaround. But that is obviously not ideal.
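As a rough illustration of that workaround, here is a minimal sketch that builds one virtual node spec per AZ. The dictionary shape follows my understanding of the App Mesh virtual node spec (`serviceDiscovery.awsCloudMap.attributes`); the namespace, service name, port, and zone list are hypothetical placeholders.

```python
def zonal_virtual_node_spec(namespace, service, zone, port=8080):
    """Build a virtual node spec that only discovers instances in one AZ."""
    return {
        "listeners": [{"portMapping": {"port": port, "protocol": "http"}}],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": namespace,
                "serviceName": service,
                # ECS publishes this attribute on the Cloud Map instances.
                "attributes": [{"key": "AVAILABILITY_ZONE", "value": zone}],
            }
        },
    }


# Hypothetical zones and names; one virtual node per AZ.
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]
specs = {
    f"orders-{zone}": zonal_virtual_node_spec("prod.local", "orders", zone)
    for zone in ZONES
}
```

Each spec could then be passed as the `spec` argument when creating the virtual node, for example through boto3's `create_virtual_node`.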

We could specify the AZ by querying the EC2 metadata endpoint and pass it along to Envoy's bootstrap locality: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/base.proto#envoy-api-msg-core-node

Then we can query for zonal instances as an opt-in parameter in VirtualNode service discovery. Not sure exactly how this should look yet, but we can start thinking about it.
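To make the metadata idea concrete, the following is a sketch only, not how App Mesh does or will do it. It reads the AZ from the EC2 instance metadata service (IMDSv2) and shapes it into the `locality` part of an Envoy bootstrap `node`. Deriving the region by dropping the last character is an assumption that holds for standard AZ names such as `us-east-1a`; the function names are made up for illustration.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_availability_zone(timeout=2.0):
    """Read this instance's AZ from the EC2 metadata service (IMDSv2)."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    az_req = urllib.request.Request(
        f"{IMDS}/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(az_req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def node_locality(zone):
    """Shape an AZ name into the locality block of an Envoy bootstrap node."""
    # Assumption: standard AZ names are the region plus one trailing letter.
    return {"locality": {"region": zone[:-1], "zone": zone}}
```

On an EC2 instance, `node_locality(fetch_availability_zone())` would give the fragment to merge into the bootstrap `node` section.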

joeykhashab commented 4 years ago

@lavignes I am currently experimenting with the workaround that you were suggesting. I am still a little unsure about how the zone-specific logic would work. When I set up my zone-specific virtual nodes, I still need to set up zone-specific virtual services, virtual routers, and routes to them, correct?

Once I do that, I guess I need to create a custom Envoy Docker image to handle the logic of "if in zone A, route the request using the zone A virtual router", right?

lavignes commented 4 years ago

Hi @joeykhashab

> I still need to setup zone specific virtual services, virtual routers, and routes to them correct?

For now, yes unfortunately. Hence why this issue exists :(

In the route, you could specify two virtual node weighted targets: one with an AVAILABILITY_ZONE Cloud Map attribute and another without. Then you could weight the traffic primarily towards the first to gain some fail-over capabilities if the preferred AZ is seeing an outage.
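A minimal sketch of such a route, assuming the HTTP route spec shape with `weightedTargets` as I understand it from the App Mesh API. The virtual node names and the 90/10 split are hypothetical.

```python
def zone_preferring_route_spec(zonal_node, fallback_node, zonal_weight=90):
    """Build an HTTP route spec that sends most traffic to the zonal node."""
    if not 0 <= zonal_weight <= 100:
        raise ValueError("zonal_weight must be between 0 and 100")
    return {
        "httpRoute": {
            "match": {"prefix": "/"},
            "action": {
                "weightedTargets": [
                    # Virtual node restricted to one AZ via the Cloud Map attribute.
                    {"virtualNode": zonal_node, "weight": zonal_weight},
                    # Virtual node without the attribute, spanning all AZs.
                    {"virtualNode": fallback_node, "weight": 100 - zonal_weight},
                ]
            },
        }
    }


route_spec = zone_preferring_route_spec("orders-us-east-1a", "orders-all-zones")
```

Note that the weights are static: the split does not shift on its own when the preferred AZ becomes unhealthy.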

> Once I do that, I guess I need to create a custom envoy docker image to handle the logic of "if in zone A, route the request using the zone A virtual router" right?

That would not be necessary. You'd simply configure the backends of your virtual nodes to use the zone-specific virtual service. The Envoy image contains just some information about reaching our management server to download configuration. So you aren't able to target any specific virtual routers etc. It is all fetched at run-time.
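For illustration, a sketch of pointing a client virtual node at a zone-specific virtual service through its `backends` list. The naming scheme (`<service>-<zone>.<domain>`) and the helper name are hypothetical; only the `backends` / `virtualService` / `virtualServiceName` keys are taken from the virtual node spec as I understand it.

```python
import copy


def with_zonal_backend(node_spec, backend_service, zone, domain="prod.local"):
    """Return a copy of a virtual node spec whose backend is the zone's virtual service."""
    spec = copy.deepcopy(node_spec)
    spec["backends"] = [
        {
            "virtualService": {
                # Hypothetical naming scheme for per-zone virtual services.
                "virtualServiceName": f"{backend_service}-{zone}.{domain}"
            }
        }
    ]
    return spec


client_spec = with_zonal_backend({"listeners": []}, "orders", "us-east-1a")
```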

All that said, I wouldn't recommend doing this manually. It would require creating a lot of duplicate mesh resources. My example above is more hypothetical as it does allow for routing to a specific AZ via service discovery, but there is no nice mechanism for handling fail-over.

joeykhashab commented 4 years ago

I see. OK, I think I am understanding the workaround better now. Thanks for your response @lavignes.

claydanford commented 4 years ago

Any update on this after 8 months?

inakianduaga commented 4 years ago

How realistic is it that this will be implemented in the next 6 to 12 months? We are currently considering App Mesh to potentially remove our ALB LCU costs by dropping ALBs for all internal traffic, but it feels like that won't help if we get hit with cross-AZ costs for the traffic.

Rob-Johnson commented 3 years ago

Just adding a +1 for wanting to see this functionality. From what I can see, the method involving Cloud Map attributes won't provide the same capabilities that Envoy has natively, particularly making sure that the amount of traffic sent within the same AZ is kept proportionate to the capacity of the upstream cluster.
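To show what "proportionate" means here, this is a simplified sketch of the split as I read the Envoy zone aware routing documentation, not Envoy's actual implementation. Envoy compares the local zone's share of hosts in the originating cluster with its share in the upstream cluster, and only keeps all traffic local when the upstream has enough capacity there. The real feature has further conditions (for example a minimum cluster size) that are ignored below; the function name and host counts are made up.

```python
def local_zone_share(local_hosts_by_zone, upstream_hosts_by_zone, zone):
    """Fraction of requests from `zone` that would stay in `zone` (simplified)."""
    local_pct = local_hosts_by_zone[zone] / sum(local_hosts_by_zone.values())
    upstream_pct = upstream_hosts_by_zone.get(zone, 0) / sum(
        upstream_hosts_by_zone.values()
    )
    if upstream_pct >= local_pct:
        # Upstream has at least its fair share of capacity in this zone.
        return 1.0
    # Otherwise only part of the traffic stays local; the rest goes cross-zone.
    return upstream_pct / local_pct


clients = {"a": 2, "b": 1, "c": 1}   # hypothetical originating cluster
upstream = {"a": 1, "b": 1, "c": 1}  # hypothetical upstream cluster
share_a = local_zone_share(clients, upstream, "a")
share_b = local_zone_share(clients, upstream, "b")
```

With these counts, zone `a` holds half of the clients but only a third of the upstream hosts, so only about two thirds of its requests stay local, while zone `b` keeps all of its traffic in-zone. A static weighted route cannot adapt like this as host counts change.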

james-skinner-deltatre commented 2 years ago

Any update on this one?

We are also looking to replace an internal ALB, and if I understand the billing docs correctly (which is not always easy):

  1. If you use an ALB to communicate between AZs, you do not pay for cross-AZ data transfer, only the ALB LCU cost ($0.008/GB)
  2. If you use App Mesh to make the same call, you pay the cross-AZ data transfer costs ($0.02/GB), plus the CPU to run the Envoy sidecars

If this is correct, and I am trying to get support to confirm, a switch to App Mesh will cost us a lot more if we cannot do some zone aware routing.

james-skinner-deltatre commented 2 years ago

For anyone interested, I got confirmation from support on pricing:

  1. If I transfer 1GB of data from EC2 instance A to EC2 instance B which are in the same VPC and the same region but different AZs then I pay 0.1 + 0.1 = $0.2 for data transfer.
  2. If I then create an internal ALB in the same VPC and add EC2 instance B to be behind it, then transfer another 1GB from EC2 instance A to EC2 instance B, this time via the ALB, I do not pay the same $0.2 data transfer costs, only the ALB pricing as per https://aws.amazon.com/elasticloadbalancing/pricing/
  • Yes, that is correct

Doing some rough calculations on an existing ALB we have: if we replaced it with App Mesh we would go from paying $40/week in ALB costs to $630/week in cross-AZ data transfer costs, plus the added cost to run the Envoy sidecars.

Clear deal breaker for us, which this feature would mitigate I hope.

UPDATE: Data transfer cost is $0.02/GB not $0.2 so it works out at ~$60 for App Mesh vs ~$40 for ALB

herrhound commented 2 years ago

@james-skinner-deltatre, I feel your calculations are not exactly correct. First, the cross-AZ data transfer cost in most of the AWS Regions in the US is $0.01 per GB, ten times less than you assumed. Second, in most cases, you send no more than two thirds of the traffic to other AZs. Assuming your traffic is ~450 GB per day, that would result in ~$42 of data transfer charges per week, comparable with your current ALB cost.
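The arithmetic behind that estimate can be sketched as follows. It assumes the $0.01 per GB is charged on both the sending and the receiving side (matching the "0.01 + 0.01" structure of the support answer earlier in the thread), and uses the 450 GB per day and two-thirds figures from this comment. The function name is made up for illustration.

```python
def weekly_cross_az_cost(gb_per_day, cross_az_fraction, rate_per_gb_per_direction=0.01):
    """Estimate weekly cross-AZ data transfer charges in USD."""
    gb_per_week = gb_per_day * 7
    # Assumption: the per-GB rate is billed once for egress and once for ingress.
    return gb_per_week * cross_az_fraction * rate_per_gb_per_direction * 2


cost = weekly_cross_az_cost(450, 2 / 3)
print(f"~${cost:.0f} per week")
```

With these inputs the estimate comes to roughly $42 per week. Setting `cross_az_fraction` to 1.0 gives the worst case where no traffic stays in-zone.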

The added cost of running Envoys is probably a larger concern. We recommend allocating 512 CPU units (0.5 vCPU) and 64 MiB of memory to the Envoy container, which results in an additional cost of $0.02 per hour per Envoy on Fargate, or less on ECS. As always, your decision to use a service mesh should be guided by the benefits that you can get from using it.
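As a back-of-the-envelope sketch of how the sidecar cost scales, the snippet below multiplies the $0.02 per hour figure quoted above by the number of tasks. The 730 hours per month and the task count of 100 are assumptions for illustration, and the function name is hypothetical.

```python
def monthly_sidecar_cost(task_count, hourly_rate=0.02, hours_per_month=730):
    """Estimate the monthly USD cost of running one Envoy sidecar per task."""
    return task_count * hourly_rate * hours_per_month


sidecar_cost = monthly_sidecar_cost(100)
print(f"~${sidecar_cost:.0f} per month for 100 tasks")
```

Because the cost is per task, a fleet of many small tasks pays proportionally more for sidecars than a fleet of fewer, larger ones.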

Speaking of the AZ-aware routing capability, it is high on our roadmap. However, at the moment we can't share a specific timeframe for the release.

james-skinner-deltatre commented 2 years ago

@herrhound you're absolutely right, I dropped a 0 at some point. In this case it works out at $63/week which is much more acceptable (I did in fact factor in the two thirds)

The Envoy cost is a concern as we tend to run many small tasks, so it adds up.

This is the wrong thread for it but at the moment I am also seeing request latency increasing on App Mesh vs using an ALB which is not what I expected, but need to check I'm not off on the maths here too :-)

herrhound commented 2 years ago

Hey James @james-skinner-deltatre, please ping me on our Slack channel with the details on increased App Mesh latency: https://awsappmesh.slack.com/archives/D011Z89UH1B

kgns commented 1 year ago

> Speaking of the AZ-aware routing capability, it is high on our roadmap. However, at the moment we can't share a specific timeframe for the release.

Hi @herrhound, are there any updates on this feature request? Is it still in the research stage?