go-kit / kit

A standard library for microservices.
https://gokit.io
MIT License
26.61k stars 2.43k forks source link

SD: EC2 service discovery #494

Open cam-stitt opened 7 years ago

cam-stitt commented 7 years ago

We could add an EC2 service discovery package. EC2 allows for searching instances via tags. We could again use aws-sdk-go and manage fetching instances via ec2.DescribeInstances.

eticzon commented 7 years ago

Hey @cam-stitt and @peterbourgon! I'd like to get involved. Would you guys mind if I take a crack at this? 🙏

peterbourgon commented 7 years ago

Please do! Before starting in with the code properly, I'd suggest doing a design phase, either just as a narrative in this issue or with a bit of pseudocode.

eticzon commented 7 years ago

Guys apologies this took a while. Actually thinking about it a bit more, in this specific use-case go-kit is performing the role of an ELB (or an ALB). So to get the ball rolling...

The EC2 implementation would generally follow the style of the implementations in package sd.

Implementation Details (assume the new package is called ec2)

  1. There would be an internal method in ec2 for fetching instance host:port (EC2 instance IPs in this case, using aws-sdk-go) which updates sd.Cache. Like what @cam-stitt mentioned, that would mean AWS API calls.

  2. There will be no concrete implementation of the sd.Registrar interface (similar to dnssrv). It is assumed that the EC2 instance tagging (for newly provisioned instances) will be handled by a mechanism such as CloudFormation (which auto-tags instances if they're in an autoscaling group) or by some other means. We merely query the set of instances using a tag.

  3. EC2 instance tag name:value pair would be a constructor parameter.

Questions

  1. Getting port will be a bit tricky because there's no way to get which service is mapped to a given port (in the context of AWS that is). It can be inferred from the list of allowed ingress ports (via the security groups) but that could be error-prone. We could just pass that in as constructor parameter to this new SD. 🤔 WDYT?

  2. Do we need to accomodate for the case where different EC2 instances are tagged the same name:value pair are part of different CloudFormation stacks?

  3. (a follow up from question #1) if port was passed as a constructor parameter, would we need to check if that port is reachable for all instances (i.e. the instance is healthy and there's no security group rule that's blocking access) before adding it to the list of instances.

  4. 💭 In case we get rate limited by the AWS API endpoint, would the best thing to do in this case is return stale data from sd.Cache?

Your thoughts @peterbourgon, @cam-stitt?

peterbourgon commented 7 years ago
  1. I'd expect that users would know the port a priori when specifying a set of EC2 instance tags. Is that reasonable? If you deploy an EC2 AMI, your service is probably listening on a fixed port, right?
  2. I don't know the details here. Can you say more? I would expect users to specify instance tags that would authoritatively map to instances running specific services.
  3. Your call at the implementation level. Most other SD packages don't bother doing health checking, so by default I wouldn't expect it here, either.
  4. Yes, but we should do whatever possible to respect rate limits.
cam-stitt commented 7 years ago

My two cents:

  1. I'm with @peterbourgon, the port will be known by the user and does not have to be part of this. Putting the port on the constructor of the object may make the most sense.
  2. Yup, users should be placing tags (like service: addsvc) on their instances. The CFN stacks should not concern us at all.
  3. We could limit the EC2 discovery to utilise ELB's, or provide that as another style of SD. That way you could just search all instances within an ELB that are healthy.
  4. Yes. There would have to be a polling process for this as AWS api does not push updates.
eticzon commented 7 years ago

Great feedback @peterbourgon, @cam-stitt!

OK, so taking into account your suggestions, here's a rough plan:

  1. port would be a constructor argument.
  2. Users would be expected to tag their instances with service: addsvc (customizable; could be a constructor argument as well i.e. tagName).
  3. I like the idea of using a common ELB and determining instance health from that. That makes health checking the stack user/maintainer's responsibility. The discovery mechanism would not affected by heath check logic changes. I'll take this route then.
  4. So we'll poll AWS to get the latest instance information and apply the usual retry/backoff/jitter in case we get rate limited.

With that I'll get cracking with the implementation. PR coming 🔜 !

wyaeld commented 7 years ago

Just playing devils-advocate during your design phrase, given the functionality AWS has been delivering at the ALB layer (ssl certs, waf, dynamic port mapping of ecs services etc), under what sorts of scenarios would you expect to want to do service discovery at the ec2-instance level, over the alb-fronting-a-service level?

eticzon commented 7 years ago

Hey thanks for the feedback @wyaeld. I've also mentioned this before:

Actually thinking about it a bit more, in this specific use-case go-kit is performing the role of an ELB (or an ALB).

I do agree that the current feature set of an ALB is enough to cover the majority of use cases. I'm happy to drop this task unless @peterbourgon and/or @cam-stitt can think of other scenarios where such a discovery functionality would be useful.

cam-stitt commented 7 years ago

One issue with ALB is that it cannot be used in conjunction with GRPC (from all my tests and research). If you want to use GRPC, you have to still use ELB. Due to the popularity of GRPC in microservices, it seems like this still has a very valid use case.