cloudfoundry / log-cache

Archived: Now bundled in https://github.com/cloudfoundry/log-cache-release
Apache License 2.0

Use service discovery in scheduler to discover where log-cache nodes are #52

Open jasonkeene opened 6 years ago

jasonkeene commented 6 years ago

This would allow log-cache instances to come and go, with the cluster adapting dynamically to scaling events, outages, etc.

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/157208717

The labels on this github issue will be updated when the story is started.

poy commented 6 years ago

@jasonkeene How do other systems manage this? Some form of a gossip protocol?

jasonkeene commented 6 years ago

@apoydence I was thinking of doing something similar to what the loggregator agent does.

  1. Accept a DNS name in config.
  2. At runtime query DNS to get A or AAAA records for that name.
  3. Use these IP addresses.
  4. Periodically query DNS to refresh the list of IP addresses.

This would be compatible with a LOT of service discovery systems, including kube-dns and bosh-dns.
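As a rough Go sketch of that refresh loop, for illustration only (the `watchDNS` name, the `update` callback, the DNS name, and the interval are all hypothetical, not part of log-cache):

```go
package main

import (
	"context"
	"log"
	"net"
	"time"
)

// watchDNS resolves name to its A/AAAA records, hands the resulting
// IPs to update, then re-resolves on every tick so the node list
// tracks scaling events and outages.
func watchDNS(ctx context.Context, name string, interval time.Duration, update func([]net.IP)) {
	resolve := func() {
		ips, err := net.LookupIP(name) // returns both A and AAAA records
		if err != nil {
			log.Printf("dns lookup for %s failed: %s", name, err)
			return // keep the last known set across transient failures
		}
		update(ips)
	}

	resolve()
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			resolve()
		}
	}
}

func main() {
	// "log-cache.service.internal" is a made-up name for the example.
	watchDNS(context.Background(), "log-cache.service.internal", 15*time.Second,
		func(ips []net.IP) { log.Printf("log-cache nodes: %v", ips) })
}
```

Keeping the last known set on a failed lookup avoids dropping every node over a DNS blip.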

Alternatively, you can query for SRV records, which allow the port, weighting, transport, and service name to be discovered. kube-dns supports this, but I am not sure about bosh-dns.
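For reference, a minimal SRV query in Go; the `log-cache` service name, `tcp` proto, and domain are made up for the example:

```go
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// Under the hood this resolves _log-cache._tcp.example.internal.
	_, addrs, err := net.LookupSRV("log-cache", "tcp", "example.internal")
	if err != nil {
		log.Fatalf("srv lookup failed: %s", err)
	}
	for _, a := range addrs {
		// Unlike A/AAAA records, SRV records also carry port and weighting.
		fmt.Printf("%s:%d (priority=%d weight=%d)\n", a.Target, a.Port, a.Priority, a.Weight)
	}
}
```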

poy commented 6 years ago

@jasonkeene This sounds reasonable.

I would want to run more experiments with multiple schedulers after this is completed to see if there is any thrashing. I suspect each scheduler could end up looking at a different subset of log-cache nodes and instructing them differently.

jasonkeene commented 6 years ago

That is a possibility. We could use an algorithm that is resistant to sudden changes, so nodes that drop out and come right back are not immediately descheduled. Something like a TTL that must elapse before we stop scheduling to an absent node. This would reduce thrashing, allow nodes that are truly gone to expire, and still allow new nodes to come online.
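A rough sketch of that TTL idea (the `nodeList` type and the one-minute TTL are illustrative, not an actual log-cache API):

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// nodeList keeps scheduling to a node for a grace period after it
// disappears from DNS, so brief outages don't trigger rebalancing.
type nodeList struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastSeen map[string]time.Time
}

func newNodeList(ttl time.Duration) *nodeList {
	return &nodeList{ttl: ttl, lastSeen: make(map[string]time.Time)}
}

// Observe records the IPs returned by the latest DNS refresh.
func (n *nodeList) Observe(ips []net.IP) {
	n.mu.Lock()
	defer n.mu.Unlock()
	now := time.Now()
	for _, ip := range ips {
		n.lastSeen[ip.String()] = now
	}
}

// Active returns nodes seen within the TTL; nodes that are truly gone
// age out, while new nodes appear as soon as DNS reports them.
func (n *nodeList) Active() []string {
	n.mu.Lock()
	defer n.mu.Unlock()
	var active []string
	for ip, seen := range n.lastSeen {
		if time.Since(seen) <= n.ttl {
			active = append(active, ip)
		} else {
			delete(n.lastSeen, ip)
		}
	}
	return active
}

func main() {
	nodes := newNodeList(time.Minute) // 1m TTL is arbitrary
	nodes.Observe([]net.IP{net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2")})
	fmt.Println(nodes.Active())
}
```

Each DNS refresh would call `Observe`, and the scheduler would route against `Active()`; a node only ages out after it has been missing from DNS for the full TTL.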