fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License

Externalize registry backends #46

Open composer22 opened 8 years ago

composer22 commented 8 years ago

Fabio is heavily reliant on Consul for service registration, discovery and heartbeat checking. This is a PULL situation. It would be nice if Fabio could complete its RESTful API as a generic registry interface that could provide a PUSH function and remove any internal dependencies. This is outside of the API request in the Issues list of Fabio.

How would this work?

Let's go to the PowerPoint:

1) An external registry (whatever it might be) would call the fabio api and register a service. The request would contain all the useful information for the "Add route for service svc from src to dst and assign tags" command but in json format. Included in the json request would be an additional callback attribute string to either a URL or WS. For example: "serviceCheck":"http://192.168.2.7:80/is_service_up/19382493" or "serviceCheck":"ws://192.168.2.7:1234/19382493" or generically "serviceCheck":"http://192.168.2.7:80/health"

Fabio doesn't care too much about the value.

2) Fabio would register this request in its table.
3) Fabio would attach via WS or use the serviceCheck URL to periodically see if the registry or service is still alive (in case the registry itself went down or the registry failed in its attempt to issue an unregister call to Fabio).
4) An external registry would call fabio occasionally to deregister a service that was no longer available. The request would contain all the useful information for the route del command.
5) Fabio would deregister this service from the table.

"route weight" could also be provided as a part of the API.

This would provide a means for any external registry to utilize Fabio instead of having a dependency on only Consul or having to compile in another pkg with the type Backend interface to support an interface. Services themselves might even be able to register to Fabio directly without middleware as long as they met the API requirement.
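A minimal Go sketch of what the registration request from step 1 might look like. Only the `serviceCheck` attribute is named in the proposal; the `Registration` type, the other field names, and the validation rule are assumptions that mirror the "route add" command, not anything fabio defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Registration is a hypothetical JSON payload for the proposed push API.
// Only "serviceCheck" comes from the proposal; every other field name is
// an assumption modelled on "route add service src dst" plus weight and tags.
type Registration struct {
	Service      string   `json:"service"`
	Src          string   `json:"src"`
	Dst          string   `json:"dst"`
	Weight       float64  `json:"weight,omitempty"`
	Tags         []string `json:"tags,omitempty"`
	ServiceCheck string   `json:"serviceCheck"`
}

// parseRegistration decodes a request body and rejects payloads that lack
// the fields needed to build a route.
func parseRegistration(body []byte) (Registration, error) {
	var r Registration
	if err := json.Unmarshal(body, &r); err != nil {
		return r, err
	}
	if r.Service == "" || r.Src == "" || r.Dst == "" {
		return r, fmt.Errorf("service, src and dst are required")
	}
	return r, nil
}

func main() {
	body := []byte(`{"service":"svc","src":"/foo","dst":"http://192.168.2.7:80/","serviceCheck":"http://192.168.2.7:80/health"}`)
	r, err := parseRegistration(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Service, r.Src, r.Dst, r.ServiceCheck)
}
```

A deregistration request could reuse the same type with only `service`, `src` and `dst` set.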

magiconair commented 8 years ago

I can do that but it misses the point of having a service registry. :)

fabio is conceptually not dependent on consul. The current implementation is and that is a difference. Your services should depend on a service registry for service registration, discovery and centralized health checks. fabio is a mere consumer of that data to simplify your inbound http traffic routing.

If I wanted to do what you propose then I could stop using consul altogether and provide some generic mysql backend. Also, it would make your services dependent on the fabio API and I would have to replicate the behavior of the APIs that already exist. You couldn't use existing libraries for service registries.

My suggestion is to embrace consul instead of fighting it. If you want to use a different service registry backend then that is fine but then we/you/I would have to spend some time implementing it. A friend of mine has implemented one for the Google Compute Platform. It wasn't that difficult.

magiconair commented 8 years ago

Please feel free to comment to keep the discussion going since I am curious why you are suggesting this in the first place but unless I am missing something really fundamental this is not going to happen.

composer22 commented 8 years ago

Fun-damentally, Fabio is still just a consumer. What I am proposing, albeit a bit flawed, is a way for us to register any registry by way of API. You have an interface right now which requires a compilation. If you provided an API for us to wire up an external registry (or multiple ones), this would make it easy for any 3rd party to build their own.

Perhaps I might propose this api strategy

/register - register a registry with Fabio
/unregister - unregister a registry with Fabio

The register would give any metadata that Fabio needs to poll from the registry service.

Embracing Consul is only one half the equation since for Docker we need a registrator service as well. This propagates more points of failure in the system. Trying to eliminate this. Could easily provide an etcd solution or a Docker API solution or a proxy to another DNS. Whatever we dream. Could do all these things if you provided an API so we could easily tell Fabio what registry to use. As long as the API is available to us and you, we don't need Fabio to be compiled for every registry need out there. NOR do we have to clone or create any bastardization of your efforts for each one. We can continue to take advantage of your strong solution while plugging in our own registry service whatever it may be. You can continue as is and not provide state (even though you are providing state for each Route/Target).

magiconair commented 8 years ago

The current registry is implemented via an interface which could be externalized. https://github.com/eBay/fabio/blob/master/registry/backend.go contains the interface definition.

OK, I get it - finally :) I'll think about it. Would make integration simpler but deployment more complex since you now have to build at least two components. Fabio and the registry adapter. But I could provide the current consul implementation as a sample implementation.

magiconair commented 8 years ago

Just as a note for myself:

The idea is to decouple the registry backend from fabio itself via an API to make development of registry backends simpler since they don't need to be integrated into fabio itself.

Although it makes the development of the registry backend simpler, fabio is no longer the one binary that you run and it just works. This will increase operational complexity and create another source for potential errors. The benefit of this approach largely depends on how difficult it is to add support for additional registries. I think the idea has merit but there is a hidden cost in additional integration testing. Then users will have to debug both fabio and the registry plugin to determine where the cause of a misbehavior is.

So I'm not sold yet since lowering the operational complexity is one of the key goals of fabio.

composer22 commented 8 years ago

I counter. You've just increased it by forcing only one provider down our throats. Your interface is useless internally. If you add another one we still have a complexity issue of having infrastructure complexity OUTSIDE of Fabio.

I recommend:

Continue to provide Consul built in. No change.

Create a generic registry using your interface.

But you set the API rules. This is a contract which we, as registry providers, must adhere to.

This is NOT hard for us and opens up a whole world of tools.

//// How to register might be nothing more than:

How to unregister:

Fabio callback format (when Fabio calls this endpoint above)

magiconair commented 8 years ago

I am not worried about the implementation since this is rather simple but what this introduces is a dependency problem. It means that this now becomes a binding contract which I must support forever since I have no control over the registries. If I want to support an additional functionality or refactor the API I have to support both APIs and you will have registry plugins which work with either the one or the other API. I also have no leverage of forcing registry providers to upgrade nor do I have any way for users to upgrade.

I would essentially give up the ability to quickly fix bugs or provide workarounds in implementations and so would the users.

magiconair commented 8 years ago

I'm also wondering how many of these service registries are out there. I didn't focus on adding support for them since this wasn't my initial use case but with the internal api and the GCP proof of concept this has shown not to be too difficult.

magiconair commented 8 years ago

I'd also like to make one comment on your language: I didn't force anything down anybody's throat. We use consul internally and that's what I've developed fabio for. I'm open for suggestions but I want the discussion to be solution oriented.

composer22 commented 8 years ago

Your concerns are understood. But I would not be concerned about API refactoring or enhancement. Your functionality is not so dynamic that it should be dynamic. Yes, previous features do need to be continued to be supported but that (at least in my experience) has never prevented an API to continue to respond to new features. You can either simply add additional attributes or features in the current set (w/o disrupting the interface -- for example extra json attributes) or if you have a dramatically different set, simply version them e.g. /v1.0, /v2.0 in the standard way API's do when big changes occur.
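The versioning idea can be sketched in Go by mounting two API versions side by side on one mux. The `/v1/register` and `/v2/register` paths and the response bodies are hypothetical and only illustrate the pattern; fabio does not expose these endpoints.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux mounts two hypothetical API versions next to each other. Old
// clients keep calling /v1 while new clients move to /v2.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/register", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "v1 register")
	})
	mux.HandleFunc("/v2/register", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "v2 register")
	})
	return mux
}

// call sends one in-memory request to the handler and returns the status
// code and response body.
func call(h http.Handler, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux()
	for _, p := range []string{"/v1/register", "/v2/register", "/v3/register"} {
		code, body := call(mux, http.MethodPost, p)
		fmt.Println(p, code, body)
	}
}
```

For the smaller case of adding attributes within one version, Go's `encoding/json` ignores unknown fields by default, so an older server can accept a payload that carries newer attributes.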

But seriously, how many api routes do you really anticipate? Usually there are hundreds of routes before versioning is ever needed, or the app is very close to "business requirements" where that happens (like an MVC web site), not a pure engineering lb.

Be comfortable, you _aren't_ responsible for supporting other providers' needs. You set that boundary with the API interface. It should be a black box for us.

How many registries are out there? It grows all the time with imagination. It's not the registry behind the scenes, but the proxy between fabio and the registry source. That source could be anything. You shouldn't care or worry:

Basic:

Fabio <= api => CustomProxy <= tbd => Registry

Classic:

Fabio <= yourapi => CustomProxy <= network => SkyDNS
Fabio <= yourapi => CustomProxy <= api => etcd
Fabio <= yourapi => CustomProxy <= network => zookeeper
Fabio <= yourapi => CustomProxy <= api => Docker
Fabio <= yourapi => CustomProxy <= api => Consul =)

Imagination:

Fabio <= yourapi => CustomProxy <= network => MicrosoftDNS <= app
Fabio <= yourapi => CustomProxy <= network => MySQL <= app
Fabio <= yourapi => CustomProxy <= internet => CustomProxy <= MaraDNS <= customregistrator
Fabio <= yourapi => CustomProxy <= internet => CustomProxy <= KnotDNS <= customregistrator
whatever you desire.

composer22 commented 8 years ago

Another idea instead of api

https://github.com/hashicorp/go-plugin

magiconair commented 8 years ago

This could indeed be a better approach than an API since otherwise people will write plugins in all kinds of languages which then pulls in those dependencies.

magiconair commented 8 years ago

The go-plugin approach has the appeal of keeping the deployment simple since you would have to deploy at most two binaries: fabio and the registry plugin. However, testing the plugin would require a fabio instance or a mock that correctly implements that RPC interface. Ideally, the plugins would live as sub-directories in the fabio repo so that you don't have to assemble a working distribution. I'm just wondering how that is any different from the approach that I am taking right now with the only exception that not all dependencies are compiled in.

While the API approach provides great flexibility it will make integration testing, bug fixing, triaging, and especially deployment more complex since people will write plugins in other languages which then pull in those dependencies. Stability of fabio then becomes directly related to the stability of that plugin with little option for me to fix it.

So right now I'm leaning towards getting a couple more registry backends into fabio first to see that a) the interface is sufficient and b) integration cost into fabio is high enough to justify a split. If the Docker API is an important part for you why don't you take the plunge of providing an implementation for it?

Adding a generic registry plugin which just externalizes the API is something that can be achieved at any time. Turning that clock back seems a bit harder to do.

magiconair commented 8 years ago

There is one additional aspect which is the KV store. Right now this is used for manual overrides and I've got issue #27 which asks for storing certificates there. Storing this information so that it is available for all fabio instances requires consensus so that they all have the same view of the routing table. In the current setup consul solves that. Once consul becomes optional I would have to provide another alternative for these features. etcd and zookeeper provide alternatives but what about the Docker API or the various DNS services you mentioned? This probably requires decoupling the service discovery from the KV management in the API but still leaves me with the question on how to actually implement it.

composer22 commented 8 years ago

Any features you are dependent on now from Consul should not change. The registry should be a black box as far as Fabio is concerned. I thought I had done enough to highlight this importance.

The proxy layer (or whatever you want to call it. perhaps driver is a better word) is responsible for keeping these dependent features intact. You should not be testing it.

Another example to look at is how docker allows for custom engines.

https://docs.docker.com/engine/extend/plugins_network/

Do you think Docker needs to be responsible for testing third party plugins? I think not. It's the developer's responsibility. Why do you keep returning to that idée fixe that you are responsible?? I'm baffled.

The docker interface is something I want to do, but I don't want to compile it into your code, hence why an API of sorts would be a help.

magiconair commented 8 years ago

I am not responsible for the quality of third party plugins but those plugins will make the deployment more complex and with that fabio itself. The fact that it is a single binary which you just copy and run and don't have to configure is not a coincidence.

I am running a fairly complex setup right now and also maintain a public API which I have to keep backwards compatible. I prefer solutions that have "batteries included". Engineering is about tradeoffs and I'm looking for holes in your argument. In your mind the problem is solved but I just don't see it. The fact that you don't see any downsides but only benefits is something that in my experience doesn't work out.

fabio is not dependent on consul. It is dependent on a registry that provides the addresses of active services and the routes they serve. Because of the automatic nature there should be a way to modify the generated routing table in a consistent and persistent way (overrides).

You and others already mentioned that consul is an unnecessary dependency. It is one more piece of infrastructure that has to be installed, maintained, backed up, configured, monitored, scaled, secured, upgraded and understood. So how would the Microsoft DNS registry plugin work if there is no consul and you want to store and maintain overrides for a cluster of fabio instances distributed over a couple of datacenters?

As for the Docker API: From what I can tell this is for a single host setup with multiple containers which can be queried fairly simply via the GET /containers/json call which provides addresses and ports of the containers, and ENV vars could contain the paths to be registered. That and some authentication, maybe listening to the event API instead of polling, should do the trick and can be implemented with the standard http lib. The problem is only where to store manual overrides but maybe that isn't an issue in a single host deployment.
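A rough Go sketch of the Docker idea: turn a `GET /containers/json` response into `route add` commands. It parses a canned sample instead of calling Docker. As far as I know the list response carries labels but not environment variables, so this sketch reads the path from a container label; the label name `fabio.path`, the `host` parameter and the function name are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// port and container hold only the fields of the Docker Engine API
// "GET /containers/json" response that this sketch needs.
type port struct {
	PrivatePort int    `json:"PrivatePort"`
	PublicPort  int    `json:"PublicPort"`
	Type        string `json:"Type"`
}

type container struct {
	Names  []string          `json:"Names"`
	Ports  []port            `json:"Ports"`
	Labels map[string]string `json:"Labels"`
}

// routesFromContainers emits one "route add" line per published TCP port of
// every container that carries pathLabel. host is the address under which
// the Docker host is reachable (an assumption of this sketch).
func routesFromContainers(body []byte, host, pathLabel string) ([]string, error) {
	var cs []container
	if err := json.Unmarshal(body, &cs); err != nil {
		return nil, err
	}
	var routes []string
	for _, c := range cs {
		path, ok := c.Labels[pathLabel]
		if !ok || len(c.Names) == 0 {
			continue // container did not ask to be routed
		}
		name := strings.TrimPrefix(c.Names[0], "/")
		for _, p := range c.Ports {
			if p.Type != "tcp" || p.PublicPort == 0 {
				continue // port is not published on the host
			}
			routes = append(routes, fmt.Sprintf("route add %s %s http://%s:%d/", name, path, host, p.PublicPort))
		}
	}
	return routes, nil
}

// sample is a hand-written response in the shape Docker returns.
const sample = `[
 {"Names":["/web"],"Ports":[{"IP":"0.0.0.0","PrivatePort":80,"PublicPort":32768,"Type":"tcp"}],"Labels":{"fabio.path":"/foo"}},
 {"Names":["/db"],"Ports":[{"PrivatePort":5432,"Type":"tcp"}],"Labels":{}}
]`

func main() {
	routes, err := routesFromContainers([]byte(sample), "10.0.0.5", "fabio.path")
	if err != nil {
		panic(err)
	}
	for _, r := range routes {
		fmt.Println(r)
	}
}
```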

magiconair commented 8 years ago

After thinking about it for a while I think the simplest thing that could possibly work is along the lines of what you suggest. Fabio could just watch multiple KV entries in consul for changes instead of one for the overrides. By watching /fabio/registry/* and merging them all together you could support any external system through any means since all you have to do is to manage that entry with an HTTP POST request on an already existing API.

e.g.

/fabio/registry/consul
/fabio/registry/static
/fabio/registry/SkyDNS
/fabio/registry/Docker

The benefit of this is that I wouldn't have to think about a consistent persistence layer, authentication, API and so forth since consul already provides this. The current implementation of the consul registry could also use the same mechanism to be consistent with the rest and it could either become a plugin or remain internal. Order would not matter since the registry plugins are expected to issue only route add commands and the route del commands would remain in the manual overrides which would still be appended last. What do you think?
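The merge step described here can be sketched in Go as follows. The function name and the sample routes are hypothetical. Tables are concatenated in sorted name order only to make the output deterministic, since the proposal says order does not matter, and the manual overrides are appended last.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeTables concatenates the named routing tables (e.g. the values found
// under /fabio/registry/*) and appends the manual overrides at the end, so
// that "route del" commands in the overrides apply to everything before them.
func mergeTables(tables map[string]string, overrides string) string {
	names := make([]string, 0, len(tables))
	for name := range tables {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output; the order itself is not significant

	var parts []string
	for _, name := range names {
		if t := strings.TrimSpace(tables[name]); t != "" {
			parts = append(parts, t)
		}
	}
	if o := strings.TrimSpace(overrides); o != "" {
		parts = append(parts, o)
	}
	return strings.Join(parts, "\n")
}

func main() {
	tables := map[string]string{
		"consul": "route add svc-a /a http://10.0.0.1:8080/",
		"Docker": "route add svc-b /b http://10.0.0.2:8080/",
	}
	// The exact "route del" arguments are illustrative.
	fmt.Println(mergeTables(tables, "route del svc-a /a"))
}
```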

composer22 commented 8 years ago

Sorry, been tied up here with security patching and other fires.

I'm sort of confused myself here so I will first sum up what I was thinking.

Where your argument breaks down for me is that you state you are not dependent on consul. We are mixing things here a bit. The fact that someone has to have another registry compiled into your code to be able to use it indicates there is a dependency issue - on you officiating and making available these alternatives for us. We, in the meantime, wait around. Also, you are linking in third party libraries for interfaces we don't need. That would mean bloat and overhead.

What I'm suggesting is to provide a plugin framework of some sort that is totally abstract and has no dependencies on any other services running via the Fabio Codebase. No consul compiled in drivers. Fabio is a black box to the registry services through a dynamic plug interface; the registry service is a black box to fabio. In fact, I would externalize your Consul registry code into a plugin and create another repo for it.

When I say plugin I mean either API or code. I will let you decide. It's just an external service similar to a database -- say MySQL -- running through JDBC. Only in this case, JDBC is really some fabio specification you came up with that developers must meet in order for fabio to work.

When fabio boots, you tell fabio what plugins to utilize and any config information it might need to utilize the plugin.

K/V overrides are the responsibility of whatever plugin is being called. Fabio should not care as long as it gets the information back. For example, Foo.fabreg might store that in a local yaml file, while Bar.fabreg might utilize redis. Fabio shouldn't care as long as it gets back the info it needs through an API/plugin standard that Fabio is expecting.

composer22 commented 8 years ago

concerning authentication.

Let the configs for both the running registry and fabio contain a simple Bearer-like token. No need for fancy dancy stuff. We just want two services running close together on the same box anyway.

This is what I would code out. All as Docker images and containers...

Fabio <== api/rpc ==> RegistryPlugConsul <= api => Consul
and
Fabio <== api/rpc ==> RegistryPlugDocker <= api => Docker
                                       ||
                                      <======> Local Map Storage (for K/V request handing from Fabio)

Fabio doesn't care. Just another IP address + port similar to a DB
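The shared Bearer-like token mentioned above could be checked with a small piece of Go middleware. This is a sketch under the assumption that both sides read the same token from their config; the function names and the token value are invented.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireToken wraps a handler and rejects requests whose Authorization
// header is not exactly "Bearer <token>". The comparison is constant-time
// so the response timing does not reveal how much of the token matched.
func requireToken(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newProtected returns a trivial handler guarded by the shared token.
func newProtected(token string) http.Handler {
	return requireToken(token, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
}

// status performs one in-memory request with the given Authorization header
// and returns the HTTP status code.
func status(h http.Handler, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newProtected("s3cret")
	fmt.Println(status(h, "Bearer s3cret"))
	fmt.Println(status(h, "Bearer wrong"))
	fmt.Println(status(h, ""))
}
```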

magiconair commented 8 years ago

This would require that every plugin developer has to solve the problem of the KV store. Then the KV store connection should be a separate API as well. One for consul, file, redis, ...

While this makes it indeed very flexible it would also mean that you now have to install, run, monitor and upgrade three things instead of just one. Maybe written in different languages, requiring different incompatible runtimes. Different combinations of plugins will work while others won't. Failure of any of these components leads to failure of fabio. People less equipped with knowledge of these things will plug things together and this will create support questions which someone has to pick up and respond to. Since fabio sits in the critical path of any application this worries me.

That is one of the reasons I like Go and the Hashicorp approach. Building a single "batteries included" binary which just works. You move the complexity from the deployment into the development. The plugin approach makes it easier for the developers to integrate but harder for the users to use.

Yes, code has to be compiled into fabio and a library might have to be added but the registry code is usually very simple, based on either HTTP, file or DNS and can fit into a couple of lines of code since the systems you are interfacing with already have an API.

However, I do acknowledge that another method for integrating with other registries is beneficial. I think watching multiple endpoints in consul might be a balanced first step since you can then write a plugin for another registry which just pushes the routing table into consul. With that step fabio becomes actually dependent on consul.

magiconair commented 8 years ago

Another thought is that fabio actually only needs a versioned KV store and fabio could provide an API to that to avoid that registry plugins would have to implement different KV store mechanisms. Then the fabio instances only need to be notified when the KV store has changed.
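A toy in-memory version of "a versioned KV store plus change notification" in Go, to make the two requirements concrete. The type and method names are hypothetical, and a real cluster would need the store to be shared and consistent across fabio instances, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"sync"
)

// versionedKV is an in-memory key/value store with a global version counter
// that increases on every write, and a simple watch mechanism.
type versionedKV struct {
	mu       sync.Mutex
	version  uint64
	data     map[string]string
	watchers []chan uint64
}

func newVersionedKV() *versionedKV {
	return &versionedKV{data: map[string]string{}}
}

// Put stores the value, bumps the version and notifies all watchers.
func (kv *versionedKV) Put(key, value string) uint64 {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.data[key] = value
	kv.version++
	for _, ch := range kv.watchers {
		select {
		case ch <- kv.version:
		default:
			// A notification is already pending for this watcher. It will
			// re-read the store anyway, so dropping this one is safe.
		}
	}
	return kv.version
}

// Get returns the value for key together with the current store version.
func (kv *versionedKV) Get(key string) (string, uint64) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	return kv.data[key], kv.version
}

// Watch returns a channel that receives a version number after a change.
// The channel has a buffer of one, so writers never block on slow watchers.
func (kv *versionedKV) Watch() <-chan uint64 {
	ch := make(chan uint64, 1)
	kv.mu.Lock()
	kv.watchers = append(kv.watchers, ch)
	kv.mu.Unlock()
	return ch
}

func main() {
	kv := newVersionedKV()
	changes := kv.Watch()
	kv.Put("overrides", "route del svc-a")
	v := <-changes
	val, _ := kv.Get("overrides")
	fmt.Println(v, val)
}
```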

magiconair commented 8 years ago

Also, the KV store is important for custom error pages and certificates for the whole cluster.

magiconair commented 8 years ago

consul as the KV store also provides one additional function which is essential to fabio clusters: change notification. Without that you have to revert to polling.

If I assume that the KV store and the coordination are the critical functions for fabio and that there are any number of providers which push routing information into that store then I could do the following:

fabio provides an API endpoint which allows uploading a routing table. Each routing table has a name, any number of routing tables is possible, all routing tables are merged in no particular order with the manual overrides merged last.

In the first iteration fabio would continue to use consul for the KV store and the coordination but other options are possible. To support any KV store (e.g. redis, MySQL, memcache, ...) I would have to factor out the cluster management (i.e. change notification).

So adding these endpoints together with merging all routing tables should provide the desired functionality.

PUT /routes/<name>
Content-Type: text/plain

route add service ...

DELETE /routes/<name>

This would allow you to write a plugin for the Docker API which just pushes its routing table into fabio whenever it detects that there is an update.

Initially I would support the current config language but support JSON in a subsequent iteration. Then I'll refactor the built-in consul registry to also just push to KV store. But there has to be a leader election so that not all fabio instances of a consul cluster push the same changes to the KV store at the same time.

These different routes should also be visible in the UI.

I could provide a number of "provided" registry plugins which get built together with fabio and which I run from fabio as a child process providing the API endpoint as a startup parameter. This way I still have a single code base, one set of deployables and you still have to start only fabio to get things rolling.

composer22 commented 8 years ago

> consul as the KV store also provides one additional function which is essential to fabio clusters: change notification. Without that you have to revert to polling.

I don't see the issue. I have no problem with adding a WS connection with the driver for callbacks as long as you provide a payload spec that Fabio expects. Long polling is also possible but any registry proxy/driver can easily implement a WS.

I'm confused where you are going with this but at least you are going. =D

My only gripe is having any hard boiled/compiled in libraries. As long as a metadata is defined and followed and you use a consistent fast transfer mechanism (such as RPC, WS) to that driver/proxy layer (which handles the key features that Fabio needs e.g. redis, etcd, consul, dockerapi for registry) then it will work.

Ideally, looking at fabio code should be very vanilla without references to any specific implementation of the kv/registry. Simple code.

Then, separate repos for drivers/proxy or whatever the name should be.

e.g.

eBay/fabio - base router
eBay/fabio-consul - driver for consul
eBay/fabio-docker - driver for docker
eBay/fabio-etcd - driver for etcd
etc...

magiconair commented 8 years ago

In the current implementation consul has two completely independent functions:

With that in mind:

To really follow your argument fabio would require two plugins:

Factoring these out as external plugins simplifies the development of external integrations but complicates deployment since now three independent executables are necessary for fabio to function independent of how they actually communicate.

Also, when you run a cluster of multiple fabio instances (as we do in our 16 node clusters) how do they all get notified that the manual overrides were changed by a user or that a 404 page changed? Again, in the current setup consul provides this function by providing a consistent storage and change notification in between the fabio instances.

Ease of deployment is most likely one design goal that the Hashicorp developers had when developing consul and nomad for example. Nomad has integration points but provides most functions out of the box (from the single binary) without the use of external plugins - unlike mesos/marathon which is way more flexible and more difficult to set up and control.

My issue is that I most likely don't have enough bandwidth to build integrations for all possible combinations into a single product even though I find that preferable since the library bloat in Go is minimal. I also don't think that a large plugin ecosystem is always the preferable solution. (again see Hashicorp)

What I do think though is that it would be sufficient if fabio supports a selection of common KV store backends natively (as in built-in) and provides an external API for service registries. I think supporting consul, etcd, redis and zookeeper as KV stores should provide enough coverage to get started.

composer22 commented 8 years ago

You don't have to provide any plugin except one for Consul. We, out here, can do the rest as we need them. Just create a plugin protocol for us to follow.

Also you don't have to deal with two -- one for kv and one for services. Assume the plugin, whatever we develop, will handle both requests. ONE plugin for both.

Just to show you that it's possible, we could do this now without your coding anything. All we have to do is create a MOCK of Consul: make a server that looks like Consul to Fabio but is really just a hash of structures for the KV, backed by the Docker API. Fabio would never know. It would just make Consul calls to an IP/port. Consul's API would become the standard we implement. It would SPOOF Fabio by making it think it was a Consul instance.

But is this a solution? I don't think so. Removing your internal consul libraries and building a RPC/plugin would be the more mature way, using the Consul commands you are using as a standard to call other things.

I still don't believe you grasp this elegance yet.
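For reference, the Consul mock described above could start from something like this Go sketch, which answers `GET /v1/kv/<key>` from a plain map in the JSON shape Consul uses (an array of entries with a base64-encoded `Value`). It is deliberately incomplete: the key name is an example, and real Consul also offers blocking queries through the `index` parameter, which a consumer would presumably rely on for change notification and which this sketch omits.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// kvEntry mirrors the fields of one element in a Consul KV read response.
// encoding/json encodes []byte as base64, which matches Consul's format.
type kvEntry struct {
	LockIndex   uint64
	Key         string
	Flags       uint64
	Value       []byte
	CreateIndex uint64
	ModifyIndex uint64
}

// fakeConsul serves KV reads from an in-memory map. Everything else,
// including writes and service catalog calls, returns 404 in this sketch.
func fakeConsul(data map[string]string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/v1/kv/")
		val, ok := data[key]
		if r.Method != http.MethodGet || !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("X-Consul-Index", "1") // fixed index: the data never changes here
		json.NewEncoder(w).Encode([]kvEntry{{Key: key, Value: []byte(val), CreateIndex: 1, ModifyIndex: 1}})
	})
}

// readKey plays the client side: it fetches a key and decodes the value.
func readKey(h http.Handler, key string) (string, bool) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/v1/kv/"+key, nil))
	if rec.Code != http.StatusOK {
		return "", false
	}
	var entries []kvEntry
	if err := json.Unmarshal(rec.Body.Bytes(), &entries); err != nil || len(entries) == 0 {
		return "", false
	}
	return string(entries[0].Value), true
}

func main() {
	h := fakeConsul(map[string]string{"fabio/config": "route add svc /foo http://10.0.0.5:32768/"})
	fmt.Println(readKey(h, "fabio/config"))
	fmt.Println(readKey(h, "missing"))
}
```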

composer22 commented 8 years ago

and so... https://traefik.io

if6was9 commented 8 years ago

Reading through this thread, I was thinking of Traefik. It is very agnostic about the underlying config plane.

Fabio seems much more complete, though.

Our distributed config and service registry is converging around etcd. It has certainly crossed my mind to build an etcd/consul gateway to get Fabio to play nice.

I totally understand the author's indifference to this if consul is entrenched with his employer.

We are also looking at building a supervisory daemon around HAProxy as another option. Something like vamp.io but a bit simpler. This would couple a newfangled control plane with a battle tested runtime.

The most interesting thing about all of this is how dynamic (and unsolved) this problem space is. Lots of good stuff out there. No obvious long term winner.

Like what you have built here though. I think that maybe that accounts for the strong opinions in this thread.

Fabio could totally be the load balancer for the next 10 years.

magiconair commented 8 years ago

@if6was9 I haven't given up on that plan but I think I need to provide a KV store solution and Fabio should retain its zero conf approach. This isn't related to eBay using consul. We just happened to pick it. Kubernetes is an obvious target.