fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License

Manual route alias to existing service #114

Status: Open. calvinmorrow opened this issue 8 years ago.

calvinmorrow commented 8 years ago

I apologize if this functionality already exists; I couldn't figure out a way to make it work.

It appears the manual route option requires you to know the specific backend destinations you're sending to. What would be nice is a new option that lets you effectively "CNAME" another service that already exists in the routing configuration.

In essence, we'd like to route "b.example.com" to the existing service registered with urlprefix-a.example.com.

From what I can tell, we can do this with manual rules; however, we'd have to enumerate the Consul catalog ourselves to determine which endpoints to send traffic to.

magiconair commented 8 years ago

The intended solution is to have your existing service announce both a.example.com/ and b.example.com/.

magiconair commented 8 years ago

And no worries about asking questions. They point to things that could be improved in the docs.

raben2 commented 8 years ago

@magiconair I have to say you are doing a great job with this project. Thanks!

vhuang01 commented 8 years ago

@raben2 @magiconair Definitely agree that Frank is doing a fantastic job with the Fabio project. eBay has noticed the effort that Frank is putting in, and we're thrilled that it's benefiting the community.

calvinmorrow commented 8 years ago

Our use case doesn't really fit having the containers announce both URLs at service creation time (which is when the agent would be started). We deploy new versions of containers with web code attached and then put them through regression testing. Currently, once they pass regression testing, we assign the public-facing names (e.g. api.example.com) to the existing containers to promote them to live.

As described, we would either have to redeploy those containers with the added names or find a way to modify the service definitions via the Consul API to include them. We would also have to remove those names from any other containers/services that previously had them.

All told, we have ~8 names that we move between containers fairly frequently.

magiconair commented 8 years ago

@raben2 Thanks a lot. It is appreciated.

@calvinmorrow OK, I get it. AFAICT, you cannot modify an existing service registration in Consul; the API does not have a mechanism for that. But you can re-register a service by first deregistering it and then registering it again.

I still have the feeling that the overrides are the wrong tool for this problem.

How are you currently registering the services in Consul?

calvinmorrow commented 8 years ago

We're using Joyent's ContainerPilot (https://github.com/joyent/containerpilot) to setup the service registration and health check.

magiconair commented 8 years ago

What do your names look like? Do they just differ in the host part?

From what I understand, you deploy a new version into your production environment but don't want to make it accessible under the official name yet. So you give it one or more other names while your tests run, e.g. stage1.api.example.com, stage2.api.example.com, ..., and at the end you promote it to api.example.com. Interesting...

So even if your services announced all possible names, this wouldn't work, since you can't yet delete a route for a specific instance.

If you generated the routing table yourself, you would lose all the benefits of Consul's service registration and health checks.

But these are containers; they should stop and start quickly, and after the first deployment the image should be cached. How big is the overhead if you just redeploy them with a different tag?

Could you modify the apps so that they register themselves and re-register on a certain signal?

I'm not trying to sidestep the problem, but no simple approach comes to mind just yet.

calvinmorrow commented 8 years ago

Your understanding is pretty much correct. We have a couple of different apps that use similar processes: the API example I already gave, and our website, which follows a similar pattern, with www being assigned to a fleet of containers after we've gone through our development process and any changes (and possible pre-launch processes) have been executed and verified. It's worked well for that because it gives us an active/inactive configuration style.

Only the host portion of the name changes at the moment.

We're currently using consul-template to generate an nginx configuration that (1) creates a service_name -> backend mapping and (2) watches a set of keys for the custom mappings. The keys/values are set up as key=alias, value=backend (where the backend is one defined by enumerating the Consul catalog in step 1), so the nginx configuration is: server_name <key>; proxy_pass <value>;

I think you hit the challenges on the head pretty well. We have many ways of accomplishing the same thing, but most would require some form of custom glue to marry the catalog with an extra name. Additionally, redeploying to swap names means that we have to bring down the old servers at the same time as we bring up the new ones, with a short window of downtime while our application(s) start up.

magiconair commented 8 years ago

I have to think about this for a bit. Some ideas, like filtering routes or multiple routing tables, came to mind, but I need to think through the edge cases, and so far none of them has worked out.

The main goal of fabio is that the load balancer does not have to be configured, since the services already know what they serve and should publish this into the registry. Your deployment approach conflicts with this, since the name of the app depends on an external factor (your passing tests) and not on the presence of the application itself.

As for the downtime during container redeployment:

When you switch to the new version by redeploying it with the correct name, the old version is still running. So you won't have any downtime, just a brief moment where both the old and the new version are running at the same time. Once the new deployment is done, you would have to shut down the old version anyway. But I get why you would consider this strange, since you're only changing the traffic stream.

calvinmorrow commented 8 years ago

I appreciate your patience.

All of our containers have at least one name, which we announce at container start time. Running multiple versions simultaneously actually causes problems for us: we version our static files such as CSS and JS, so two versions may be serving HTML that references CSS and/or JS files which aren't present on half of the containers. If the CSS content changed between deployments, half of the containers would return 404 for it.

The goal you stated, having the load balancer configured from the service catalog, is one of the main reasons we are attracted to Fabio. We will also need to do some routing based on prefixes in the near future, which is difficult with our current consul-template solution.

Before opening this issue, we tried to see if we could trick Fabio into performing the correct behavior. Specifically:

Tried using manual routes to do a route add our-deployed-service api.example.com http://our-deployed-service.example.com. This actually routed via DNS and hit our current load balancer (which makes sense), but it wasn't what we were hoping for.

Tried using manual routes to do a route weight our-deployed-service api.example.com weight 1. This didn't work: according to the docs, it looks up and matches on the combination of service and source, and in our case the source (api.example.com) doesn't exist yet.

Of course, copying the backend server configuration with a route add our-deployed-service api.example.com http://1.2.3.4:80 works, although it requires us to keep track of the container health outside of Fabio and update the routes.

Possibilities I've thought of: we currently use HAProxy for initial SSL termination as well as some HTTP traffic policing. We could use consul-template there to rewrite the Host HTTP header to the backend service's name, but this hides the true Host from the backend servers, which seems ugly. Besides, we'd like to reduce the number of layers and have one solution do most of the heavy lifting.

We could try to register new services in the Consul catalog with the correct tags. Unfortunately, if a container dies and restarts, it won't be added automatically to the new service, so we would have to maintain health and membership ourselves.

We could use consul-template to write manual routes to the backend service with our custom names. This seems like the easiest solution.

Unfortunately, my grasp of Go is still pretty limited. What I originally hoped was that it wouldn't be hard to add support for something like route add our-deployed-service api.example.com, which would simply add a new Host to an existing group of routes.

magiconair commented 8 years ago

Hmm, interesting problem. I still have a nagging feeling that something is off with your deployment/testing procedure. Making fabio do what you want should be possible, but I'm not sure it's the right solution to this problem.

How fast do your containers restart, and what load do you have on them (req/sec)? Our Go services usually restart within a second, but we also have 1k req/sec per instance.

I still think that restarting them with the right name is the right approach, and that you should look at how to make your application work with this. The swap won't be atomic either way, so it's better to prepare for it. If you have versioned static files like /static/js/jquery-2.4.js, you could also put them under /static/<version>/js/jquery-2.4.js and announce a.com/static/<version> instead of a.com/static. This would at least ensure that new pages find their resources. However, it would also mean that clients would have to download static resources that didn't change.

What if fabio had an option to delete routes by source and tag?

route del svc api.example.com/ tag 'v1'
calvinmorrow commented 8 years ago

Startup time used to be ~5-10 minutes; we've got that down to about 1-2 minutes now. By versioning, I mean that the filenames are essentially a hash of the contents (/js/17c616ea4772.js).
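For context on why overlapping versions break: with content-hashed names, any change to a file produces a new name, so HTML rendered by a new instance can reference a file that old instances never had. A minimal sketch, assuming SHA-256 truncated to 12 hex characters (the thread does not say which hash is used):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// hashedName returns a cache-busting file name derived from the file's
// contents, e.g. "/js/<12 hex chars>.js". The choice of SHA-256 truncated
// to 12 hex characters is an assumption made for this sketch.
func hashedName(dir, ext string, content []byte) string {
	sum := sha256.Sum256(content)
	return path.Join(dir, hex.EncodeToString(sum[:])[:12]+ext)
}

func main() {
	v1 := hashedName("/js", ".js", []byte("console.log('v1')"))
	v2 := hashedName("/js", ".js", []byte("console.log('v2')"))
	fmt.Println(v1)
	fmt.Println(v2)
	fmt.Println("same name across versions:", v1 == v2)
}
```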

Part of the reason we're down this road is we're in that awkward transition between "legacy" and "microservice."

That said, I'll admit that doesn't necessarily mean we're doing it right, and what you're suggesting is definitely more transparent and straightforward. If we didn't have a race between the startup of the new code and the shutdown of the old, we could probably fit into the existing mold. I'll have to think about how best to solve that problem.

magiconair commented 8 years ago

In your setup, the service cannot know the name it is going to be accessed by, since this is determined by an external event. The only thing it can know are the /path values, since it needs handlers for them. So you would need a mechanism to route traffic for a specific host/path to the set of services matching certain criteria, and those criteria cannot include the host name.

I think you would need the following new command:

route add svc host/path tag "v4"

You would set this in the manual overrides, and it would then route host/path to all services with tag v4. The service would still register urlprefix-/path, but since host/path is more specific, it would always be chosen.

magiconair commented 8 years ago

and 1-2 min startup time: wow ...

calvinmorrow commented 8 years ago

I think the route-by-host-and-tag would be helpful in a variety of situations, and as you mentioned, also help with our traffic cutover issue.

Our applications have been going through a lot of redevelopment over the last year or two. Not long ago, a 1-2 minute startup would have been impossible, so we're making gains. Unfortunately, we've inherited some poor design decisions, and we're working around them as best we can. We know where we want to go; it's just time and money getting there.