deislabs / osiris

A general purpose, scale-to-zero component for Kubernetes
MIT License

Osiris tries to activate already active service #63

Open tenitski opened 4 years ago

tenitski commented 4 years ago

Bug:

The activator works and scales up the deployment; however, it looks like Osiris does not register that the deployment is now running and keeps attempting to scale it up.

This is what the activator logs for each request:

I1128 00:55:21.673849       1 request_handling.go:10] Request received for for host MY_DOMAIN_HERE
I1128 00:55:21.673865       1 request_handling.go:19] Deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE may require activation
I1128 00:55:21.673872       1 request_handling.go:51] Found NO activation in-progress for deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE
I1128 00:55:21.679078       1 activating.go:29] Activating deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE
I1128 00:55:21.682330       1 deployment_activation.go:116] App pod with ip 172.27.34.162 is in service

tenitski commented 4 years ago

Oh, I think I got it: it is a mapping of a single service to a single hostname. If I have many services behind an internal router, with the ingress pointing to that router service, it won't work...

tenitski commented 4 years ago

And if I try to use internal hostnames like SERVICE_NAME.NAMESPACE_NAME.svc.cluster.local, it does not work, as Osiris seems to watch only the ingress.

tenitski commented 4 years ago

So I guess the question is - is it possible to get Osiris to work with ClusterIP services using internal DNS names?

tenitski commented 4 years ago

While internal hostnames are indexed (https://github.com/deislabs/osiris/blob/2bf13af2b906f54f75b9309a7f3a9a274fbb1719/pkg/deployments/activator/index.go#L103), Osiris seems to be using the external hostname to pull the service from the index.

krancour commented 4 years ago

As you've discovered by now, the activator has a map whose keys are all the different hostnames (DNS names) and IPs by which a service might be addressed, and whose values are the corresponding deployments. Have a look at all the config options that are covered in the README. There are a few different annotations that let you explicitly add hostnames to the map that the activator cannot infer on its own.
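The map described above can be pictured with a minimal Go sketch. This is not the Osiris source: the `Deployment` and `Index` types, the `Add` and `Lookup` methods, and the IP address are hypothetical names and values chosen for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// Deployment is a hypothetical stand-in for the object the activator scales up.
type Deployment struct {
	Namespace string
	Name      string
}

// Index maps every address a service may be reached by (hostname or IP)
// to the deployment behind that service.
type Index map[string]Deployment

// Add registers one or more hostnames or IPs for a deployment.
func (ix Index) Add(d Deployment, addresses ...string) {
	for _, a := range addresses {
		ix[a] = d
	}
}

// Lookup returns the deployment for a host, or an error if none is indexed.
func (ix Index) Lookup(host string) (Deployment, error) {
	d, ok := ix[host]
	if !ok {
		return Deployment{}, errors.New("no deployment found for host " + host)
	}
	return d, nil
}

func main() {
	ix := Index{}
	// Addresses the activator is assumed to be able to infer on its own.
	ix.Add(Deployment{Namespace: "example", Name: "permissions"},
		"permissions.example.svc.cluster.local", "10.0.0.12")

	for _, host := range []string{"permissions.example.svc.cluster.local", "app.example.com"} {
		d, err := ix.Lookup(host)
		fmt.Println(host, d, err)
	}
}
```

A hostname that was never added, such as an external name the activator cannot infer, simply misses the map, which is where the explicit annotations come in.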

tenitski commented 4 years ago

Thanks for getting back to me. I went through all the options 10 times and read half of the source code :). It seems that my problem is that Osiris is not using the external hostname when looking up services in the index:

Say the website is app.example.com. Internally I have a service router.example.svc.cluster.local, which redirects requests to the microservice permissions.example.svc.cluster.local.

In the logs I see only

I1128 03:02:16.293979       1 request_handling.go:10] Request received for for host app.example.com
E1128 03:02:16.294004       1 proxy.go:97] Error executing start proxy callback for host "app.example.com": No deployment found for host app.example.com

It does not try to look up a service for permissions.example.svc.cluster.local

tenitski commented 4 years ago

Can you please point me to the place in the code where the service hostname for a processed request is set?

krancour commented 4 years ago

I think there's a layer of indirection in your example that needs to be explained to me in order for me to help you effectively. You seem to be using some router component to direct traffic. What is Osiris-enabled here? The router or the target? And what is the original request you are making?

tenitski commented 4 years ago

So the request flow is:

Osiris logs requests associated with app.example.com and says that no deployment was found for this host. However, it does not log requests related to router.example.svc.cluster.local or permissions.example.svc.cluster.local.

This is the annotation I use on the permissions service:

metadata:
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/deployment: permissions
    osiris.deislabs.io/ingressHostname: "permissions.example.svc.cluster.local"

This is the annotation on the permissions deployment:

metadata:
  name: permissions
  annotations:
    osiris.deislabs.io/enabled: "true"
...
spec:
  template:
    metadata:
      annotations:
        osiris.deislabs.io/enabled: "true"
...

krancour commented 4 years ago

That's a complex bit of indirection... out of curiosity why have a "router" behind an ingress controller? Ostensibly, an ingress controller is a router of sorts. Anyway... let's take the router out of the equation for a moment-- just for the sake of simplifying what I'm about to say-- fewer hops is easier to understand, right?

So pretend you have just your ingress controller and then your permissions service. Any request to app.example.com still looks like a request for app.example.com when it hits the activator. i.e. The host header still says app.example.com. That isn't changed in the request's traversal of the ingress controller.
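To illustrate the point about the host header surviving a proxy hop, here is a small Go sketch. It uses the standard library's `httputil.NewSingleHostReverseProxy` as a stand-in for an ingress controller, which is an assumption made for illustration: a real ingress controller is a different piece of software and its behavior depends on its configuration. The helper name `hostSeenByBackend` is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// hostSeenByBackend sends a request carrying the given Host header through a
// reverse proxy and returns the Host value that the backend observed.
func hostSeenByBackend(host string) (string, error) {
	// The backend plays the role of the activator: it echoes the Host it received.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}
	// NewSingleHostReverseProxy rewrites the request URL but not the Host header.
	proxy := httptest.NewServer(httputil.NewSingleHostReverseProxy(target))
	defer proxy.Close()

	req, err := http.NewRequest(http.MethodGet, proxy.URL, nil)
	if err != nil {
		return "", err
	}
	req.Host = host // the hostname the client asked for

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	host, err := hostSeenByBackend("app.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("backend saw host:", host)
}
```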

So... the request hits the activator looking like a request for app.example.com, but per the configuration you posted, that is not a hostname that the activator would know anything about. How would it?

It seems here that you have perhaps misused the ingressHostname annotation, as you have given it a value that you should not need to give it-- a value that the activator can infer all on its own should be mapped to the permissions deployment. If, however, you use that annotation to tell the activator about app.example.com, you'd be adding new information to the activator that would help it match the request with the app.example.com host header.
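The difference between the two annotation values can be sketched in Go. This is a simplified model rather than the activator's actual logic: the `knownHosts` helper is hypothetical, and the assumption that only the `.svc.cluster.local` name is inferred is a simplification. The annotation key is the one from the configuration posted above.

```go
package main

import "fmt"

// Annotation key taken from the configuration posted in this thread.
const ingressHostnameAnnotation = "osiris.deislabs.io/ingressHostname"

// knownHosts returns the hostnames that would map to a service in this
// simplified model: the cluster-internal name (assumed to be inferred
// automatically) plus whatever the ingressHostname annotation adds.
func knownHosts(name, namespace string, annotations map[string]string) map[string]bool {
	hosts := map[string]bool{
		name + "." + namespace + ".svc.cluster.local": true,
	}
	if h, ok := annotations[ingressHostnameAnnotation]; ok && h != "" {
		hosts[h] = true
	}
	return hosts
}

func main() {
	// As posted: the annotation repeats a name that is already inferred.
	asPosted := knownHosts("permissions", "example", map[string]string{
		ingressHostnameAnnotation: "permissions.example.svc.cluster.local",
	})
	// As suggested: the annotation contributes the external hostname.
	asSuggested := knownHosts("permissions", "example", map[string]string{
		ingressHostnameAnnotation: "app.example.com",
	})

	fmt.Println("as posted, app.example.com known:", asPosted["app.example.com"])
	fmt.Println("as suggested, app.example.com known:", asSuggested["app.example.com"])
}
```

In the first case the annotation adds nothing new, so a request whose host header says app.example.com has nothing to match against.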

Now... as for why this flow isn't totally erroring and seems, from what I see in the logs you posted, to be making an earnest attempt to activate, that seems as if it could possibly be a bug. Definitely, the activator shouldn't attempt to do an activation for some deployment it cannot identify and if that is happening, it's a mistake. I'd have to dig into the code more to see if that's actually going on.

There's one other thing lurking in here...

I suspect that you are doing (or intend to do) some path-based routing, e.g. routing not only on hostname, but also on paths. Is that so? This is not supported (yet?), so that might also be some kind of factor here.

tenitski commented 4 years ago

We use a router behind the ingress because there are a dozen microservices with complex routing rules: path-based, HTTP methods, feature flags, etc. The router handles it. These microservices also make calls to each other as part of processing the original request passed by the ingress. These calls also go via the router.

So yes, we do have path-based routing. However, as it is used by the router to resolve a path to an internal hostname like permissions.example.svc.cluster.local, adding path support to Osiris would not solve our problem.

I'm still not sure how the activator works:

Does activator only listen to the requests coming from outside of the cluster? This would explain why it does not mention any requests to the services which are only available internally.

krancour commented 4 years ago

> Does activator only listen to the requests coming from outside of the cluster? This would explain why it does not mention any requests to the services which are only available internally.

If things are configured properly, any Osiris-enabled service that has no endpoints in service (i.e. is scaled to zero) gets activator endpoints automatically added. So any traffic that flows through such a service, regardless of where it came from or how it got there, will make it to the activator. The main question really is whether the activator will know what to do with the request, and that's going to end up being a matter of 1. configuration and 2. what the host header (or SNI) says.
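A rough Go sketch of those two ideas, under stated assumptions: `endpointsFor` and `requestHost` are hypothetical helpers rather than Osiris functions, and the IP addresses are made up.

```go
package main

import "fmt"

// endpointsFor models the substitution described above: a service with at
// least one ready application endpoint uses those endpoints, while a service
// scaled to zero is backed by the activator's endpoints instead.
func endpointsFor(appReady, activator []string) []string {
	if len(appReady) > 0 {
		return appReady
	}
	return activator
}

// requestHost picks the name used to identify the target: the HTTP Host
// header when there is one, otherwise the TLS SNI server name.
func requestHost(hostHeader, sni string) string {
	if hostHeader != "" {
		return hostHeader
	}
	return sni
}

func main() {
	activator := []string{"10.0.0.5"}

	// Scaled to zero: traffic lands on the activator.
	fmt.Println(endpointsFor(nil, activator))
	// Running: traffic goes straight to the application pod.
	fmt.Println(endpointsFor([]string{"172.27.34.162"}, activator))

	// What the activator would then try to match against its configuration.
	fmt.Println(requestHost("app.example.com", ""))
	fmt.Println(requestHost("", "permissions.example.svc.cluster.local"))
}
```

Whether the resulting name matches anything is then purely a question of what has been configured for the service.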