Using admin endpoint for metrics is insecure

erikschul commented 1 month ago

IIUC, it is common practice to expose /ready and /stats/prometheus using the admin endpoint. But this also exposes admin privileges, like stopping the server. Any compromised Prometheus scraper could stop Envoy services and cause a severe outage (across all scraped Envoy services).

Proposal: Implement configuration for enabling a minimal service that serves only /ready and /stats/* endpoints.

Current workaround:

Configure localhost cluster.

Configure listener with limited routes:

routes:
- match: { prefix: "/stats/prometheus"}
route: { cluster: admin_cluster }
- match: { prefix: "/ready"}
route: { cluster: admin_cluster }
- match: { prefix: "/"}
direct_response:
  status: 404

Kubernetes can poll /ready using localhost, so that's less of a problem, but if /ready needs to be consumed by a downstream service, it would be equally insecure.

phlax commented 1 month ago

cc @jmarantz

wbpcode commented 1 month ago

I think the usual practice is making the admin only listen to localhost and create another listener to control the traffic that could be routed to the admin (Like the workaround way in your descritpion)

I personally think current usual practice is a good way. By that way, you can even expose some unsecure admin interface to specific users/clients by using http filter chain to do the authentication, authorization, etc. Or you can also do a rate limit for the query requests, etc.

So, basically, I don't think this deserve is new configuration or setting. 🤔

wbpcode commented 1 month ago

I removed the help wanted tag first because I think we need more discussion for this feature requirement. :)

erikschul commented 1 month ago

@wbpcode I agree, it seems to be a common practice, and perhaps in that sense a rite of passage, like writing a hello-world program.

phlax commented 1 month ago

im afraid i dont

having to expose the admin endpoint in order to probe readiness or expose metrics is far from ideal

mho is that the underlying issue is that readiness and metrics are not really part of admin and should not be in that interface - but i think that is a more complex conversation and i think we need/want to do something

from a security pov restricting this at the admin endpoint is the better way - ie less complex and not requiring the implementation of something else in order to protect it

this has come up a lot with the examples - most of them require (via docker) some way of knowing that envoy is ready to accept connections - in this case we really dont want to add the admin interface to do such a basic thing - neither do we want to add any workarounds as this adds to the complexity of the example and distracts from what is being exemplified

the add-another-proxy workaround would surely work, but if that is the ~official way of exposing metrics and/or readiness without giving away the keys to the castle then it sorely needs some documentation

phlax commented 1 month ago

previous tickets:

wbpcode commented 1 month ago

mho is that the underlying issue is that readiness and metrics are not really part of admin and should not be in that interface - but i think that is a more complex conversation and i think we need/want to do something

Yeah. And I also agree this is a more complex converation.

from a security pov restricting this at the admin endpoint is the better way - ie less complex and not requiring the implementation of something else in order to protect it

Now our admin has provided lots of different interfaces. If we prepare to do some limit here directly, then, we then need to consider it more comprehensively. Some one may want only expose/ready, other one may wants more, some one may want allow high level operation for trust users or administrator. This will become more complex.

(You may want we can limit only the /ready or /stats*, but it finally is only a hack rather than a feature.)

this has come up a lot with the examples - most of them require (via docker) some way of knowing that envoy is ready to accept connections - in this case we really dont want to add the admin interface to do such a basic thing - neither do we want to add any workarounds as this adds to the complexity of the example and distracts from what is being exemplified

In these case, TCP probes to the listener port may be more sound than /ready. And they acutally could make the admin as public because they are examples only.

the add-another-proxy workaround would surely work, but if that is the ~official way of exposing metrics and/or readiness without giving away the keys to the castle then it sorely needs some documentation

I think the problem is there is no common way for all scenarios. There are lots users only use Envoy as their internal proxy and never expose the envoy to the public network, there are also lots users have no strong concern to the security, a public admin is completely acceptable for these users. Then public admin is good for them.

Of course we can give some suggestions for this, but I think it also is only suggestion, not the ~official way.

phlax commented 1 month ago

 Now our admin has provided lots of different interfaces. If we prepare to do some limit here directly, then, we then need to consider it more comprehensively. Some one may want only expose /ready, other one may wants more, some one may want allow high level operation for trust users or administrator. This will become more complex.

surely this is simple - just provide an allowlist of paths that are exposed - or for a little more complexity a matching allowlist

... And they acutally could make the admin as public because they are examples only.

they neither want to exemplify an insecure setup nor have the distraction of additional listeners to check the daemon is ready

agreed that readiness doesnt necessarily equate to ready to serve something in particular tho

... also lots users have no strong concern to the security, a public admin is completely acceptable for these users. Then public admin is good for them.

im really struggling to think of any situation where this is the case - why would anyone want it so that the interwebs can restart their server or dump their config

I think the problem is there is no common way for all scenarios ...

again - i think this is pretty trivial in terms of configuration, and im guessing implementation too - just limit what paths are exposed

wbpcode commented 1 month ago

im really struggling to think of any situation where this is the case - why would anyone want it so that the interwebs can restart their server or dump their config

proxies in self-hosted private network...?

again - i think this is pretty trivial in terms of configuration, and im guessing implementation too - just limit what paths are exposed

I actually didn't get what you mean by limit what paths are exposed? Did you mean forbid all other interfaces (that not in the list) be accessed by anyone? Or still allow them be accessed by some trust clients?

My assumption is that no any interface will be disabled completely. That means we may even need to add authentication or authorization to the admin endpoint to control the access. So, I think the workaround way would be better because it could re-use the full-featured http filter chain.

phlax commented 1 month ago

proxies in self-hosted private network...?

surely that is just relying on something external (ie network topology) to protect the endpoint

I actually didn't get what you mean by limit what paths are exposed? Did you mean forbid all other interfaces (that not in the list) be accessed by anyone? Or still allow them be accessed by some trust clients?

yes - restrict all access to anything other than the allowed paths - for anything more complex i think that should be done by adding a proxy in front of it

... we may even need to add authentication or authorization to the admin endpoint to control the access ...

i think that is beyond the scope of this issue - the suggestion is to restrict what admin endpoints are served

wbpcode commented 1 month ago

surely that is just relying on something external (ie network topology) to protect the endpoint

Yeah, lots of org use this way, they trust the network Isolation.

wbpcode commented 1 month ago

yes - restrict all access to anything other than the allowed paths - for anything more complex i think that should be done by adding a proxy in front of it

Ok, then this should be easy done by a simple string matcher.

envoyproxy / envoy

Using admin endpoint for metrics is insecure #36729