Open htuch opened 6 years ago
A few initial points are in the comment here: https://github.com/envoyproxy/data-plane-api/pull/523#issuecomment-371550679
One thing I wonder about is whether we should be offering a traditional web interface at all. Here's one alternative design: implement only gRPC or REST endpoints, and have folks build out JavaScript client-side interfaces which can be served from a listener via Envoy's direct response (https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-route-direct-response) mechanism. This removes Envoy from being in the business of worrying about XSS/CSRF/other web security concerns, while providing the convenience of browser admin capability.
For reference the web stuff was only recently added by @jmarantz. I objected to this slightly at the time because realistically I think the only production use for the admin endpoint is either curl or other automated tools, but it didn't seem like a big deal to me to serve the landing page in HTML so I didn't worry about it that much.
My overall view of things right now is that the admin endpoint is not secure at all and must be accessed only over a trusted network. In the future I think we should do the following things:
/stats
) as they see fit.
In general I think that worrying about things like XSS/CSRF/etc. is kind of a waste of time for this. I will defer to @jmarantz who knows substantially more about this on what we should do on that front security-wise (hopefully optimizing for realistic usage scenarios).
P.S., it would be great if someone in the community who is passionate about admin security might want to own this. There will be a non-trivial amount of work here to get to where we want to ultimately be.
One clarification: the http handlers for mutating operations were pre-existing. The change added a web home-page with proper escaping, and reduced XSS through proper http content types, and AFAIK added no additional exposure.
I agree that more restricted access by default would help.
> One clarification: the http handlers for mutating operations were pre-existing. The change added a web home-page with proper escaping, and reduced XSS through proper http content types, and AFAIK added no additional exposure.
Sorry, I spoke incorrectly. Before your change we had no HTML. I know basically nothing about web security. My real point was that if the existence of HTML is going to cause security consternation, I don't think it's worth maintaining because IMHO the use of the HTML endpoints is not going to happen in production. Assuming there is no additional exposure, then it's fine by me. I just wanted to point out that we should be careful about the HTML stuff if that is going to cause us additional maintenance burden.
Additionally, I would strongly recommend Envoy have a separate listener/endpoint that only serves /stats with optional basic auth.
@DataDog is recommending https://gist.github.com/ofek/6051508cd0dfa98fc6c13153b647c6f8 until this is solved.
Idea courtesy of @ggreenway Config courtesy of @bndw (with this modification from @htuch)
I vote for disabling the web admin interface. Alternatively:
- /cpuprofiler or /logging to runtime configuration flags or simple Unix domain socket commands like HAProxy's. This allows us to manage permissions via traditional file system permissions.
- /stats to gRPC/REST API and let them have fuller listener config.
In addition, these features might be disabled by default. All programmatic use would go through the gRPC/REST API to support user extensibility, and the rest of the admin interface would rely on FS permissions.
Personally, I like Envoy's curl-able interface, so I prefer Unix domain socket commands, which are similar to the current web admin interface. I have some interest in moving admin operations from the web interface to a Unix domain socket one.
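To make the "curl-able over a Unix domain socket" idea concrete, here is a minimal client-side sketch in Python. It only assumes that some HTTP server is listening on a Unix socket; the class name `UnixHTTPConnection`, the helper `admin_get`, and the example socket path are illustrative and not part of Envoy.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """Plain HTTP spoken over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" is only used to fill in the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with an AF_UNIX connect.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def admin_get(socket_path, path):
    """Issue a GET against the socket and return (status, body text)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", errors="replace")
    finally:
        conn.close()


# Example call (hypothetical socket path):
#   status, body = admin_get("/var/run/envoy.admin", "/stats")
```

Who may call this is then governed by the file permissions on the socket path, which is the property the comment above is after.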
What's the difference between a "pull-based endpoint" like /stats and a REST API version?
I think it makes sense to control access via configuration, including disabling the http admin interface completely or restricting it to an IP, with separate controls for read-access vs POSTed mutations.
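As a rough illustration of the controls described above (disable entirely, restrict by source IP, separate read access from POSTed mutations), here is a small Python sketch. `AdminAccessPolicy` and `is_permitted` are hypothetical names for illustration only; nothing here corresponds to an existing Envoy option.

```python
import ipaddress
from dataclasses import dataclass, field


def _loopback_only():
    return [ipaddress.ip_network("127.0.0.0/8")]


@dataclass
class AdminAccessPolicy:
    enabled: bool = True                      # False disables the interface completely
    allowed_sources: list = field(default_factory=_loopback_only)
    allow_mutations: bool = False             # separate switch for POSTed mutations


def is_permitted(policy, method, source_ip):
    """Return True if a request with this method and source may proceed."""
    if not policy.enabled:
        return False
    addr = ipaddress.ip_address(source_ip)
    if not any(addr in net for net in policy.allowed_sources):
        return False
    method = method.upper()
    if method == "POST":
        return policy.allow_mutations
    return method in ("GET", "HEAD")


read_only = AdminAccessPolicy()
print(is_permitted(read_only, "GET", "127.0.0.1"))   # read from loopback
print(is_permitted(read_only, "POST", "127.0.0.1"))  # mutation, not enabled
```

Keeping the mutation switch separate from the source restriction means a monitoring host can be allowed in without also gaining the ability to change server state.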
> What's the difference between a "pull-based endpoint" like /stats and a REST API version?
The main difference I had in mind is that the API version will have a dedicated listener and will be properly schema controlled. For example, /stats already has a JSON format for programmatic access.
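For readers consuming the JSON form of /stats programmatically, a minimal Python sketch follows. The payload shape used here (a top-level `stats` array of objects with `name` and `value`, plus other entries that lack those keys) is an assumption about the output, and the sample document is made up for illustration.

```python
import json


def parse_stats_json(payload):
    """Flatten a stats JSON document into {name: value}.

    Entries without both "name" and "value" (for example histogram
    blocks) are skipped rather than guessed at.
    """
    doc = json.loads(payload)
    values = {}
    for entry in doc.get("stats", []):
        if "name" in entry and "value" in entry:
            values[entry["name"]] = entry["value"]
    return values


# Illustrative payload, not captured from a real server.
sample = json.dumps({
    "stats": [
        {"name": "server.live", "value": 1},
        {"name": "cluster.test_server.upstream_cx_total", "value": 42},
        {"histograms": {}},
    ]
})
print(parse_stats_json(sample))
```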
It's not a strong objection, but regarding source-IP-based restriction or local loopback binding: we want more detailed permission control in some deployments. Developers can log in to a host on which Envoy runs, but we do not want to allow them to perform admin operations on the Envoy instance; we want only administrators to be able to perform those operations, for easy debugging.
Based on what products like spring boot and haproxy do with their admin/management interfaces (which allows us to keep them on in production but locked down):
For example, I'd like to scrape Prometheus statistics, but without jumping through a bunch of hoops (statsd gateway or more interfaces/firewall rules) that is not possible without also exposing the ability to shut down the server.
I know this has been a bit of a long running issue but I just wanted to share a config I tested that was built on the work others have done previously, with a focus on only presenting to the network admin pages that I considered safe (and this would be fairly trivial to extend with a client TLS auth filter for instance)...
The context I'm working with here is arbitrary containerised application that serves TCP data and we want TLS encryption being presented as far as the system with the container ... so "envoy as a sidecar" on a generic docker app, but we want to have health/troubleshooting information exposed.
Reading through the docs it occurred to me that the lb_endpoint.endpoint address was of type core.Address, which supports pipe as a type, and that the admin stanza also referred to the core.Address type ...
By exposing the admin "interface" only over a locally controlled socket (which in our case isn't visible at all from external to the container running envoy), a data path can be constructed from listener to cluster to that socket with filters applied to the listener to control the preferred access levels.
I've tested this against both envoy-alpine and envoy-alpine-dev using openssl s_client and nc -kl 8080 to verify that the encryption is working as expected and the routing is valid. In this example the "fake admin page" still proxies the root / but anything that is outside of the approved targets won't get routed, so it will only get a 404.
static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    listener_filters:
    - name: envoy.listener.tls_inspector
      config: {}
    filter_chains:
    - filter_chain_match:
        server_names: ["test.example.com"]
        transport_protocol: tls
    - filters:
      - name: envoy.tcp_proxy
        config:
          stat_prefix: test_server
          cluster: test_server
          access_log:
          - name: envoy.file_access_log
            config:
              path: "/dev/stdout"
      tls_context:
        common_tls_context:
          tls_certificates:
            certificate_chain: {filename: "/etc/ssl/certs/certificate.cert"}
            private_key: {filename: "/etc/ssl/private/private.key"}
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 9000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: stats_server
          route_config:
            virtual_hosts:
            - name: admin_interface
              domains:
              - "*"
              routes:
              - match:
                  safe_regex:
                    google_re2: {}
                    regex: '/(certs|stats(/prometheus)?|server_info|clusters|listeners|ready)?'
                  headers:
                  - name: ':method'
                    exact_match: GET
                route:
                  cluster: service_stats
          http_filters:
          - name: envoy.router
            config: {}
  clusters:
  - name: test_server
    connect_timeout: 0.25s
    type: static
    load_assignment:
      cluster_name: test_server
      endpoints:
        lb_endpoints:
          endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
  - name: service_stats
    connect_timeout: 0.250s
    type: static
    load_assignment:
      cluster_name: service_stats
      endpoints:
        lb_endpoints:
          endpoint:
            address:
              pipe:
                path: /var/run/envoy.admin
admin:
  access_log_path: "/dev/stdout"
  address:
    pipe:
      path: /var/run/envoy.admin
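To sanity-check which admin paths the route above lets through, here is a small Python sketch that applies the same pattern. It assumes the route regex must match the whole path and that only GET is routed; Python's `re` stands in for RE2 here, which is fine for this pattern but is not the engine Envoy uses. The function name `is_routed` is illustrative.

```python
import re

# Same pattern as the safe_regex in the route config above.
ALLOWED_PATHS = re.compile(
    r"/(certs|stats(/prometheus)?|server_info|clusters|listeners|ready)?"
)


def is_routed(method, path):
    """True if the request would be forwarded to the admin socket."""
    return method == "GET" and ALLOWED_PATHS.fullmatch(path) is not None


for method, path in [
    ("GET", "/"),
    ("GET", "/stats/prometheus"),
    ("GET", "/quitquitquit"),
    ("POST", "/stats"),
]:
    verdict = "routed" if is_routed(method, path) else "404"
    print(method, path, verdict)
```

Because the whole path has to match, mutating endpoints such as /quitquitquit never reach the admin socket, even though the root page is still proxied.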
It should be straightforward to add an option -- maybe a command-line option -- to reject all POST requests to admin. All mutations already check for POST... oh look I said that in 2018 too...this would be an easy and useful beginner project IMO. I would help review.
@jmarantz FWIW I would rather just make the admin listener a real listener that can use filters. I don't think this would actually be that hard. Then we use all normal config for RBAC, access logging, etc.
IMO letting filters be set on a true admin listener would solve this well, but defaulting to no endpoints enabled and selectively turning them on would be preferable from a security standpoint. There's just less room for user error if only what's needed is enabled. For certain setups I don't want anyone, even on the box, to hit things like /quitquitquit or /certs, so it seems safer to just have them disabled.
Is anyone already planning to work on this? I'd be interested in supplying some work from my team to get some of these proposals in if it's not already claimed.
@justincely I think if we had the admin port as a listener, it would be reasonable to add a simple HTTP filter to block this; it's probably entirely doable with something like the RBAC filter today (although probably more complicated than you'd want from a UX perspective).
I don't think anyone is working on this right now, so go ahead. I would recommend moving to admin listener as the first step here.
+1 let's start by just making the admin listener a real listener.
> I think if we had the admin port as a listener, it would be reasonable to add a simple HTTP filter to block this
@justincely and I were talking about this earlier today. I'll be working on this shortly -- just now getting a sense of what all needs to be done to get us there. As I work on this, feel free to assign this issue to me whenever you feel confident in my ability to deliver.
Hello all! For those of us providing monitoring solutions based on /stats, could someone please briefly explain what would need to be changed config-wise to retain access to that endpoint once the proposed feature lands?
> Hello all! For those of us providing monitoring solutions based on /stats, could someone please briefly explain what would need to be changed config-wise to retain access to that endpoint once the proposed feature lands?
I think the default behavior is likely to be the same as it is today (fully open), but we will allow for real listener configuration including the RBAC filter, etc. so that certain endpoints can be blocked. It's possible that eventually we would change the default posture but I'm not sure this would happen in the initial version.
Excellent, thanks!
I was chatting with @mattklein123 a couple weeks ago about securing the admin endpoint, under the assumption that we'd want some way to allow users to specify arbitrary filters (e.g. RBAC).
I would like to propose that we allow specifying a Listener config in the Admin message, and deprecate the Admin fields that can be taken directly from the Listener (e.g. address details).
The AdminFilter would be made a first-class filter (registered just like the other HTTP filters), but we would validate that the AdminFilter is only used within the Admin config, and that the filter is specified last in the filter chain.
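The two validation rules in this proposal can be sketched as a small check. This is Python pseudologic for discussion, not Envoy code; the filter name string `envoy.admin` and the function `validate_http_filters` are hypothetical.

```python
ADMIN_FILTER = "envoy.admin"  # hypothetical registered name for the AdminFilter


def validate_http_filters(filter_names, in_admin_config):
    """Return a list of validation errors (empty list means valid).

    Rules taken from the proposal above:
      1. The AdminFilter may only appear inside the Admin config.
      2. Inside the Admin config it must be the last filter in the chain.
    """
    errors = []
    if not in_admin_config:
        if ADMIN_FILTER in filter_names:
            errors.append("AdminFilter is only allowed within the Admin config")
        return errors
    if not filter_names or filter_names[-1] != ADMIN_FILTER:
        errors.append("AdminFilter must be the last filter in the chain")
    return errors


# An RBAC filter placed ahead of the AdminFilter passes validation.
print(validate_http_filters(["envoy.filters.http.rbac", ADMIN_FILTER], True))
```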
I would appreciate feedback from both Envoy users and developers; would this approach work for you?
/cc @justincely
> I would appreciate feedback from both Envoy users and developers; would this approach work for you?
Yes I think this SGTM. Thank you for working on this! cc @envoyproxy/security-team
For posterity, I'd like to note an outcome of a recent discussion around why the admin endpoint is sometimes opened more widely than it should be. Some users are making use of the Prometheus stats endpoint (https://www.envoyproxy.io/docs/envoy/latest/operations/admin.html?highlight=prometheus#get--stats-prometheus) to expose Envoy stats. Unfortunately, due to the lack of fine-grained access control (or any, for that matter), we end up exposing the entire endpoint to the network.
We would probably have some reasonable wins for security by having a separate stats and admin endpoint, but ultimately, making this a first class listener would provide RBAC and per-route granularity.
I use a lightweight web server as a backend in front of the admin interface to add basic authentication. Here is my example:
docker-compose.yml(key part):
services:
  h2o:
    image: fukata/h2o-php:latest
    volumes:
      - <path>/h2o/:/etc/h2o/ext/
      - <path>/html/:/var/www/
    command: ["h2o", "-m", "master", "-c", "/etc/h2o/ext/h2o.conf"]
    restart: on-failure
  envoy:
    image: envoyproxy/envoy-alpine-dev:latest
    volumes:
      - <path>/envoy/:/etc/envoy/ext/:ro
      - <path>/cert/:/etc/crts/:ro
    ports:
      - "80:8000"
      - "443:8443"
    depends_on:
      - h2o
    command: /usr/local/bin/envoy -c /etc/envoy/ext/front_v3.yaml
    restart: on-failure
envoy(https://github.com/envoyproxy/examples/blob/main/front-proxy/envoy.yaml):
###
virtual_hosts:
- name: envoy_admin1
  domains:
  - "<admin-host1>"
  - "<admin-host2>"
  routes:
  - match:
      prefix: "/"
    route:
      cluster: h2o1
###
clusters:
- name: h2o1
  connect_timeout: 2s
  type: strict_dns
  lb_policy: round_robin
  load_assignment:
    cluster_name: h2o1
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: h2o
              port_value: 80
###
h2o.conf:
hosts:
  "<admin-host>":
    listen:
      port: 80
    paths:
      "/":
        mruby.handler: |
          require "htpasswd.rb"
          Htpasswd.new("/etc/h2o/ext/htpass", "realm-name")
        proxy.reverse.url: http://envoy:<admin-port>
        proxy.preserve-host: ON
The "htpasswd" file manages the admin user and password:
htpasswd ./htpass admin-username
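On the client side, a monitoring script then has to send HTTP basic credentials with each request. A minimal Python sketch of building such a request follows; the URL, username, and password are placeholders, and the helper names are illustrative.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_admin_request(url, username, password):
    """Prepare (but do not send) a GET request carrying basic credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req


# Placeholder host and credentials; send with urllib.request.urlopen(req).
req = build_admin_request("http://admin.example.test/stats", "admin", "secret")
print(req.get_header("Authorization"))
```

Basic credentials are only base64-encoded, not encrypted, so this setup relies on the TLS listener in front of it for confidentiality.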
In order to have secure settings by default, we have a local patch that adds a list of explicitly allowed endpoints directly into the bootstrap configuration. There are a few things to consider around the design, but the implementation is pretty straightforward and would give some guarantees that nobody can, for example, do /quitquitquit. Would this be helpful/desired?
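The local patch itself is not shown in this thread, so the following is only a guess at the general shape of a default-deny endpoint allowlist, written in Python for illustration. `make_endpoint_gate` and the example endpoint list are hypothetical.

```python
from urllib.parse import urlsplit


def make_endpoint_gate(allowed_endpoints):
    """Return a predicate that admits only explicitly listed admin paths.

    Anything not listed is denied, including the empty-allowlist case.
    """
    allowed = frozenset(allowed_endpoints)

    def is_allowed(request_target):
        # Compare on the path only, so "/stats?format=json" matches "/stats".
        return urlsplit(request_target).path in allowed

    return is_allowed


gate = make_endpoint_gate(["/stats", "/stats/prometheus", "/ready"])
print(gate("/stats?format=json"))
print(gate("/quitquitquit"))
```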
I was wondering if there's any movement on this issue? We'd really like to see it as a configurable thing.
https://github.com/envoyproxy/envoy/pull/11367 is stale and needs to be re-started with a dev ready to push it forward.
See also https://github.com/envoyproxy/envoy/pull/32346 which just merged, and is somewhat related.
Hi, what happens with this issue?
The admin endpoint today is unsecured (no authentication or TLS), with the assumption that it is only available to localhost or accessible on a trusted network. Ideally:
/quitquitquit vs. stats monitoring.
Beyond just security, there's also the question of what the admin console is. Is it just a curl-able utility, an interactive web console, or a first-class API intended for programmatic use? Should it offer gRPC endpoints (in particular as we are moving towards a proto definition of its contents in places such as https://github.com/envoyproxy/envoy/issues/2172)? Answers to this affect the framing of security considerations.
Opening this issue to start the design discussion here.