envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Admin endpoint security #2763

Open htuch opened 6 years ago

htuch commented 6 years ago

The admin endpoint today is unsecured (no authentication or TLS), with the assumption that it is only available to localhost or accessible on a trusted network. Ideally:

Beyond just security, there's also the question of what the admin console is. Is it just a curlable utility, an interactive web console, or a first-class API intended for programmatic use? Should it offer gRPC endpoints (in particular as we are moving towards a proto definition of its contents in places such as https://github.com/envoyproxy/envoy/issues/2172)? Answers to this affect the framing of security considerations.

Opening this issue to start the design discussion here.

mattklein123 commented 6 years ago

A few initial points are in the comment here: https://github.com/envoyproxy/data-plane-api/pull/523#issuecomment-371550679

htuch commented 6 years ago

One thing I wonder about is whether we should be offering a traditional web interface at all. Here's one alternative design: implement only gRPC or REST endpoints, and have folks build out JavaScript client-side interfaces which can be served from a listener via Envoy's direct response (https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-route-direct-response) mechanism. This removes Envoy from being in the business of worrying about XSS/CSRF/other web security concerns, while providing the convenience of browser admin capability.

mattklein123 commented 6 years ago

For reference the web stuff was only recently added by @jmarantz. I objected to this slightly at the time because realistically I think the only production use for the admin endpoint is either curl or other automated tools, but it didn't seem like a big deal to me to serve the landing page in HTML so I didn't worry about it that much.

My overall view of things right now is that the admin endpoint is not secure at all and must be accessed only over a trusted network. In the future I think we should do the following things:

  1. Generally provide only gRPC/REST endpoints codified in the data-plane-api repo (already tracked by various issues).
  2. Promote the admin endpoint to a fuller listener config allowing for both TLS/mTLS and inline RBAC security on a per-endpoint basis (once we have the inline RBAC filter). This will allow operators to configure things as they like it.
  3. Per @htuch we should consider having a "secure by default" admin config which locks admin down to localhost only and then requires operators to open up individual endpoints (e.g., /stats) as they see fit.
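
To make point 3 concrete, here is a minimal Python sketch of the decision a "secure by default" admin config would have to make. It is not Envoy code: the function name `is_request_allowed` and the `opened_endpoints` parameter are made up for illustration, and it assumes that localhost keeps full access while remote clients only reach endpoints the operator opened explicitly.

```python
import ipaddress


def is_request_allowed(client_ip: str, path: str, opened_endpoints: frozenset) -> bool:
    """Hypothetical policy check: loopback clients may reach every endpoint,
    everyone else only the endpoints the operator explicitly opened."""
    if ipaddress.ip_address(client_ip).is_loopback:
        return True
    # Ignore any query string when comparing against the opened endpoints.
    endpoint = path.split("?", 1)[0]
    return endpoint in opened_endpoints


# Example: the operator has opened only /stats to the network.
opened = frozenset({"/stats"})
print(is_request_allowed("127.0.0.1", "/quitquitquit", opened))   # True
print(is_request_allowed("10.0.0.5", "/stats?format=json", opened))  # True
print(is_request_allowed("10.0.0.5", "/quitquitquit", opened))    # False
```

With an empty `opened_endpoints` set this degrades to "localhost only", which is the locked-down default described above.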

In general I think that worrying about things like XSS/CSRF/etc. is kind of a waste of time for this. I will defer to @jmarantz who knows substantially more about this on what we should do on that front security-wise (hopefully optimizing for realistic usage scenarios).

mattklein123 commented 6 years ago

P.S., it would be great if someone in the community who is passionate about admin security would own this. There will be a non-trivial amount of work here to get to where we ultimately want to be.

jmarantz commented 6 years ago

One clarification: the http handlers for mutating operations were pre-existing. The change added a web home-page with proper escaping, and reduced XSS through proper http content types, and AFAIK added no additional exposure.

I agree that more restricted access by default would help.

mattklein123 commented 6 years ago

One clarification: the http handlers for mutating operations were pre-existing. The change added a web home-page with proper escaping, and reduced XSS through proper http content types, and AFAIK added no additional exposure.

Sorry, I spoke incorrectly. Before your change we had no HTML. I know basically nothing about web security. My real point was that if the existence of HTML is going to cause security consternation, I don't think it's worth maintaining because IMHO the use of the HTML endpoints is not going to happen in production. Assuming there is no additional exposure, then it's fine by me. I just wanted to point out that we should be careful about the HTML stuff if that is going to cause us additional maintenance burden.

ofek commented 6 years ago

Additionally, I would strongly recommend Envoy have a separate listener/endpoint that only serves /stats with optional basic auth.

ofek commented 6 years ago

@DataDog is recommending https://gist.github.com/ofek/6051508cd0dfa98fc6c13153b647c6f8 until this is solved.

Idea courtesy of @ggreenway. Config courtesy of @bndw (with this modification from @htuch).

taiki45 commented 6 years ago

I vote for disabling the web admin interface. Alternatively:

In addition, these features might be disabled by default. All programmatic use goes through the gRPC/REST API to support user extensibility, and the rest of the admin interface relies on FS permissions.

Personally, I like Envoy's curl-able interface, so I prefer Unix domain socket commands that are similar to the current web admin interface. I have a little passion to move admin operations from the web interface to a Unix domain socket one.
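
As a rough illustration of what scripting against a socket-bound admin interface could look like, here is a small Python sketch using only the standard library. It assumes the admin interface speaks plain HTTP over a Unix domain socket; the class `UnixHTTPConnection`, the helper `admin_get`, and the socket path in the usage note are hypothetical names chosen for this example. Access control then comes down to the file permissions on the socket.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # The host name is only used for the Host header; no TCP connect happens.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def admin_get(socket_path: str, path: str):
    """Send a GET for `path` over the socket; return (status, body text)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", errors="replace")
    finally:
        conn.close()
```

Usage would look like `status, body = admin_get("/var/run/envoy.admin", "/stats")`, where the path is whatever socket the admin interface was bound to.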

ofek commented 6 years ago

What's the difference between a "pull-based endpoint" like /stats and a REST API version?

jmarantz commented 6 years ago

I think it makes sense to control access via configuration, including disabling the http admin interface completely or restricting it to an IP, with separate controls for read-access vs POSTed mutations.
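
A sketch of what such configuration-driven control might boil down to, written in Python purely for illustration. None of these names exist in Envoy: `AdminAccessPolicy` and its fields are hypothetical, and the sketch assumes (as noted elsewhere in this thread) that all mutations arrive as POST requests.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdminAccessPolicy:
    """Hypothetical admin access settings; not an actual Envoy config message."""
    enabled: bool = True                    # False disables the HTTP admin interface
    allowed_network: Optional[str] = None   # CIDR such as "10.0.0.0/8"; None means any source
    allow_reads: bool = True                # GET requests
    allow_mutations: bool = False           # POST requests

    def permits(self, client_ip: str, method: str) -> bool:
        if not self.enabled:
            return False
        if self.allowed_network is not None:
            network = ipaddress.ip_network(self.allowed_network)
            if ipaddress.ip_address(client_ip) not in network:
                return False
        method = method.upper()
        if method == "POST":
            return self.allow_mutations
        if method == "GET":
            return self.allow_reads
        return False  # anything else is rejected in this sketch


# Read-only access from one internal network:
policy = AdminAccessPolicy(allowed_network="10.0.0.0/8")
print(policy.permits("10.1.2.3", "GET"))    # True
print(policy.permits("10.1.2.3", "POST"))   # False
print(policy.permits("192.0.2.1", "GET"))   # False
```

Keeping reads and mutations as separate switches means a stats scraper can be allowed in without also being able to POST to mutating handlers.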

taiki45 commented 6 years ago

What's the difference between a "pull-based endpoint" like /stats and a REST API version?

The main difference I had in mind is that the API version would have a dedicated listener and be properly schema-controlled. For example, /stats already has a JSON format for programmatic access.

taiki45 commented 6 years ago

It's not a strong objection, but regarding source-IP-based restriction or loopback-only binding: in some deployments we want more detailed permission control. For example, developers can log in to a host on which Envoy runs, but we do not want to allow them to perform admin operations on that Envoy instance; we want only administrators to be able to do those operations, for easy debugging.

krm1312 commented 4 years ago

Based on what products like spring boot and haproxy do with their admin/management interfaces (which allows us to keep them on in production but locked down):

  1. By default, disable all management endpoints that allow state mutation.
  2. Selectively allow disabling/enabling specific endpoints.
  3. Add support for basic auth
  4. Add support for TLS

For example, I'd like to scrape Prometheus statistics, but without jumping through a bunch of hoops (a statsd gateway, or more interfaces/firewall rules) that is not possible without also exposing the ability to shut down the server.
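
For item 3, the check itself is small. The following Python sketch shows one way to validate an HTTP Basic `Authorization` header (the scheme from RFC 7617) against a single expected user. The function name `check_basic_auth` is made up for this example, and a real deployment would compare against hashed credentials and only accept them over TLS (item 4).

```python
import base64
import hmac


def check_basic_auth(header_value, expected_user: str, expected_password: str) -> bool:
    """Return True only if the Authorization header carries the expected
    Basic credentials. Illustrative helper, not part of Envoy."""
    if not header_value or header_value[:6].lower() != "basic ":
        return False
    try:
        decoded = base64.b64decode(header_value[6:].strip(), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False  # not valid base64 or not valid UTF-8
    user, separator, password = decoded.partition(":")
    if not separator:
        return False
    # Constant-time comparisons avoid leaking how much of the value matched.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    password_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and password_ok


token = base64.b64encode(b"admin:s3cret").decode()
print(check_basic_auth("Basic " + token, "admin", "s3cret"))  # True
print(check_basic_auth("Basic " + token, "admin", "wrong"))   # False
```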

hogarthj commented 4 years ago

I know this has been a bit of a long running issue but I just wanted to share a config I tested that was built on the work others have done previously, with a focus on only presenting to the network admin pages that I considered safe (and this would be fairly trivial to extend with a client TLS auth filter for instance)...

The context I'm working with here is an arbitrary containerised application that serves TCP data, and we want TLS encryption presented as far as the system with the container ... so "envoy as a sidecar" on a generic docker app, but we want to have health/troubleshooting information exposed.

Reading through the docs it occurred to me that the lb_endpoint.endpoint address was of type core.Address which supports pipe as a type and that the admin stanza also referred to the core.Address type ...

By exposing the admin "interface" only over a locally controlled socket (which in our case isn't visible at all from external to the container running envoy), a data path can be constructed from listener to cluster to that socket with filters applied to the listener to control the preferred access levels.

I've tested this against both envoy-alpine and envoy-alpine-dev using openssl s_client and nc -kl 8080 to verify that the encryption is working as expected and the routing is valid. In this example the "fake admin page" still proxies the root / but anything that is outside of the approved targets won't get routed, so it will only get a 404.

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    listener_filters:
    - name: envoy.listener.tls_inspector
      config: {}
    filter_chains:
    - filter_chain_match:
        server_names: ["test.example.com"]
        transport_protocol: tls
      filters:
      - name: envoy.tcp_proxy
        config:
          stat_prefix: test_server
          cluster: test_server
          access_log:
          - name: envoy.file_access_log
            config:
              path: "/dev/stdout"
      tls_context:
        common_tls_context:
          tls_certificates:
          - certificate_chain: {filename: "/etc/ssl/certs/certificate.cert"}
            private_key: {filename: "/etc/ssl/private/private.key"}
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 9000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: stats_server
          route_config:
            virtual_hosts:
            - name: admin_interface
              domains:
              - "*"
              routes:
              - match:
                  safe_regex:
                    google_re2: {}
                    regex: '/(certs|stats(/prometheus)?|server_info|clusters|listeners|ready)?'
                  headers:
                    - name: ':method'
                      exact_match: GET
                route:
                  cluster: service_stats
          http_filters:
          - name: envoy.router
            config: {}
  clusters:
  - name: test_server
    connect_timeout: 0.25s
    type: static
    load_assignment:
      cluster_name: test_server
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
  - name: service_stats
    connect_timeout: 0.250s
    type: static
    load_assignment:
      cluster_name: service_stats
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              pipe:
                path: /var/run/envoy.admin
admin:
  access_log_path: "/dev/stdout"
  address:
    pipe:
      path: /var/run/envoy.admin
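
To see which requests the route match in the config above lets through, the same pattern can be tried out in Python. This is only an approximation under two assumptions: that the route regex must match the entire path (hence `fullmatch`), and that Python's `re` behaves like RE2 for this particular pattern. The helper name `is_routed` is made up for this example.

```python
import re

# Same pattern as the safe_regex in the route match above.
ALLOWED_ADMIN_PATHS = re.compile(
    r"/(certs|stats(/prometheus)?|server_info|clusters|listeners|ready)?"
)


def is_routed(method: str, path: str) -> bool:
    """Approximate the route: GET only, and the whole path must match."""
    return method == "GET" and ALLOWED_ADMIN_PATHS.fullmatch(path) is not None


print(is_routed("GET", "/"))                  # True, the capture group is optional
print(is_routed("GET", "/stats/prometheus"))  # True
print(is_routed("GET", "/quitquitquit"))      # False, so the listener answers 404
print(is_routed("POST", "/stats"))            # False, only GET is matched
```

Because the whole group is optional, the bare `/` landing page is still routed, which matches the behaviour described for the "fake admin page".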

jmarantz commented 4 years ago

It should be straightforward to add an option -- maybe a command-line option -- to reject all POST requests to admin. All mutations already check for POST... oh look I said that in 2018 too...this would be an easy and useful beginner project IMO. I would help review.

mattklein123 commented 4 years ago

@jmarantz FWIW I would rather just make the admin listener a real listener that can use filters. I don't think this would actually be that hard. Then we use all normal config for RBAC, access logging, etc.

justincely commented 4 years ago

IMO letting filters be set on a true admin listener would solve this well, but defaulting to no endpoints enabled and selectively turning them on would be preferable from a security standpoint. There's just less room for user error if only what's needed is enabled. For certain setups I don't want anyone, even on the box, to hit things like /quitquitquit or /certs, so it seems safer to just have them disabled.

Is anyone already planning to work on this? I'd be interested in supplying some work from my team to get some of these proposals in if it's not already claimed.

htuch commented 4 years ago

@justincely I think if we had the admin port as a listener, it would be reasonable to add a simple HTTP filter to block this; it's probably entirely doable with something like the RBAC filter today (although probably more complicated than you'd want from a UX perspective).

I don't think anyone is working on this right now, so go ahead. I would recommend moving to admin listener as the first step here.

mattklein123 commented 4 years ago

+1 let's start by just making the admin listener a real listener.

cstrahan commented 4 years ago

I think if we had the admin port as a listener, it would be reasonable to add a simple HTTP filter to block this

@justincely and I were talking about this earlier today. I'll be working on this shortly -- just now getting a sense of what all needs to be done to get us there. As I work on this, feel free to assign this issue to me whenever you feel confident in my ability to deliver.

ofek commented 4 years ago

Hello all! For those of us providing monitoring solutions based on /stats, could someone please briefly explain what would need to be changed config-wise to retain access to that endpoint once the proposed feature lands?

mattklein123 commented 4 years ago

Hello all! For those of us providing monitoring solutions based on /stats, could someone please briefly explain what would need to be changed config-wise to retain access to that endpoint once the proposed feature lands?

I think the default behavior is likely to be the same as it is today (fully open), but we will allow for real listener configuration including the RBAC filter, etc. so that certain endpoints can be blocked. It's possible that eventually we would change the default posture but I'm not sure this would happen in the initial version.

ofek commented 4 years ago

Excellent, thanks!

cstrahan commented 4 years ago

A proposal to secure the admin endpoint

I was chatting with @mattklein123 a couple weeks ago about securing the admin endpoint, under the assumption that we'd want some way to allow users to specify arbitrary filters (e.g. RBAC).

I would like to propose that we allow specifying a Listener config in the Admin message, and deprecate the Admin fields that can be taken directly from the Listener (e.g. address details).

The AdminFilter would be made a first-class filter (registered just like the other HTTP filters), but we would validate that the AdminFilter is only used within the Admin config, and that the filter is specified last in the filter chain.
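
The two validation rules could be sketched roughly like this in Python. This is pseudocode-level illustration, not the proposed implementation: the filter name `envoy.filters.http.admin` and the function `validate_http_filters` are placeholders, and filter chains are reduced to plain lists of filter names.

```python
ADMIN_FILTER_NAME = "envoy.filters.http.admin"  # placeholder name for the AdminFilter


def validate_http_filters(filter_names, in_admin_config: bool):
    """Return a list of validation errors for one HTTP filter chain.

    Rule 1: the admin filter may only appear within the Admin config.
    Rule 2: where it appears, it must be the last filter in the chain.
    """
    errors = []
    positions = [i for i, name in enumerate(filter_names) if name == ADMIN_FILTER_NAME]
    if not positions:
        return errors
    if not in_admin_config:
        errors.append("admin filter is only allowed within the Admin config")
    if positions != [len(filter_names) - 1]:
        errors.append("admin filter must be specified last in the filter chain")
    return errors


# An RBAC filter in front of the admin filter, inside the Admin config, passes:
print(validate_http_filters(["envoy.filters.http.rbac", ADMIN_FILTER_NAME], True))  # []
```

The same chain on an ordinary listener, or a chain with the admin filter ahead of another filter, would come back with an error message instead of an empty list.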

I would appreciate feedback from both Envoy users and developers; would this approach work for you?

/cc @justincely

mattklein123 commented 4 years ago

I would appreciate feedback from both Envoy users and developers; would this approach work for you?

Yes I think this SGTM. Thank you for working on this! cc @envoyproxy/security-team

htuch commented 3 years ago

For posterity, I'd like to note the outcome of a recent discussion around why the admin endpoint is sometimes opened more widely than it should be. Some users are making use of the Prometheus stats endpoint (https://www.envoyproxy.io/docs/envoy/latest/operations/admin.html?highlight=prometheus#get--stats-prometheus) for exposing Envoy stats. Unfortunately, due to the lack of fine-grained access control (or any for that matter), we end up exposing the entire endpoint to the network.

We would probably have some reasonable wins for security by having a separate stats and admin endpoint, but ultimately, making this a first class listener would provide RBAC and per-route granularity.

geo-y commented 3 years ago

I use a lightweight web server as a backend in front of the admin interface, so that it can only be reached with basic authentication. Here is my example:

docker-compose.yml (key part):

services:
  h2o:
    image: fukata/h2o-php:latest
    volumes:
      - <path>/h2o/:/etc/h2o/ext/
      - <path>/html/:/var/www/
    command: ["h2o", "-m", "master", "-c", "/etc/h2o/ext/h2o.conf"]
    restart: on-failure
  envoy:
    image: envoyproxy/envoy-alpine-dev:latest
    volumes:
      - <path>/envoy/:/etc/envoy/ext/:ro
      - <path>/cert/:/etc/crts/:ro
    ports:
      - "80:8000"
      - "443:8443"
    depends_on:
      - h2o
    command: /usr/local/bin/envoy -c /etc/envoy/ext/front_v3.yaml
    restart: on-failure

envoy (https://github.com/envoyproxy/envoy/blob/main/examples/front-proxy/front-envoy.yaml):

###
            virtual_hosts:
            - name: envoy_admin1
              domains:
              - "<admin-host1>"
              - "<admin-host2>"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: h2o1
###
  clusters:
  - name: h2o1
    connect_timeout: 2s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: h2o1
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: h2o
                port_value: 80
###

h2o.conf:

hosts:
  "<admin-host>":
    listen:
      port: 80
    paths:
      "/":
        mruby.handler: |
          require "htpasswd.rb"
          Htpasswd.new("/etc/h2o/ext/htpass", "realm-name")
        proxy.reverse.url: http://envoy:<admin-port>
        proxy.preserve-host: ON

The "htpasswd" file manages the admin user and password:

htpasswd ./htpass admin-username

tpetkov-VMW commented 2 years ago

In order to have secure settings by default, we have a local patch that adds a list of explicitly allowed endpoints directly into the bootstrap configuration. There are a few things to consider around the design, but the implementation is pretty straightforward and would give some guarantees that nobody can, for example, call /quitquitquit. Would this be helpful/desired?

maxres-ch commented 1 year ago

I was wondering if there's any movement on this issue? We'd really like to see it as a configurable thing.

jmarantz commented 5 months ago

https://github.com/envoyproxy/envoy/pull/11367 is stale and needs to be re-started with a dev ready to push it forward.

See also https://github.com/envoyproxy/envoy/pull/32346 which just merged, and is somewhat related.