Network level Rate Limit Service works, but not HTTP level

I have been experimenting with the Rate Limit Service in this repository:

Rate limiting using a network level filter is working well with this configuration:

static_resources:
  listeners:
    - name: listener_80
      address:
        socket_address: { address: 0.0.0.0, port_value: 80 }
      filter_chains:
        - filters:
            - name: envoy.ratelimit
              config:
                stat_prefix: ingress_ratelimit
                domain: envoy
                failure_mode_deny: true
                descriptors:
                  - entries:
                      - key: client_id
                        value: foo

            - name: envoy.http_connection_manager
              config:
                codec_type: auto
                stat_prefix: ingress_http
                access_log:
                  name: envoy.file_access_log
                  config: { path: /dev/stdout }
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route: { cluster: hello }
                http_filters:
                  - name: envoy.router
                    config: {}

  clusters:
    - name: ratelimit
      connect_timeout: 0.25s
      type: strict_dns
      lb_policy: round_robin
      http2_protocol_options: {}
      hosts:
        - socket_address: { address: ratelimit, port_value: 8081 }

    - name: hello
      connect_timeout: 0.25s
      type: strict_dns
      lb_policy: round_robin
      hosts:
        - socket_address: { address: hello, port_value: 8080 }

rate_limit_service:
  grpc_service:
    envoy_grpc: { cluster_name: ratelimit }

admin:
  access_log_path: "/dev/null"
  address:
    socket_address: { address: 0.0.0.0, port_value: 8001 }

https://github.com/rwlincoln/rlstest/blob/e8d6e2dc01f5db0bb06d1c11aa5e69a9a3d0d349/net-rls.yaml

The activity of the rate limiter can be seen in debug mode when calling curl localhost:8080 repeatedly:

$ docker-compose up --force-recreate
Recreating rlstest_front_1 ... 
Recreating rlstest_front_1
Recreating rlstest_redis_1 ... 
Recreating rlstest_front_1 ... done
Recreating rlstest_hello_1 ... 
Recreating rlstest_redis_1 ... done
Recreating rlstest_ratelimit_1 ... 
Recreating rlstest_ratelimit_1 ... done
Attaching to rlstest_front_1, rlstest_redis_1, rlstest_hello_1, rlstest_ratelimit_1
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:206] initializing epoch 0 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:208] statically linked extensions:
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:210]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:213]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:216]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:219]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.dubbo_proxy,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:221]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:223]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.tracers.datadog,envoy.zipkin
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:226]   transport_sockets.downstream: envoy.transport_sockets.alts,envoy.transport_sockets.capture,raw_buffer,tls
front_1      | [2019-03-29 15:27:32.439][000006][info][main] [source/server/server.cc:229]   transport_sockets.upstream: envoy.transport_sockets.alts,envoy.transport_sockets.capture,raw_buffer,tls
front_1      | [2019-03-29 15:27:32.445][000006][info][main] [source/server/server.cc:271] admin address: 0.0.0.0:8001
front_1      | [2019-03-29 15:27:32.452][000006][info][config] [source/server/configuration_impl.cc:50] loading 0 static secret(s)
front_1      | [2019-03-29 15:27:32.452][000006][info][config] [source/server/configuration_impl.cc:56] loading 2 cluster(s)
front_1      | [2019-03-29 15:27:32.454][000006][info][config] [source/server/configuration_impl.cc:67] loading 1 listener(s)
front_1      | [2019-03-29 15:27:32.463][000006][info][config] [source/server/configuration_impl.cc:92] loading tracing configuration
front_1      | [2019-03-29 15:27:32.467][000006][info][config] [source/server/configuration_impl.cc:112] loading stats sink configuration
front_1      | [2019-03-29 15:27:32.472][000006][info][main] [source/server/server.cc:463] starting main dispatch loop
front_1      | [2019-03-29 15:27:32.631][000006][info][upstream] [source/common/upstream/cluster_manager_impl.cc:136] cm init: all clusters initialized
front_1      | [2019-03-29 15:27:32.632][000006][info][main] [source/server/server.cc:435] all clusters initialized. initializing init manager
front_1      | [2019-03-29 15:27:32.632][000006][info][config] [source/server/listener_manager_impl.cc:961] all dependencies initialized. starting workers
redis_1      | 1:C 29 Mar 2019 15:27:33.631 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1      | 1:C 29 Mar 2019 15:27:33.631 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1      | 1:C 29 Mar 2019 15:27:33.631 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1      | 1:M 29 Mar 2019 15:27:33.634 * Running mode=standalone, port=6379.
redis_1      | 1:M 29 Mar 2019 15:27:33.634 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis_1      | 1:M 29 Mar 2019 15:27:33.634 # Server initialized
redis_1      | 1:M 29 Mar 2019 15:27:33.634 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1      | 1:M 29 Mar 2019 15:27:33.634 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
redis_1      | 1:M 29 Mar 2019 15:27:33.634 * DB loaded from disk: 0.000 seconds
redis_1      | 1:M 29 Mar 2019 15:27:33.634 * Ready to accept connections
hello_1      | 2019/03/29 15:27:34 Server is listening on :8080
ratelimit_1  | time="2019-03-29T15:27:35Z" level=warning msg="statsd is not in use"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="runtime changed. loading new snapshot at /data/ratelimit"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="runtime: processing /data/ratelimit"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="runtime: processing /data/ratelimit/config.yml"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="runtime: adding key=config.yml value=---\ndomain: envoy\ndescriptors:\n  - key: remote_address\n    rate_limit:\n      unit: minute\n      requests_per_unit: 3\n\n  - key: client_id\n    value: foo\n    rate_limit:\n      unit: minute\n      requests_per_unit: 3\n\n  - key: destination_cluster\n    value: hello\n    rate_limit:\n      unit: minute\n      requests_per_unit: 3\n\n  - key: generic_key\n    value: bar\n    rate_limit:\n      unit: minute\n      requests_per_unit: 3\n uint=false"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=warning msg="connecting to redis on tcp redis:6379 with pool size 10"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="loading domain: envoy"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="loading descriptor: key=envoy.remote_address ratelimit={requests_per_unit=3, unit=MINUTE}"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="loading descriptor: key=envoy.client_id_foo ratelimit={requests_per_unit=3, unit=MINUTE}"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="loading descriptor: key=envoy.destination_cluster_hello ratelimit={requests_per_unit=3, unit=MINUTE}"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="loading descriptor: key=envoy.generic_key_bar ratelimit={requests_per_unit=3, unit=MINUTE}"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=debug msg="waiting for runtime update"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=warning msg="Listening for HTTP on ':8080'"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=warning msg="Listening for gRPC on ':8081'"
ratelimit_1  | time="2019-03-29T15:27:35Z" level=warning msg="Listening for debug on ':6070'"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="starting get limit lookup"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="looking up key: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="found rate limit: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="starting cache lookup"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="looking up cache key: envoy_client_id_foo_1553873220"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="cache key: envoy_client_id_foo_1553873220 current: 1"
ratelimit_1  | time="2019-03-29T15:27:37Z" level=debug msg="returning normal response"
hello_1      | 2019/03/29 15:27:37 localhost:8080 172.28.0.2:37748 "GET / HTTP/1.1" 200 12 "curl/7.58.0" 41.067µs
front_1      | [2019-03-29T15:27:37.684Z] "GET / HTTP/1.1" 200 - 0 12 1 0 "-" "curl/7.58.0" "d96ae420-3381-48f1-9fa8-811ef4d9a689" "localhost:8080" "172.28.0.4:8080"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="starting get limit lookup"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="looking up key: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="found rate limit: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="starting cache lookup"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="looking up cache key: envoy_client_id_foo_1553873220"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="cache key: envoy_client_id_foo_1553873220 current: 2"
ratelimit_1  | time="2019-03-29T15:27:39Z" level=debug msg="returning normal response"
hello_1      | 2019/03/29 15:27:39 localhost:8080 172.28.0.2:37748 "GET / HTTP/1.1" 200 12 "curl/7.58.0" 14.946µs
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="starting get limit lookup"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="looking up key: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="found rate limit: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="starting cache lookup"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="looking up cache key: envoy_client_id_foo_1553873220"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="cache key: envoy_client_id_foo_1553873220 current: 3"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="returning normal response"
hello_1      | 2019/03/29 15:27:40 localhost:8080 172.28.0.2:37760 "GET / HTTP/1.1" 200 12 "curl/7.58.0" 13.759µs
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="starting get limit lookup"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="looking up key: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="found rate limit: client_id_foo"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="starting cache lookup"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="looking up cache key: envoy_client_id_foo_1553873220"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="cache key: envoy_client_id_foo_1553873220 current: 4"
ratelimit_1  | time="2019-03-29T15:27:40Z" level=debug msg="returning normal response"
^CGracefully stopping... (press Ctrl+C again to force)
Stopping rlstest_ratelimit_1 ... done
Stopping rlstest_hello_1     ... done
Stopping rlstest_redis_1     ... done
Stopping rlstest_front_1     ... done

However, when I try to use an HTTP level rate limit filter, there is no activity from the rate limit service. The HTTP level configuration I've been using is:

static_resources:
  listeners:
    - name: listener_80
      address:
        socket_address: { address: 0.0.0.0, port_value: 80 }
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              config:
                codec_type: auto
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      rate_limits:
                        - stage: 0
                          actions:
                            - remote_address: {}
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: hello
                            include_vh_rate_limits: true
                            rate_limits:
                              - actions:
                                  - destination_cluster: {}
                                  - generic_key: { descriptor_value: bar }
                http_filters:
                  - name: envoy.router
                    config: {}
                  - name: envoy.rate_limit
                    config:
                      domain: envoy
                      failure_mode_deny: true

  clusters:
    - name: ratelimit
      connect_timeout: 0.25s
      type: strict_dns
      lb_policy: round_robin
      http2_protocol_options: {}
      hosts:
        - socket_address: { address: ratelimit, port_value: 8081 }

    - name: hello
      connect_timeout: 0.25s
      type: strict_dns
      lb_policy: round_robin
      hosts:
        - socket_address: { address: hello, port_value: 8080 }

rate_limit_service:
  grpc_service:
    envoy_grpc: { cluster_name: ratelimit }

admin:
  access_log_path: "/dev/null"
  address:
    socket_address: { address: 0.0.0.0, port_value: 8001 }

https://github.com/rwlincoln/rlstest/blob/e8d6e2dc01f5db0bb06d1c11aa5e69a9a3d0d349/http-rls.yaml

It looks very similar to config files found elsewhere, such as in #3388. Can anyone see what I am missing? Perhaps it is something to do with the transition from ratelimit.proto to rls.proto, but I have tried it with versions 1.7 and 1.8 and the behavior is the same.

envoyproxy / envoy

Network level Rate Limit Service works, but not HTTP level #6433