Tarpit: HTTP DDOS mitigation through envoy

ramachaitanyak commented 5 years ago

Description:

This is a mitigation approach in which the socket/connection is not closed during an attack but is instead kept open, draining the attacker's resources. The technique does not forward requests upstream and essentially ignores all requests from the attacker, shielding the clusters behind Envoy. The implementation leverages the TCP zero-window state and the exponential back-off applied during TCP retransmissions.

I have tested the implementation on Envoy with about 200,000 concurrent connections and 20k-40k requests per second, and analyzed CPU utilization, memory usage, and network bandwidth utilization on both the attacker and the server side.

This implementation in Envoy works by tuning the socket receive buffer size and the receive buffer low watermark. It differs from the kernel implementation, which hooks into the TCP stack.
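
As a rough illustration of the socket tuning described above, here is a minimal sketch in Python (chosen only for brevity; Envoy itself is C++). It is not the Envoy implementation. The function name `tarpit_socket` and the default values are hypothetical, and the kernel is free to round or clamp both settings. It assumes a Unix-like system where `SO_RCVLOWAT` is settable.

```python
import socket


def tarpit_socket(conn: socket.socket, rcvbuf: int = 1024, lowat: int = 512) -> dict:
    """Shrink the receive buffer and set the receive low watermark on a socket.

    rcvbuf and lowat are placeholder values, not taken from the thread.
    A tarpit would then simply never call recv() on `conn`, so the
    kernel buffer fills up and the advertised TCP window drops to zero.
    """
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT, lowat)
    # Report what the kernel actually applied, which may differ from the request.
    return {
        "rcvbuf": conn.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        "lowat": conn.getsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT),
    }
```

In a real server this would be applied to the socket returned by `accept()` once the connection has been classified as hostile.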

I have linked some RFCs below on how TCP reacts to the zero-window state. Please let me know whether this is something that would be interesting to upstream.

Thanks! Rama

Relevant links:

Original implementation as an iptables add-on: https://sourceforge.net/p/xtables-addons/xtables-addons/ci/master/tree/extensions/xt_TARPIT.c
https://tools.ietf.org/html/rfc6429
https://tools.ietf.org/html/rfc1122#section-4.2.2.17

mattklein123 commented 5 years ago

This is something that I would like to see in Envoy, but optimally most of this would be implemented as an independent set of filters. Can you work on a full design proposal?

ramachaitanyak commented 5 years ago

The design and implementation depend on the resolution of this issue: https://github.com/envoyproxy/envoy/issues/8291

I will update the design soon after.

Thanks! Rama

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

ramachaitanyak commented 5 years ago

Hello, apologies for the delay. I am still working on issue #8291 and will follow up on this topic afterwards. Broadly, the idea is to tune the socket receive buffer size and the socket receive low watermark so that the receive buffer is never read by the application. Because the connection is not closed, the attacker gets the impression that the server is still receiving requests, and the default TCP stack on the client side keeps retrying with exponential back-off.
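
To give a feel for why exponential back-off quiets the client down, here is a small arithmetic sketch. The 1 s starting timer, the 60 s cap, and the function name `backoff_schedule` are illustrative assumptions; real TCP stacks derive their timers from measured round-trip times and apply their own limits.

```python
def backoff_schedule(initial_rto: float = 1.0, max_rto: float = 60.0, retries: int = 8) -> list:
    """Return successive retry timers: each one doubles, capped at max_rto."""
    schedule = []
    rto = initial_rto
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule


print(backoff_schedule())       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(sum(backoff_schedule()))  # 183.0 seconds spent on just eight attempts
```

Under these assumed numbers, a stalled connection produces only eight packets from the client in roughly three minutes, which is the effect the tarpit relies on.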

I have implemented two modes locally:

Tarpit in slow decay mode
Tarpit in instantaneous decay mode

The slow decay mode is similar to the producer-consumer problem: the read rate is tuned so that the system eventually deadlocks. I noticed that this drives the attacker's TCP stack into the zero-window state, which again triggers exponential back-off, thereby reducing bandwidth, CPU, and memory utilization on the server.
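
The producer-consumer analogy can be illustrated with a toy simulation. This is only a model of the idea, not the actual implementation: the units are arbitrary, the decaying read rate is one possible reading of "slow decay", and treating "instantaneous decay" as a read rate of zero is likewise an assumption. All names are hypothetical.

```python
def windows_until_zero(capacity, send_rate, read_rate, decay=0.5, max_steps=1000):
    """Simulate a receive buffer filled by the client and drained by the tarpit.

    Each step the client sends up to the advertised window, the tarpit reads
    `read_rate` units, and then the read rate decays. Returns the advertised
    window after each step, stopping once it reaches zero.
    """
    buffered = 0
    windows = []
    for _ in range(max_steps):
        window = capacity - buffered
        buffered += min(send_rate, window)    # sender cannot exceed the window
        buffered -= min(read_rate, buffered)  # the consumer drains a little
        read_rate = int(read_rate * decay)    # and reads less next time
        windows.append(capacity - buffered)
        if windows[-1] == 0:
            break
    return windows


print(windows_until_zero(100, 30, 20))  # slow decay: [90, 70, 45, 17, 1, 0]
print(windows_until_zero(100, 30, 0))   # never reading: [70, 40, 10, 0]
```

In both runs the advertised window ends at zero; the decaying reader just takes more steps to get there.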

ramachaitanyak commented 5 years ago

Hello, the initial idea is to have a separate L7 security filter, which is a rewritten combination of the router filter and the RBAC filter in Envoy. Tarpit would be an action executed by the security filter when there is a policy match.

The security filter has two components: a matcher and an action. The administrator applies a policy that specifies match conditions and configures the action. The match conditions are a combination of nested AND and OR rules over HTTP headers, IP matchers, etc. (I can elaborate and share a few snippets of match conditions if we want to go ahead with this approach.)

For example, a legitimate policy describing a use case based on this design would be:

If the source IP address is ip1 and the path accessed by a request from ip1 is /foo/bar, then tarpit the connection.

Here the match conditions are the source IP and the path, while the action is tarpit. The rules can become very complex by nesting AND and OR rules.

Based on the designs for https://github.com/envoyproxy/envoy/pull/8851/ and https://github.com/envoyproxy/envoy/pull/9099, and on the summary of the security filter above, there would be an Action interface available to the L7 security filter. Here is the abstraction I was thinking of:

class TarpitAction : public Action

The implementation would use the PerConnectionObjectSharedPtr and, through the MutableHttpConnection instance, tune the socket RECV_BUF and RECV_LO_WAT to drive the TCP connection into a zero-window state or into retransmission with exponential back-off.

If the security-filter abstraction is not preferred, we could have a NetworkActions interface in include/envoy/network/connection.h that directly modifies the TCP state using the ConnectionImpl::setSocketRecvBufferSize and ConnectionImpl::setSocketRecvLoWat APIs referred to here: https://github.com/envoyproxy/envoy/pull/9099.

With this approach, the abstractions for how to invoke tarpit could be separated into another PR.
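
To show how an Action abstraction and the two setters named above might fit together, here is a small model in Python (used only for brevity; the real code would be C++ inside Envoy). Everything here is hypothetical: `RecordingConnection` is a stand-in that only records calls, the method names are Python spellings of setSocketRecvBufferSize and setSocketRecvLoWat whose real signatures are not shown in this thread, and the default sizes are placeholders.

```python
from abc import ABC, abstractmethod


class RecordingConnection:
    """Hypothetical stand-in for a connection; it only records what was requested."""

    def __init__(self):
        self.recv_buffer_size = None
        self.recv_lo_wat = None

    def set_socket_recv_buffer_size(self, size: int) -> None:
        self.recv_buffer_size = size

    def set_socket_recv_lo_wat(self, lo_wat: int) -> None:
        self.recv_lo_wat = lo_wat


class Action(ABC):
    """Hypothetical base interface: one operation applied to a matched connection."""

    @abstractmethod
    def apply(self, connection) -> None:
        ...


class TarpitAction(Action):
    def __init__(self, recv_buffer_size: int = 1024, recv_lo_wat: int = 512):
        # Both defaults are placeholders, not values taken from the thread.
        self.recv_buffer_size = recv_buffer_size
        self.recv_lo_wat = recv_lo_wat

    def apply(self, connection) -> None:
        # Tune the socket so the connection drifts towards a zero window.
        connection.set_socket_recv_buffer_size(self.recv_buffer_size)
        connection.set_socket_recv_lo_wat(self.recv_lo_wat)


conn = RecordingConnection()
TarpitAction(recv_buffer_size=2048, recv_lo_wat=1024).apply(conn)
print(conn.recv_buffer_size, conn.recv_lo_wat)  # 2048 1024
```

Keeping the action behind a single `apply` call is what would allow the invocation side (security filter or a NetworkActions interface) to be decided in a separate PR.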

Please let me know your thoughts.

Thanks! Rama

mattklein123 commented 4 years ago

@ramachaitanyak at a high level this sounds fine, however we have to avoid adding yet another matching system. Can you please take a look at the tap filter matching system and figure out how we can reuse that here?

ramachaitanyak commented 4 years ago

@mattklein123 I had some discussions internally, and I am of the opinion that this issue should be broken down into two parts.

The first part is a separate issue to scope out the design of an Envoy security filter; the second is another issue to scope out the actions supported by this filter.

Here is a brief config showing how the security filter would be set up:

"filters": [{
    "name": "envoy.security",
    "config": {
        "policies": []
    }
}]

policies:
{
    "name": string,
    "match": {...},
    "action": {...}
}

An example match would be:

match:
  - and_rules:
      rules:
        - header: { name: ":method", exact_match: "GET" }
        - header: { name: ":path", regex_match: "/products(/.*)?" }
        - source_ip: { ip_list: [ "23.3.3.3/32" ] }
        - or_rules:
            rules:
              - header: { name: ":authority", exact_match: "www.x.co.uk" }
              - header: { name: ":authority", exact_match: "www.y.de" }
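
To make the nested AND/OR semantics of the match above concrete, here is a minimal evaluator sketch in Python. It is not an existing Envoy matcher. The function name `matches`, the request shape, and the choice of full-string regex matching are all assumptions made for illustration.

```python
import ipaddress
import re


def matches(rule: dict, request: dict) -> bool:
    """Evaluate one rule (and_rules, or_rules, header or source_ip) against a request."""
    if "and_rules" in rule:
        return all(matches(r, request) for r in rule["and_rules"]["rules"])
    if "or_rules" in rule:
        return any(matches(r, request) for r in rule["or_rules"]["rules"])
    if "header" in rule:
        spec = rule["header"]
        value = request["headers"].get(spec["name"])
        if value is None:
            return False
        if "exact_match" in spec:
            return value == spec["exact_match"]
        if "regex_match" in spec:
            # Assumption: the regex must match the whole header value.
            return re.fullmatch(spec["regex_match"], value) is not None
    if "source_ip" in rule:
        addr = ipaddress.ip_address(request["source_ip"])
        return any(addr in ipaddress.ip_network(net) for net in rule["source_ip"]["ip_list"])
    raise ValueError(f"unsupported rule: {rule!r}")


EXAMPLE_MATCH = {"and_rules": {"rules": [
    {"header": {"name": ":method", "exact_match": "GET"}},
    {"header": {"name": ":path", "regex_match": "/products(/.*)?"}},
    {"source_ip": {"ip_list": ["23.3.3.3/32"]}},
    {"or_rules": {"rules": [
        {"header": {"name": ":authority", "exact_match": "www.x.co.uk"}},
        {"header": {"name": ":authority", "exact_match": "www.y.de"}},
    ]}},
]}}

request = {
    "source_ip": "23.3.3.3",
    "headers": {":method": "GET", ":path": "/products/42", ":authority": "www.y.de"},
}
print(matches(EXAMPLE_MATCH, request))  # True
```

A request from any other address, with any other method, or for a host outside the two listed authorities would evaluate to False.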

The supported actions could be:

"action": {
    "drop_request": {
        "close_connection": "..."
    },
    "reset_connection": "...",
    "static_response_page": "{...}",
    "tarpit_connection": "...",
    "cluster_redirect": "...",
    "rate_limit": "{...}",
    "challenge_page": "{...}"
    .... more occult stuff ...
}

Effectively we have the following config for the security filter:

policies:
  - name: reset-get-request
    match:
      and_rules:
        rules:
          - header: { name: ":method", exact_match: "GET" }
          - source_ip: { ip_list: [ "1.1.1.1/32" ] }
    action:
      reset_connection: true
  - name: tarpit-post-request
    match:
      and_rules:
        rules:
          - header: { name: ":method", exact_match: "POST" }
          - header: { name: ":path", regex_match: "/products(/.*)?" }
          - or_rules:
              rules:
                - header: { name: ":authority", exact_match: "www.awesomewebsite1.co.uk" }
                - header: { name: ":authority", exact_match: "www.awesomewebsite2.de" }
    action:
      tarpit_connection: true
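
As a sketch of how a filter might walk such a policy list, the following Python picks the action of the first policy whose match holds. First-match-wins ordering is an assumption; the thread does not say how overlapping policies would be resolved. The `Policy` class and `select_action` function are hypothetical, and plain lambdas stand in for the nested and/or rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Policy:
    name: str
    match: Callable[[dict], bool]  # stands in for the nested and/or rules
    action: str                    # e.g. "reset_connection" or "tarpit_connection"


def select_action(policies, request: dict) -> Optional[Tuple[str, str]]:
    """Return (policy name, action) for the first matching policy, else None."""
    for policy in policies:
        if policy.match(request):
            return (policy.name, policy.action)
    return None


POLICIES = [
    Policy(
        "reset-get-request",
        lambda r: r["method"] == "GET" and r["source_ip"] == "1.1.1.1",
        "reset_connection",
    ),
    Policy(
        "tarpit-post-request",
        lambda r: (
            r["method"] == "POST"
            and re.fullmatch(r"/products(/.*)?", r["path"]) is not None
            and r["authority"] in ("www.awesomewebsite1.co.uk", "www.awesomewebsite2.de")
        ),
        "tarpit_connection",
    ),
]

request = {"method": "POST", "path": "/products/1",
           "authority": "www.awesomewebsite2.de", "source_ip": "9.9.9.9"}
print(select_action(POLICIES, request))  # ('tarpit-post-request', 'tarpit_connection')
```

A request that matches no policy returns None, which a filter would presumably treat as "continue processing normally".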

This is more or less the use case we have in mind. I did a cursory study of the tap filter as you recommended, but did not look closely enough to understand whether it supports IP matching. That said, I am open to any implementation that supports this use case. However, I would like to separate that work into another topic.

With this in mind, and keeping to the topic for which this issue was opened, I would like this thread to be used for the design and implementation of the L7-based tarpit operation.

To be concrete about where I intend to use this tarpit operation:

Building on the PRs currently in progress (https://github.com/envoyproxy/envoy/pull/8851 and https://github.com/envoyproxy/envoy/pull/9099), I intend to have an 'Actions' interface. If you suggest that we implement Tarpit only after sketching out the design of the security filter concretely, please let me know; I am fine with that.

However, if you think Tarpit could have a stand-alone use, please let me know where you think it fits. Should I write an interface for Actions in the Network namespace?

Based on the feedback, we can break this problem into coherent, actionable tasks.

If you agree with this idea, please advise which issues should be opened to track them.

Thanks in advance for your time. Rama

mattklein123 commented 4 years ago

Hi @ramachaitanyak. We have discussed this internally at Lyft, and we need this filter also in Q1. Along those lines, I'm going to nominate @gkleiman to work with you directly on the design for this. Can you please find a time to meet with each other to have an initial chat? That way you can produce a design doc together. I can chat with @gkleiman in the background to help out as well.

At a high level I agree on what we are trying to accomplish here, but we absolutely must not reimplement matching again, so we will need to figure out how to lift the tap matching system into common code, or potentially even into the HCM to allow matching to be done generically for any filter that wants it. I will discuss this part with @gkleiman offline.

Thank you for working on this! I'm very excited to see this happen.

cc @alyssawilk also as I know you all have been talking to her about this offline as well.

A-And commented 4 years ago

@ramachaitanyak @gkleiman - is this still planned for 1.16.0? We're super interested in this.

ramachaitanyak commented 4 years ago

@A-And: @gkleiman and I synced up before the pandemic began, and I shared a design. Unfortunately, we have not worked together on it since then.

I have completed an implementation locally for my company and have since been moved off this project, so I did not get time to complete the upstream work. However, if there is interest and @mattklein123 is ready to reopen some of the unfinished PRs, I could slowly resume work on it alongside my current commitments.

We will need 2 PRs: https://github.com/envoyproxy/envoy/pull/8851 and a network socket options PR, which I am unable to find in the repo.

ramachaitanyak commented 4 years ago

The other PR is https://github.com/envoyproxy/envoy/pull/9099, which I will need to reopen, update, and merge.

mattklein123 commented 4 years ago

I believe the current plan is that @gkleiman is going to work on this. He can follow up with @ramachaitanyak on a timeline for figuring out how to converge on something we can get upstream.

Tarpit: HTTP DDOS mitigation through envoy #8292