envoyproxy / java-control-plane

Java implementation of an Envoy gRPC control plane
Apache License 2.0

server: introducing DDoS prevention mechanisms #102

Closed chemicL closed 5 years ago

chemicL commented 5 years ago

As described in #101, I'm suggesting adding a DDoS prevention layer, and this PR is an attempt to address that issue.

The work done here is partly based on the example implementation of manual flow control in grpc-java.
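For readers unfamiliar with manual flow control, here is a minimal, library-free sketch of the idea behind it: inbound messages are handed to the handler only after the handler has asked for them, in the spirit of `request(n)` on a grpc-java stream observer. `CreditedInbox` and its methods are hypothetical names made up for this illustration. It is not the code from this PR and it does not use grpc-java.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a stream whose inbound messages are delivered only on demand.
// It mimics the idea of request(n)-style credits; it is not grpc-java code.
final class CreditedInbox {
    private final Queue<String> pending = new ArrayDeque<>(); // arrived, not yet delivered
    private final List<String> delivered = new ArrayList<>(); // handed to the handler
    private int credits;                                      // messages the handler will still accept

    // Called by the "transport" when a message arrives from the peer.
    synchronized void onArrival(String message) {
        pending.add(message);
        drain();
    }

    // Called by the handler when it is ready for n more messages.
    synchronized void request(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        credits += n;
        drain();
    }

    // Delivers queued messages while credits remain; callers hold the lock.
    private void drain() {
        while (credits > 0 && !pending.isEmpty()) {
            delivered.add(pending.remove());
            credits--;
        }
    }

    synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    synchronized int backlog() {
        return pending.size();
    }
}

final class CreditedInboxDemo {
    public static void main(String[] args) {
        CreditedInbox inbox = new CreditedInbox();
        inbox.request(1);            // handler is ready for exactly one message
        inbox.onArrival("req-1");
        inbox.onArrival("req-2");
        inbox.onArrival("req-3");
        System.out.println(inbox.delivered() + " backlog=" + inbox.backlog()); // [req-1] backlog=2
        inbox.request(1);            // handler finished and asks for the next one
        System.out.println(inbox.delivered() + " backlog=" + inbox.backlog()); // [req-1, req-2] backlog=1
    }
}
```

The point of the sketch is that a peer sending many requests on one stream cannot force the handler to process them faster than it grants credits.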

I tried to make everything customizable and pluggable. Existing usages of java-control-plane should not be affected by these changes; the new behavior is available only when using classes from the limits package, as shown in the TestLimits runner.
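To illustrate what "pluggable" can mean for a limit on concurrently open streams, here is a minimal sketch under my own assumptions. The names `OpenStreamGate`, `UnlimitedGate` and `BoundedGate` are hypothetical and are not the classes in the limits package. The no-op variant stands for the case where limits are not enabled.

```java
import java.util.concurrent.Semaphore;

// Hypothetical interface: decides whether a newly opened stream may proceed.
interface OpenStreamGate {
    boolean tryOpen(); // true if the stream is admitted
    void onClose();    // call exactly once for every admitted stream
}

// Admits everything; a stand-in for "limits disabled".
final class UnlimitedGate implements OpenStreamGate {
    @Override
    public boolean tryOpen() {
        return true;
    }

    @Override
    public void onClose() {
        // nothing to release
    }
}

// Admits at most maxOpenStreams streams at a time, rejecting the rest immediately.
final class BoundedGate implements OpenStreamGate {
    private final Semaphore permits;

    BoundedGate(int maxOpenStreams) {
        if (maxOpenStreams <= 0) {
            throw new IllegalArgumentException("maxOpenStreams must be positive");
        }
        this.permits = new Semaphore(maxOpenStreams);
    }

    @Override
    public boolean tryOpen() {
        return permits.tryAcquire(); // non-blocking: never parks the calling thread
    }

    @Override
    public void onClose() {
        // Releasing without a matching tryOpen() would raise the limit, so callers must pair them.
        permits.release();
    }
}

final class BoundedGateDemo {
    public static void main(String[] args) {
        OpenStreamGate gate = new BoundedGate(2);
        System.out.println(gate.tryOpen()); // true
        System.out.println(gate.tryOpen()); // true
        System.out.println(gate.tryOpen()); // false, the limit of 2 is reached
        gate.onClose();                     // one stream finishes
        System.out.println(gate.tryOpen()); // true
    }
}
```

In a server, a rejected `tryOpen()` would typically be translated into an error status for the caller. How this PR does that is not shown here.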

To perform load tests I used ghz. Here is the configuration and a run command:

{
  "proto": "proto/cds.proto",
  "importPaths": "proto",
  "call": "envoy.api.v2.ClusterDiscoveryService/StreamClusters",
  "n": 45,
  "c": 15,
  "t": 60,
  "host": "0.0.0.0:12345",
  "insecure": true,
  "d": {
    "node": {
      "id": "myself",
      "cluster": "somecluster",
      "metadata": {},
      "locality": {
        "region": "unknown",
        "zone": "one",
        "sub_zone": ""
      }
    },
    "resource_names": [],
    "response_nonce": "",
    "type_url": "type.googleapis.com/envoy.api.v2.Cluster"
  }
}

ghz -config java-control-plane-loadtest.json

To run this, I copied over the proto folder into my testing directory.

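What this configuration does: it sends 45 StreamClusters calls in total (n), using 15 concurrent workers (c) and a 60-second timeout per call (t), to a plaintext (insecure) server at 0.0.0.0:12345.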

With the limits configured in the TestLimits runner, this test takes around 30 s to finish and results in 15 streams not being handled.

ghz's output:

Summary:
  Count:    45
  Total:    28.01 s
  Slowest:  10.00 s
  Fastest:  6.27 ms
  Average:  5.23 s
  Requests/sec: 1.61

Response time histogram:
  6.272 [1]      |∎∎
  1006.015 [1]   |∎∎
  2005.759 [1]   |∎∎
  3005.502 [1]   |∎∎
  4005.245 [1]   |∎∎
  5004.988 [1]   |∎∎
  6004.731 [1]   |∎∎
  7004.474 [1]   |∎∎
  8004.218 [1]   |∎∎
  9003.961 [2]   |∎∎∎∎
  10003.704 [19] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

Latency distribution:
  10% in 2.01 s
  25% in 7.01 s
  50% in 10.00 s
  75% in 10.00 s
  90% in 10.00 s
  95% in 10.00 s
  0% in 0 ns
Status code distribution:
  [OK]            30 responses
  [Unavailable]   15 responses

Error distribution:
  [15]   rpc error: code = Unavailable desc =
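
As a quick sanity check on the numbers in the output above, the following sketch recomputes the reported throughput and compares the histogram total with the status code counts. The class and method names are hypothetical. The inputs are copied from the ghz output.

```java
import java.util.Locale;

// Hypothetical helper that re-derives figures from the ghz summary.
final class GhzSummaryCheck {
    // Requests per second, formatted to two decimals as in the ghz summary.
    static String requestsPerSecond(int count, double totalSeconds) {
        return String.format(Locale.ROOT, "%.2f", count / totalSeconds);
    }

    // Total number of samples across all histogram buckets.
    static int sum(int[] buckets) {
        int total = 0;
        for (int b : buckets) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] histogram = {1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 19}; // bucket counts from the histogram
        int ok = 30;           // [OK] responses
        int unavailable = 15;  // [Unavailable] responses

        System.out.println(requestsPerSecond(45, 28.01)); // 1.61
        System.out.println(sum(histogram));               // 30
        System.out.println(ok + unavailable);             // 45
    }
}
```

The histogram buckets add up to 30, which equals the number of OK responses, so the histogram appears to cover only the successful calls. The 15 Unavailable responses account for the rest of the 45 requests.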

Regarding code coverage: I did my best, but given the nature of ServerCallStreamObserver and its hidden default implementation, mocking it would miss the point, in my opinion. As for limiting concurrently open streams, I tried to demonstrate it in the TestLimits runner and by providing a load test configuration. I'm open to suggestions if you have better ideas for automated tests that I could add.

Fixes #101

codecov-io commented 5 years ago

Codecov Report

Merging #102 into master will decrease coverage by 11.37%. The diff coverage is 18.44%.

@@             Coverage Diff              @@
##             master    #102       +/-   ##
============================================
- Coverage     92.88%   81.5%   -11.38%     
- Complexity      150     157        +7     
============================================
  Files            19      25        +6     
  Lines           576     676      +100     
  Branches         48      49        +1     
============================================
+ Hits            535     551       +16     
- Misses           32     116       +84     
  Partials          9       9
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...roxy/controlplane/server/limits/StreamLimiter.java | 0% <0%> (ø) | 0 <0> (?) |
| ...ontrolplane/server/limits/GuavaRequestLimiter.java | 0% <0%> (ø) | 0 <0> (?) |
| .../controlplane/server/limits/ManualFlowControl.java | 0% <0%> (ø) | 0 <0> (?) |
| ...yproxy/controlplane/server/limits/FlowControl.java | 100% <100%> (ø) | 1 <1> (?) |
| ...xy/controlplane/server/limits/NoOpFlowControl.java | 100% <100%> (ø) | 4 <4> (?) |
| ...nvoyproxy/controlplane/server/DiscoveryServer.java | 81.36% <34.21%> (-14.67%) | 15 <1> (+1) |
| ...controlplane/server/limits/NoOpRequestLimiter.java | 50% <50%> (ø) | 1 <1> (?) |
| ... and 3 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8e19277...d3ee1fb.

chemicL commented 5 years ago

@snowp, thanks for having a look and for your remarks. I addressed your requests and updated the PR. Please let me know what you think of the server interceptor and whether anything else needs changing.

chemicL commented 5 years ago

I just rebased onto master to resolve conflicts in imports.

chemicL commented 5 years ago

I'm closing this PR, more details here: https://github.com/envoyproxy/java-control-plane/issues/101#issuecomment-527449648