DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Non-abusive request rates trigger WAF rate blocked alarms #6321

Closed dsotirho-ucsc closed 2 months ago

dsotirho-ucsc commented 4 months ago

Note that in this example, there was one 10 min period on 2024-06-02 where the user's request rate exceeded 1000/5min. This is a valid situation for the waf_rate_blocked alarm to fire.

Later, on 2024-06-04, there are multiple occurrences where the user's request rate peaked at or just above 1000/5min and then dropped back below the threshold. These cases should be considered non-abusive and should not fire the alarm.
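The distinction drawn above can be sketched as a per-bin classification. This is only an illustration: the `alarm_factor` of 2 and the label names are assumptions made for the example, not WAF settings or anything in the Azul code base, and the sample rows are copied from the request table in this comment.

```python
LIMIT = 1000  # requests per 5 min per IP, the threshold cited in this issue


def classify_bin(count, limit=LIMIT, alarm_factor=2):
    """Label one 5-minute request count.

    alarm_factor is a hypothetical multiplier used for illustration only.
    """
    if count > alarm_factor * limit:
        return 'abusive'
    elif count >= limit:
        return 'near_limit'
    else:
        return 'below'


# A few (bin start, count) rows copied from the table in this comment
sample = [
    ('2024-06-02 02:40', 2357),
    ('2024-06-02 02:55', 957),
    ('2024-06-04 19:05', 1000),
    ('2024-06-04 23:35', 1022),
    ('2024-06-05 01:05', 1032),
]

labels = {start: classify_bin(count) for start, count in sample}
for start, label in labels.items():
    print(start, label)
```

Under these assumptions only the 2024-06-02 02:40 bin is labeled `abusive`, while the bins on 2024-06-04 and 2024-06-05 that sit at or just above 1000 are labeled `near_limit`.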

CloudWatch logs (prod):

Screenshot 2024-06-06 at 4 40 58 PM

CloudWatch logs (prod):

Screenshot 2024-06-06 at 4 42 18 PM

All requests (blocked & not blocked) from 202.120.234.179
bin(5min)                  num
---------------------------------
2024-06-01 20:40:00.000    57
2024-06-01 20:55:00.000    56
2024-06-01 21:40:00.000    106
2024-06-01 22:20:00.000    1
2024-06-01 22:40:00.000    154
2024-06-01 23:40:00.000    138
2024-06-01 23:55:00.000    1
2024-06-02 00:40:00.000    135
2024-06-02 00:55:00.000    100
2024-06-02 01:55:00.000    103
2024-06-02 02:05:00.000    100
2024-06-02 02:25:00.000    101
2024-06-02 02:40:00.000    2357 **
2024-06-02 02:45:00.000    1857 **
2024-06-02 02:55:00.000    957
2024-06-02 03:10:00.000    100
2024-06-02 03:20:00.000    17
2024-06-02 03:25:00.000    1
2024-06-02 03:30:00.000    175
2024-06-02 03:35:00.000    125
2024-06-02 03:40:00.000    93
2024-06-02 03:45:00.000    1
2024-06-02 04:40:00.000    181
2024-06-02 05:05:00.000    1
2024-06-02 05:40:00.000    197
2024-06-02 06:40:00.000    129
2024-06-02 07:40:00.000    172
2024-06-02 08:45:00.000    172
2024-06-02 09:45:00.000    175
2024-06-02 10:10:00.000    66
2024-06-02 10:15:00.000    58
2024-06-02 11:15:00.000    101
2024-06-02 11:30:00.000    100
2024-06-02 11:40:00.000    27
2024-06-02 11:50:00.000    101
2024-06-02 12:50:00.000    101
2024-06-02 13:35:00.000    143
2024-06-02 13:40:00.000    5
2024-06-02 13:45:00.000    2
2024-06-02 13:50:00.000    2
2024-06-02 14:35:00.000    104
2024-06-02 14:40:00.000    1
2024-06-02 14:45:00.000    5
2024-06-02 14:50:00.000    1
2024-06-02 15:00:00.000    1
2024-06-02 15:20:00.000    100
2024-06-02 15:50:00.000    115
2024-06-02 16:05:00.000    100
2024-06-02 16:25:00.000    60
2024-06-02 16:30:00.000    100
2024-06-02 16:40:00.000    100
2024-06-02 16:45:00.000    128
2024-06-02 16:55:00.000    67
2024-06-02 17:00:00.000    143
2024-06-02 17:05:00.000    200
2024-06-02 17:10:00.000    121
2024-06-02 17:15:00.000    100
2024-06-02 17:20:00.000    100
2024-06-02 17:25:00.000    120
2024-06-02 17:35:00.000    100
2024-06-02 18:35:00.000    110
2024-06-02 19:00:00.000    1
2024-06-02 19:35:00.000    729
2024-06-02 19:40:00.000    676
2024-06-02 19:45:00.000    84
2024-06-02 19:50:00.000    157
2024-06-02 19:55:00.000    100
2024-06-02 20:55:00.000    36
2024-06-02 21:00:00.000    138
2024-06-02 21:10:00.000    53
2024-06-02 21:20:00.000    101
2024-06-02 21:25:00.000    376
2024-06-02 21:30:00.000    527
2024-06-02 22:30:00.000    10
2024-06-02 22:35:00.000    188
2024-06-02 23:35:00.000    103
2024-06-03 00:35:00.000    105
2024-06-03 00:45:00.000    100
2024-06-03 01:45:00.000    100
2024-06-03 02:45:00.000    8
2024-06-03 02:50:00.000    2
2024-06-03 03:15:00.000    100
2024-06-03 03:30:00.000    100
2024-06-03 04:30:00.000    103
2024-06-03 04:35:00.000    2
2024-06-03 04:55:00.000    7
2024-06-03 05:00:00.000    5
2024-06-03 05:05:00.000    1
2024-06-03 05:15:00.000    12
2024-06-03 05:25:00.000    4
2024-06-03 05:30:00.000    122
2024-06-03 05:35:00.000    64
2024-06-03 06:35:00.000    103
2024-06-03 07:35:00.000    199
2024-06-03 08:35:00.000    134
2024-06-03 08:50:00.000    100
2024-06-03 09:10:00.000    1
2024-06-03 09:15:00.000    100
2024-06-03 10:15:00.000    102
2024-06-03 10:40:00.000    101
2024-06-03 11:00:00.000    1
2024-06-03 11:05:00.000    100
2024-06-03 11:30:00.000    100
2024-06-03 11:45:00.000    81
2024-06-03 11:50:00.000    1
2024-06-03 12:05:00.000    100
2024-06-03 12:25:00.000    100
2024-06-03 13:25:00.000    196
2024-06-03 14:25:00.000    162
2024-06-03 14:55:00.000    100
2024-06-03 15:15:00.000    100
2024-06-03 15:25:00.000    100
2024-06-03 15:55:00.000    44
2024-06-03 16:00:00.000    52
2024-06-03 16:10:00.000    101
2024-06-03 16:15:00.000    254
2024-06-03 16:35:00.000    248
2024-06-03 16:40:00.000    38
2024-06-03 16:55:00.000    101
2024-06-03 17:00:00.000    107
2024-06-03 17:40:00.000    25
2024-06-03 17:45:00.000    8
2024-06-03 18:00:00.000    1
2024-06-03 18:05:00.000    100
2024-06-03 18:20:00.000    101
2024-06-03 18:25:00.000    100
2024-06-03 18:30:00.000    100
2024-06-03 18:45:00.000    101
2024-06-03 18:50:00.000    100
2024-06-03 19:50:00.000    201
2024-06-03 20:15:00.000    100
2024-06-03 21:15:00.000    101
2024-06-03 22:15:00.000    101
2024-06-03 22:20:00.000    343
2024-06-03 22:25:00.000    100
2024-06-03 22:40:00.000    53
2024-06-03 22:45:00.000    100
2024-06-03 23:45:00.000    101
2024-06-03 23:50:00.000    100
2024-06-04 00:05:00.000    20
2024-06-04 00:30:00.000    100
2024-06-04 00:40:00.000    50
2024-06-04 00:50:00.000    102
2024-06-04 00:55:00.000    75
2024-06-04 01:00:00.000    100
2024-06-04 01:05:00.000    100
2024-06-04 01:10:00.000    25
2024-06-04 01:20:00.000    100
2024-06-04 01:25:00.000    100
2024-06-04 01:30:00.000    100
2024-06-04 01:35:00.000    100
2024-06-04 01:40:00.000    100
2024-06-04 01:45:00.000    34
2024-06-04 02:40:00.000    102
2024-06-04 02:45:00.000    22
2024-06-04 03:40:00.000    1
2024-06-04 03:45:00.000    100
2024-06-04 03:50:00.000    100
2024-06-04 04:50:00.000    102
2024-06-04 04:55:00.000    100
2024-06-04 05:00:00.000    100
2024-06-04 05:10:00.000    100
2024-06-04 05:25:00.000    1
2024-06-04 05:30:00.000    155
2024-06-04 05:35:00.000    34
2024-06-04 05:50:00.000    101
2024-06-04 06:45:00.000    100
2024-06-04 07:45:00.000    1
2024-06-04 07:50:00.000    100
2024-06-04 08:50:00.000    201
2024-06-04 08:55:00.000    5
2024-06-04 09:30:00.000    7
2024-06-04 09:55:00.000    101
2024-06-04 10:10:00.000    100
2024-06-04 11:10:00.000    152
2024-06-04 11:20:00.000    9
2024-06-04 12:10:00.000    103
2024-06-04 13:10:00.000    63
2024-06-04 13:35:00.000    101
2024-06-04 13:50:00.000    1
2024-06-04 14:35:00.000    101
2024-06-04 14:40:00.000    100
2024-06-04 15:40:00.000    68
2024-06-04 16:05:00.000    152
2024-06-04 16:15:00.000    100
2024-06-04 16:20:00.000    41
2024-06-04 16:30:00.000    1
2024-06-04 16:55:00.000    100
2024-06-04 17:55:00.000    104
2024-06-04 18:05:00.000    165
2024-06-04 18:25:00.000    130
2024-06-04 18:30:00.000    142
2024-06-04 18:35:00.000    176
2024-06-04 18:40:00.000    201
2024-06-04 19:00:00.000    304
2024-06-04 19:05:00.000    1000 **
2024-06-04 19:10:00.000    900
2024-06-04 19:15:00.000    900
2024-06-04 19:20:00.000    1000 **
2024-06-04 19:25:00.000    900
2024-06-04 19:30:00.000    900
2024-06-04 19:35:00.000    760
2024-06-04 19:40:00.000    282
2024-06-04 19:45:00.000    252
2024-06-04 19:50:00.000    254
2024-06-04 19:55:00.000    190
2024-06-04 20:00:00.000    323
2024-06-04 20:05:00.000    421
2024-06-04 20:10:00.000    82
2024-06-04 20:15:00.000    335
2024-06-04 20:20:00.000    156
2024-06-04 20:25:00.000    287
2024-06-04 20:30:00.000    823
2024-06-04 20:35:00.000    800
2024-06-04 20:40:00.000    935
2024-06-04 20:45:00.000    1000 **
2024-06-04 20:50:00.000    729
2024-06-04 20:55:00.000    471
2024-06-04 21:00:00.000    778
2024-06-04 21:05:00.000    779
2024-06-04 21:10:00.000    853
2024-06-04 21:15:00.000    890
2024-06-04 21:20:00.000    825
2024-06-04 21:25:00.000    803
2024-06-04 21:30:00.000    882
2024-06-04 21:35:00.000    533
2024-06-04 21:40:00.000    902
2024-06-04 21:45:00.000    598
2024-06-04 21:50:00.000    872
2024-06-04 21:55:00.000    1000 **
2024-06-04 22:00:00.000    900
2024-06-04 22:05:00.000    900
2024-06-04 22:10:00.000    900
2024-06-04 22:15:00.000    1000 **
2024-06-04 22:20:00.000    902
2024-06-04 22:25:00.000    980
2024-06-04 22:30:00.000    895
2024-06-04 22:35:00.000    899
2024-06-04 22:40:00.000    814
2024-06-04 22:45:00.000    777
2024-06-04 22:50:00.000    900
2024-06-04 22:55:00.000    900
2024-06-04 23:00:00.000    1000 **
2024-06-04 23:05:00.000    900
2024-06-04 23:10:00.000    1000 **
2024-06-04 23:15:00.000    1000 **
2024-06-04 23:20:00.000    900
2024-06-04 23:25:00.000    615
2024-06-04 23:30:00.000    613
2024-06-04 23:35:00.000    1022 **
2024-06-04 23:40:00.000    875
2024-06-04 23:45:00.000    935
2024-06-04 23:50:00.000    815
2024-06-04 23:55:00.000    846
2024-06-05 00:00:00.000    522
2024-06-05 00:05:00.000    827
2024-06-05 00:10:00.000    576
2024-06-05 00:15:00.000    593
2024-06-05 00:20:00.000    991
2024-06-05 00:25:00.000    1000 **
2024-06-05 00:30:00.000    900
2024-06-05 00:35:00.000    750
2024-06-05 00:40:00.000    900
2024-06-05 00:45:00.000    900
2024-06-05 00:50:00.000    800
2024-06-05 00:55:00.000    1000 **
2024-06-05 01:00:00.000    988
2024-06-05 01:05:00.000    1032 **
2024-06-05 01:10:00.000    900
2024-06-05 01:15:00.000    740
2024-06-05 01:20:00.000    963
2024-06-05 01:25:00.000    832
2024-06-05 01:30:00.000    416
2024-06-05 01:35:00.000    371
2024-06-05 01:40:00.000    778
2024-06-05 01:45:00.000    867
2024-06-05 01:50:00.000    833
2024-06-05 01:55:00.000    1000 **
2024-06-05 02:00:00.000    997
2024-06-05 02:05:00.000    903
2024-06-05 02:10:00.000    1000 **
2024-06-05 02:15:00.000    900
2024-06-05 02:20:00.000    1000 **
2024-06-05 02:25:00.000    600
2024-06-05 02:30:00.000    757
2024-06-05 02:35:00.000    561
2024-06-05 02:40:00.000    100
2024-06-05 02:45:00.000    100
2024-06-05 02:50:00.000    1
2024-06-05 02:55:00.000    2
2024-06-05 03:05:00.000    100
2024-06-05 03:45:00.000    900
2024-06-05 03:55:00.000    87
2024-06-05 04:00:00.000    608
2024-06-05 05:00:00.000    1
2024-06-05 05:05:00.000    104
2024-06-05 06:05:00.000    15
2024-06-05 06:15:00.000    2
2024-06-05 06:20:00.000    5
2024-06-05 06:25:00.000    11
2024-06-05 06:30:00.000    6
2024-06-05 06:40:00.000    4
2024-06-05 06:45:00.000    5
2024-06-05 06:50:00.000    3
2024-06-05 06:55:00.000    4
2024-06-05 07:05:00.000    5
2024-06-05 07:10:00.000    6
2024-06-05 07:20:00.000    3
2024-06-05 07:25:00.000    3
2024-06-05 07:40:00.000    1
2024-06-05 07:45:00.000    2
2024-06-05 08:05:00.000    1
2024-06-05 08:10:00.000    2
2024-06-05 08:25:00.000    3
2024-06-05 08:30:00.000    1
2024-06-05 09:00:00.000    3
2024-06-05 09:05:00.000    2
2024-06-05 10:10:00.000    1
2024-06-05 11:10:00.000    1
2024-06-05 12:10:00.000    1
2024-06-05 13:10:00.000    2
2024-06-05 13:20:00.000    103
2024-06-05 14:20:00.000    22
2024-06-05 14:30:00.000    20
2024-06-05 14:35:00.000    26
2024-06-05 14:40:00.000    14
2024-06-05 14:50:00.000    4
2024-06-05 15:20:00.000    9
2024-06-05 15:25:00.000    3
2024-06-05 15:30:00.000    3
2024-06-05 15:55:00.000    3
2024-06-05 16:10:00.000    3
2024-06-05 16:15:00.000    3

dsotirho-ucsc commented 4 months ago

@hannes-ucsc: "Assignee to investigate if a metric math query or some other aggregation mechanism can be used to only trigger the alarm if the number of blocked requests exceeds twice the request rate limit for any given client IP. This would only work if the client IP is tracked as a dimension in the datapoints published by WAF."

dsotirho-ucsc commented 3 months ago

@hannes-ucsc: "Based on @dsotirho-ucsc's research, it appears highly unlikely that WAF includes the specific IP address of a client as a dimension in any of the CloudWatch metrics that it publishes. https://docs.aws.amazon.com/waf/latest/developerguide/waf-metrics.html Assignee to explore the possibility of having two different rate limiting rules: one for returning a 429 response with Retry-After at a lower limit, and another one to trigger the alarm, at a higher limit."

dsotirho-ucsc commented 3 months ago

Tested with two separate WAF rate-based rules active at the same time: one with a limit of 1000/5min (per IP) that blocks requests, and one with a limit of 2000/5min (per IP) that blocks and also trips an alarm. For this to work, the rate rule with the larger limit must have the lower priority number, meaning it is evaluated first.

The lower limit rate rule successfully blocked requests when the request rate exceeded the threshold, and the higher limit rate rule successfully tripped the alarm when its threshold was exceeded.
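Why the ordering matters can be shown with a simplified model of rule evaluation. The sketch below assumes that rules are evaluated in ascending priority number and that a matching block action ends evaluation. The rule names are taken from the patch in this comment; the priority values 0 and 1 and the `blocking_rule` helper are hypothetical, and real WAF rate tracking is approximate rather than an exact per-window count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRule:
    name: str
    limit: int     # requests per 5 min per IP
    priority: int  # lower numbers are evaluated first


def blocking_rule(rules, rate):
    """Return the name of the first rule whose limit the rate exceeds.

    Block is treated as a terminating action, so later rules are never
    evaluated once an earlier rule matches. Returns None if no rule matches.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rate > rule.limit:
            return rule.name
    return None


# Ordering described in this comment: the larger limit is evaluated first
working = [
    RateRule('RateAlarmRule', limit=2000, priority=0),
    RateRule('RateRule', limit=1000, priority=1),
]
# Hypothetical reversed ordering, for comparison
swapped = [
    RateRule('RateRule', limit=1000, priority=0),
    RateRule('RateAlarmRule', limit=2000, priority=1),
]

for rate in (500, 1500, 2500):
    print(rate, blocking_rule(working, rate), blocking_rule(swapped, rate))
```

In this model the reversed ordering never reaches `RateAlarmRule`: a rate of 2500 already exceeds the smaller limit, so `RateRule` blocks the request and the alarm rule's metric is never incremented.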

Screenshot 2024-06-13 at 9 54 46 AM

Screenshot 2024-06-13 at 10 10 04 AM

Index: src/azul/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/__init__.py b/src/azul/__init__.py
--- a/src/azul/__init__.py  (revision 6a6ea97331c43b0f249455b7d315804fa16d1c11)
+++ b/src/azul/__init__.py  (date 1718214034766)
@@ -1566,8 +1566,14 @@

     allowed_v4_ips_term = 'allowed_v4_ips'

+    # We use two rate rules, one that blocks requests when a lower threshold is
+    # exceeded, and one that triggers an alarm when a higher threshold is
+    # exceeded.
+
     waf_rate_rule_name = 'RateRule'

+    waf_rate_alarm_rule_name = 'RateAlarmRule'
+
     waf_rate_rule_period = 300  # seconds; this value is fixed by AWS

     waf_rate_rule_retry_after = 30  # seconds
Index: terraform/cloudwatch.tf.json.template.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/terraform/cloudwatch.tf.json.template.py b/terraform/cloudwatch.tf.json.template.py
--- a/terraform/cloudwatch.tf.json.template.py  (revision 6a6ea97331c43b0f249455b7d315804fa16d1c11)
+++ b/terraform/cloudwatch.tf.json.template.py  (date 1718152007910)
@@ -316,7 +316,7 @@
                             'dimensions': {
                                 'WebACL': '${aws_wafv2_web_acl.api_gateway.name}',
                                 'Region': config.region,
-                                'Rule': config.waf_rate_rule_name
+                                'Rule': config.waf_rate_alarm_rule_name
                             },
                             'alarm_actions': ['${data.aws_sns_topic.monitoring.arn}'],
                             'ok_actions': ['${data.aws_sns_topic.monitoring.arn}'],
Index: terraform/api_gateway.tf.json.template.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/terraform/api_gateway.tf.json.template.py b/terraform/api_gateway.tf.json.template.py
--- a/terraform/api_gateway.tf.json.template.py (revision 6a6ea97331c43b0f249455b7d315804fa16d1c11)
+++ b/terraform/api_gateway.tf.json.template.py (date 1718226011468)
@@ -242,33 +242,44 @@
                                     ('AllowedIPs', 'allow', config.allowed_v4_ips_term)
                                 ]
                             ],
-                            {
-                                'name': config.waf_rate_rule_name,
-                                'action': {
-                                    'block': {
-                                        'custom_response': {
-                                            'response_code': 429,
-                                            'response_header': [
-                                                {
-                                                    'name': 'Retry-After',
-                                                    'value': str(config.waf_rate_rule_retry_after)
-                                                }
-                                            ]
-                                        }
-                                    }
-                                },
-                                'statement': {
-                                    'rate_based_statement': {
-                                        'limit': config.waf_rate_rule_limit,
-                                        'aggregate_key_type': 'IP'
-                                    }
-                                },
-                                'visibility_config': {
-                                    'metric_name': config.waf_rate_rule_name,
-                                    'sampled_requests_enabled': True,
-                                    'cloudwatch_metrics_enabled': True
+                            *[
+                                {
+                                    'name': name,
+                                    'action': {
+                                        'block': {
+                                            'custom_response': {
+                                                'response_code': 429,
+                                                'response_header': [
+                                                    {
+                                                        'name': 'Retry-After',
+                                                        'value': str(config.waf_rate_rule_retry_after)
+                                                    }
+                                                ]
+                                            }
+                                        }
+                                    },
+                                    'statement': {
+                                        'rate_based_statement': {
+                                            'limit': limit,
+                                            'aggregate_key_type': 'IP'
+                                        }
+                                    },
+                                    'visibility_config': {
+                                        'metric_name': name,
+                                        'sampled_requests_enabled': True,
+                                        'cloudwatch_metrics_enabled': True
+                                    }
                                 }
-                            },
+                                for name, limit in [
+                                    # The rate rule with the larger limit needs
+                                    # to be defined first, otherwise the rule
+                                    # with the smaller limit would trigger first
+                                    # and prevent evaluation of any following
+                                    # rules.
+                                    (config.waf_rate_alarm_rule_name, config.waf_rate_rule_limit * 2),
+                                    (config.waf_rate_rule_name, config.waf_rate_rule_limit),
+                                ]
+                            ],
                             {
                                 'name': 'AWS-CommonRuleSet',
                                 'override_action': {

achave11-ucsc commented 3 months ago

Assignee to prepare PR.

hannes-ucsc commented 2 months ago

For the demo, run the flooder script at three different rates: below, at, and above the limit. Show that an alarm is triggered only for the last of these rates.
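The three demo rates and the request spacing needed to sustain each one can be worked out as below. This is a rough sketch and not the project's flooder script: `demo_rates`, `delay_for`, and the 20% margin are hypothetical, and the 300 second window is the rate rule period from the patch earlier in this thread.

```python
WINDOW = 300  # seconds, the rate rule period from the patch in this thread


def demo_rates(limit, margin_pct=20):
    """Return requests per window below, at and above the given limit.

    margin_pct is a hypothetical distance from the limit, in percent.
    """
    return {
        'below': limit * (100 - margin_pct) // 100,
        'at': limit,
        'above': limit * (100 + margin_pct) // 100,
    }


def delay_for(requests_per_window, window=WINDOW):
    """Seconds between requests to spread them evenly over one window."""
    return window / requests_per_window


rates = demo_rates(1000)
for label, rate in rates.items():
    print(label, rate, round(delay_for(rate), 3))
```

Here `limit` is whichever rule limit the demo targets, so the same arithmetic applies to either of the two rate rules.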

dsotirho-ucsc commented 2 months ago

We had to modify the script to successfully complete the demo. I'll be committing these changes at a later time in one of my PRs.