DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Different rate limits depending on HTTP method #5555

Open achave11-ucsc opened 1 year ago

achave11-ucsc commented 1 year ago

Spike for design.

hannes-ucsc commented 1 year ago

#5533 will switch /manifest/files and /fetch/manifest/files from GET to POST.

Add a WAF ACL rule that applies the azul:expensive label to POST requests, and that counts them. Split the existing rate limiting rule into two rules, one for requests labeled azul:expensive with a limit of 10 per 5 minutes, and another one for all other requests with the existing limit of 1000 per 5min.
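A minimal sketch of the proposed rules follows. It expresses them as Python dicts in the shape of AWS WAFv2 `Rule` objects, the form the WAFv2 API (e.g. boto3's `wafv2` client) accepts. The rule names, metric names, priorities, and the `rate_limit_rules` function are hypothetical, not Azul's actual configuration. The labeling rule uses the non-terminating `Count` action. Evaluation therefore continues past it, and the two rate-based rules, which run later because of their higher priority numbers, can scope down on the `azul:expensive` label.

```python
from typing import Any

Rule = dict[str, Any]

EXPENSIVE_LABEL = 'azul:expensive'


def _visibility(metric_name: str) -> dict[str, Any]:
    # Emit CloudWatch metrics and sampled requests for each rule
    return {
        'SampledRequestsEnabled': True,
        'CloudWatchMetricsEnabled': True,
        'MetricName': metric_name,
    }


def rate_limit_rules(expensive_limit: int, default_limit: int) -> list[Rule]:
    """
    Hypothetical helper building three WAFv2 rules: one that labels and
    counts POST requests, and two rate-based rules split on that label.
    Limits apply per client IP over WAF's default 5-minute window.
    """
    is_expensive = {
        'LabelMatchStatement': {'Scope': 'LABEL', 'Key': EXPENSIVE_LABEL}
    }
    return [
        {
            # Label POST requests as expensive. Count does not terminate
            # evaluation, so later rules still see the request and its label.
            'Name': 'label_expensive',
            'Priority': 0,
            'Statement': {
                'ByteMatchStatement': {
                    'SearchString': b'POST',
                    'FieldToMatch': {'Method': {}},
                    'TextTransformations': [{'Priority': 0, 'Type': 'NONE'}],
                    'PositionalConstraint': 'EXACTLY',
                }
            },
            'RuleLabels': [{'Name': EXPENSIVE_LABEL}],
            'Action': {'Count': {}},
            'VisibilityConfig': _visibility('label_expensive'),
        },
        {
            # Strict limit, applied only to requests carrying the label
            'Name': 'rate_limit_expensive',
            'Priority': 1,
            'Statement': {
                'RateBasedStatement': {
                    'Limit': expensive_limit,
                    'AggregateKeyType': 'IP',
                    'ScopeDownStatement': is_expensive,
                }
            },
            'Action': {'Block': {}},
            'VisibilityConfig': _visibility('rate_limit_expensive'),
        },
        {
            # The existing, more generous limit for all other requests
            'Name': 'rate_limit_default',
            'Priority': 2,
            'Statement': {
                'RateBasedStatement': {
                    'Limit': default_limit,
                    'AggregateKeyType': 'IP',
                    'ScopeDownStatement': {
                        'NotStatement': {'Statement': is_expensive}
                    },
                }
            },
            'Action': {'Block': {}},
            'VisibilityConfig': _visibility('rate_limit_default'),
        },
    ]
```

Both limits are parameters, so the limit for expensive requests can be tuned independently of the existing 1000-per-5-minute default.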

achave11-ucsc commented 10 months ago

This issue surfaced when a user made 122 requests to the manifest endpoint. The requests overwhelmed the Elasticsearch (ES) instance and caused many service executions to time out.

[
    {
        "identity_sourceIp": "152.51.48.1",
        "service_endpoint": "/fetch/manifest/files",
        "e_msg": "-",
        "gw_logs": "",
        "count": "2"
    },
    {
        "identity_sourceIp": "152.51.48.1",
        "service_endpoint": "/fetch/manifest/files",
        "e_msg": "Endpoint request timed out",
        "gw_logs": "",
        "count": "122"
    }
]
whois 152.51.48.1

% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org/
% This query returned 1 object

refer:        whois.arin.net

inetnum:      152.0.0.0 - 152.255.255.255
organisation: Administered by ARIN
status:       LEGACY

whois:        whois.arin.net

changed:      1993-05
source:       IANA

# whois.arin.net

NetRange:       152.51.0.0 - 152.51.255.255
CIDR:           152.51.0.0/16
NetName:        GLAXOSMITHKLINE
NetHandle:      NET-152-51-0-0-1
Parent:         NET152 (NET-152-0-0-0-0)
NetType:        Direct Allocation
OriginAS:
Organization:   GlaxoSmithKline (SBC-92)
RegDate:        1991-06-07
Updated:        2021-12-14
Ref:            https://rdap.arin.net/registry/ip/152.51.0.0

OrgName:        GlaxoSmithKline
OrgId:          SBC-92
Address:        5 Crescent Drive
City:           Philadelphia
StateProv:      PA
PostalCode:     19112
Country:        US
RegDate:        2007-03-13
Updated:        2021-02-10
Ref:            https://rdap.arin.net/registry/entity/SBC-92

OrgTechHandle: GSKTE-ARIN
OrgTechName:   GSK-TECH
OrgTechPhone:  +1-610-962-4025
OrgTechEmail:  addrmgmt@gsk.com
OrgTechRef:    https://rdap.arin.net/registry/entity/GSKTE-ARIN

OrgAbuseHandle: GSKCS-ARIN
OrgAbuseName:   GSK-CSIR
OrgAbusePhone:  +1-610-962-4048
OrgAbuseEmail:  csir@gsk.com
OrgAbuseRef:    https://rdap.arin.net/registry/entity/GSKCS-ARIN

The following is a comprehensive count of the service executions, and the associated endpoint requests, that timed out between 12:55:46 and 13:09:50 (20:55:46 to 21:09:50 UTC in the query below) and caused this alarm:

CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-service-prod
start-time: 2024-01-11T20:55:46.000Z
end-time: 2024-01-11T21:09:50.000Z
query-string:

  fields @timestamp, @message
| filter @requestId like /6e9e5735|af6298cf|b3e29164|c0e9ffc4|8dbdf814|f4d73c3a|2ef006dc|620bbfad|6363214e|552265ad|7ad0dfe4|589df13c|0931845d|a384aae5|57180d3f|3e9067bc|c913fe3b|002c5d1c|9022a4be|d4263dd0|28fb7c5c|7147aeec|bb83145a|e8b50409|df94e347|c66648eb|ab9aef15|72b79cc7|7158aac1|c23f826f|f754ceef|a3b3cb04|162dbc4f|0b439e28|ae625dc7|31084dfe|b99c45de|38c40734|ae43fca3|59aeb62e|2bbbe543|ac471505|1ee86c43|2f5cf791|778bf8b6|bb40f904|704753c1|2fff3182|1b691fd1|3efee156|70e8e580|69341fd8|d9eff869|fedbedc9|0bc53a0e|63bdc1f4|b1870a68|3e4fcdf7|9f0960e3|cf0055ad|9bfc6edb|de01c6e6|832353e0|ea1e9172|85a36375|99f7b22f|a2188d49|644494cc|f8d422db|c68f074f|3192a652|7f3c2a83|8b6760be|cb2dfdf0|7384402c|33635c2b|e23f6329|594a42c1|93c2856b|808dbd4c|12c6017c|80d167ef|71294e7e|e9cc3a3d|e869cbdb|8ab12a90|144ccdcf|573a9212|6f6fead8|e681a0c3|9406e645|4a2d97d3|f45955c9|dfbb4fde|bae9243b|298c37a0|03704e95|af7bbc7f|aa0cf04d|4ceaed70|6a44d518|752c4d32|96859c95|2c6a74db|b8b1f47b|d4659c96|8a2b32cd|00375f55|3b9e3c09|e7f0cde2|fcc16958|75862a42|154f3b93|f98e81af|3d359633|eabd1168|b69c9616|b529908b|b47ded7f|d57833a7|c1b080e2|b276efe7|d2c6ae50|89ae25c5|6f00948f|aa815900|e692206a|85851601|5ceabf90|ec0ebcad|19d7051f|6282cbfc|4f25f581|bca730bd|503d4e63|ab3d630/
| filter @message like /elasticsearch\t… with request body|START|END|REPORT|Task|Received/
| filter @message like /Received/
| parse @message "Received * request for '*'" as method, endpoint
| parse @message "Task*" as timeout
| parse `headers.x-forwarded-for` '*,*' as IP, lambda
| stats count(*) as count by IP, method, endpoint
| sort count desc

IP              method  endpoint                                              count
152.51.48.1     PUT     /fetch/manifest/files                                 122
34.67.221.52    GET     /index/projects                                       4
138.51.92.189   GET     /index/projects/3089d311-f9ed-44dd-bb10-397059bad4dc  3
35.173.69.86    GET     /index/projects                                       1
52.3.190.111    HEAD    /index/files                                          1
52.3.190.111    HEAD    /index/projects                                       1
134.68.248.202  GET     /index/projects                                       1
134.68.248.202  GET     /index/summary                                        1
134.68.248.202  GET     /index/files                                          1
134.68.248.202  GET     /index/samples                                        1
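To make the parse and stats steps of the Logs Insights query easier to follow, here is a rough Python equivalent of the aggregation. It is only a sketch. The `LogEvent` record shape, in which each log message is paired with the request's X-Forwarded-For header, is a hypothetical stand-in for what Logs Insights extracts. The request-ID filter is omitted.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple

# Regex counterpart of the query's glob pattern "Received * request for '*'",
# e.g. "Received GET request for '/index/projects'"
RECEIVED = re.compile(r"Received (\S+) request for '([^']*)'")


class LogEvent(NamedTuple):
    # Hypothetical record: the log message plus the X-Forwarded-For header
    message: str
    x_forwarded_for: str


def count_requests(events: Iterable[LogEvent]) -> list[tuple[tuple[str, str, str], int]]:
    counts: Counter[tuple[str, str, str]] = Counter()
    for event in events:
        match = RECEIVED.search(event.message)
        if match is None:
            continue  # analogous to `filter @message like /Received/`
        method, endpoint = match.groups()
        # Take the first X-Forwarded-For entry as the client address
        ip = event.x_forwarded_for.split(',')[0].strip()
        counts[(ip, method, endpoint)] += 1
    # Analogous to `stats count(*) as count by IP, method, endpoint | sort count desc`
    return counts.most_common()
```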

dsotirho-ucsc commented 10 months ago

Add a WAF ACL rule that applies the azul:expensive label to POST requests, and that counts them. Split the existing rate limiting rule into two rules, one for requests labeled azul:expensive with a limit of 10 per 5 minutes, and another one for all other requests with the existing limit of 1000 per 5min.

The minimum rate limit WAF allows is 100 (per 5 minutes). We will implement the rate-limiting rule for requests labeled azul:expensive with a limit of 100, and also limit the manifest Lambda's concurrency to 10.
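The decision above has two parts. The WAF limit for expensive requests stays at WAF's minimum of 100 per 5 minutes, and the manifest Lambda is capped at 10 concurrent executions. Below is a minimal sketch of both parts, assuming a boto3 Lambda client. `put_function_concurrency` is a real boto3 call, but `WAF_MIN_RATE_LIMIT`, the helper names, and the function name passed in are hypothetical. Azul's deployment may well configure this through infrastructure-as-code instead.

```python
from typing import Any, Protocol

# Hypothetical constant: per the comment above, WAF rejects rate limits
# below 100 requests per 5-minute window
WAF_MIN_RATE_LIMIT = 100


class LambdaClient(Protocol):
    # The subset of boto3's Lambda client used here
    def put_function_concurrency(self, **kwargs: Any) -> dict[str, Any]: ...


def expensive_rate_limit(requested: int) -> int:
    # Raise a requested limit that WAF would reject to the minimum it accepts,
    # mirroring the decision to use 100 instead of the originally proposed 10
    return max(requested, WAF_MIN_RATE_LIMIT)


def cap_concurrency(client: LambdaClient, function_name: str, limit: int) -> dict[str, Any]:
    # Reserved concurrency sets aside capacity for the function and caps it;
    # invocations beyond the cap are throttled
    return client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
```

In practice `client` would be `boto3.client('lambda')`. The thread does not name the manifest Lambda, so the function name is left to the caller.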