Open achave11-ucsc opened 1 year ago
Spike for design.
Switch /manifest/files and /fetch/manifest/files to POST instead of GET.
Add a WAF ACL rule that applies the azul:expensive label to POST requests, and that counts them. Split the existing rate limiting rule into two rules: one for requests labeled azul:expensive, with a limit of 10 per 5 minutes, and another for all other requests, with the existing limit of 1000 per 5 minutes.
This issue surfaced when a single user made 122 requests to the manifest endpoint, overwhelming the ES instance and causing many service execution timeouts.
[
  {
    "identity_sourceIp": "152.51.48.1",
    "service_endpoint": "/fetch/manifest/files",
    "e_msg": "-",
    "gw_logs": "",
    "count": "2"
  },
  {
    "identity_sourceIp": "152.51.48.1",
    "service_endpoint": "/fetch/manifest/files",
    "e_msg": "Endpoint request timed out",
    "gw_logs": "",
    "count": "122"
  }
]
whois 152.51.48.1
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org/
% This query returned 1 object
refer: whois.arin.net
inetnum: 152.0.0.0 - 152.255.255.255
organisation: Administered by ARIN
status: LEGACY
whois: whois.arin.net
changed: 1993-05
source: IANA
# whois.arin.net
NetRange: 152.51.0.0 - 152.51.255.255
CIDR: 152.51.0.0/16
NetName: GLAXOSMITHKLINE
NetHandle: NET-152-51-0-0-1
Parent: NET152 (NET-152-0-0-0-0)
NetType: Direct Allocation
OriginAS:
Organization: GlaxoSmithKline (SBC-92)
RegDate: 1991-06-07
Updated: 2021-12-14
Ref: https://rdap.arin.net/registry/ip/152.51.0.0
OrgName: GlaxoSmithKline
OrgId: SBC-92
Address: 5 Crescent Drive
City: Philadelphia
StateProv: PA
PostalCode: 19112
Country: US
RegDate: 2007-03-13
Updated: 2021-02-10
Ref: https://rdap.arin.net/registry/entity/SBC-92
OrgTechHandle: GSKTE-ARIN
OrgTechName: GSK-TECH
OrgTechPhone: +1-610-962-4025
OrgTechEmail: addrmgmt@gsk.com
OrgTechRef: https://rdap.arin.net/registry/entity/GSKTE-ARIN
OrgAbuseHandle: GSKCS-ARIN
OrgAbuseName: GSK-CSIR
OrgAbusePhone: +1-610-962-4048
OrgAbuseEmail: csir@gsk.com
OrgAbuseRef: https://rdap.arin.net/registry/entity/GSKCS-ARIN
The following is a comprehensive count of the service executions, and the associated endpoint requests, that timed out between 12:55:46 and 13:09:50 (20:55:46 to 21:09:50 UTC), which caused this alarm:
CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-service-prod
start-time: 2024-01-11T20:55:46.000Z
end-time: 2024-01-11T21:09:50.000Z
query-string:
fields @timestamp, @message
| filter @requestId like /6e9e5735|af6298cf|b3e29164|c0e9ffc4|8dbdf814|f4d73c3a|2ef006dc|620bbfad|6363214e|552265ad|7ad0dfe4|589df13c|0931845d|a384aae5|57180d3f|3e9067bc|c913fe3b|002c5d1c|9022a4be|d4263dd0|28fb7c5c|7147aeec|bb83145a|e8b50409|df94e347|c66648eb|ab9aef15|72b79cc7|7158aac1|c23f826f|f754ceef|a3b3cb04|162dbc4f|0b439e28|ae625dc7|31084dfe|b99c45de|38c40734|ae43fca3|59aeb62e|2bbbe543|ac471505|1ee86c43|2f5cf791|778bf8b6|bb40f904|704753c1|2fff3182|1b691fd1|3efee156|70e8e580|69341fd8|d9eff869|fedbedc9|0bc53a0e|63bdc1f4|b1870a68|3e4fcdf7|9f0960e3|cf0055ad|9bfc6edb|de01c6e6|832353e0|ea1e9172|85a36375|99f7b22f|a2188d49|644494cc|f8d422db|c68f074f|3192a652|7f3c2a83|8b6760be|cb2dfdf0|7384402c|33635c2b|e23f6329|594a42c1|93c2856b|808dbd4c|12c6017c|80d167ef|71294e7e|e9cc3a3d|e869cbdb|8ab12a90|144ccdcf|573a9212|6f6fead8|e681a0c3|9406e645|4a2d97d3|f45955c9|dfbb4fde|bae9243b|298c37a0|03704e95|af7bbc7f|aa0cf04d|4ceaed70|6a44d518|752c4d32|96859c95|2c6a74db|b8b1f47b|d4659c96|8a2b32cd|00375f55|3b9e3c09|e7f0cde2|fcc16958|75862a42|154f3b93|f98e81af|3d359633|eabd1168|b69c9616|b529908b|b47ded7f|d57833a7|c1b080e2|b276efe7|d2c6ae50|89ae25c5|6f00948f|aa815900|e692206a|85851601|5ceabf90|ec0ebcad|19d7051f|6282cbfc|4f25f581|bca730bd|503d4e63|ab3d630/
| filter @message like /elasticsearch\t… with request body|START|END|REPORT|Task|Received/
| filter @message like /Received/
| parse @message "Received * request for '*'" as method, endpoint
| parse @message "Task*" as timeout
| parse `headers.x-forwarded-for` '*,*' as IP, lambda
| stats count(*) as count by IP, method, endpoint
| sort count desc
IP | method | endpoint | count |
---|---|---|---|
152.51.48.1 | PUT | /fetch/manifest/files | 122 |
34.67.221.52 | GET | /index/projects | 4 |
138.51.92.189 | GET | /index/projects/3089d311-f9ed-44dd-bb10-397059bad4dc | 3 |
35.173.69.86 | GET | /index/projects | 1 |
52.3.190.111 | HEAD | /index/files | 1 |
52.3.190.111 | HEAD | /index/projects | 1 |
134.68.248.202 | GET | /index/projects | 1 |
134.68.248.202 | GET | /index/summary | 1 |
134.68.248.202 | GET | /index/files | 1 |
134.68.248.202 | GET | /index/samples | 1 |
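
For readers less familiar with Logs Insights, the parse and stats steps of the query above can be approximated in Python. This is only a sketch: the record layout, the field names `message` and `x_forwarded_for`, and the sample addresses (taken from the documentation range 203.0.113.0/24) are assumptions made for illustration, not the real log format or real data.

```python
import re
from collections import Counter

# Hypothetical records; real log entries carry more fields.
records = [
    {"message": "Received PUT request for '/fetch/manifest/files'",
     "x_forwarded_for": "203.0.113.7, 198.51.100.1"},
    {"message": "Received PUT request for '/fetch/manifest/files'",
     "x_forwarded_for": "203.0.113.7, 198.51.100.1"},
    {"message": "Received PUT request for '/fetch/manifest/files'",
     "x_forwarded_for": "203.0.113.7, 198.51.100.1"},
    {"message": "Received GET request for '/index/projects'",
     "x_forwarded_for": "203.0.113.9, 198.51.100.1"},
    {"message": "START RequestId: (elided)"},
]

# Counterpart of: parse @message "Received * request for '*'" as method, endpoint
RECEIVED = re.compile(r"Received (\S+) request for '([^']*)'")


def count_requests(records):
    """Count requests by (IP, method, endpoint), most frequent first."""
    counts = Counter()
    for record in records:
        match = RECEIVED.search(record["message"])
        if match is None:
            # Counterpart of: filter @message like /Received/
            continue
        method, endpoint = match.groups()
        # The query keeps the part of X-Forwarded-For before the first comma.
        ip = record["x_forwarded_for"].split(",", 1)[0].strip()
        counts[(ip, method, endpoint)] += 1
    # Counterpart of: stats count(*) ... | sort count desc
    return counts.most_common()


result = count_requests(records)
for (ip, method, endpoint), count in result:
    print(ip, method, endpoint, count)
```

Unlike the query, this sketch would also accept a forwarded-for value without a comma; the `'*,*'` pattern in the query only matches when a comma is present.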
Add a WAF ACL rule that applies the azul:expensive label to POST requests, and that counts them. Split the existing rate limiting rule into two rules: one for requests labeled azul:expensive, with a limit of 10 per 5 minutes, and another for all other requests, with the existing limit of 1000 per 5 minutes.
The minimum allowed rate is 100 (per 5 minutes). We will implement the azul:expensive WAF rule with a rate of 100, and also limit the manifest Lambda concurrency to 10.
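
As a rough illustration of the plan above, the three rules could be written as WAFv2 rule definitions along the following lines. This is a sketch under assumptions, not the actual Azul configuration: the rule names, priorities, metric names and the `Block` action on the rate rules are hypothetical, and the minimum of 100 is simply the figure quoted in this comment.

```python
EXPENSIVE_LABEL = "azul:expensive"
MIN_RATE_LIMIT = 100  # minimum per 5 minutes, as quoted in the comment above


def visibility(metric_name):
    # Every WAFv2 rule needs a VisibilityConfig; the metric names are made up.
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def label_match():
    # Matches requests that an earlier rule labeled as expensive.
    return {"LabelMatchStatement": {"Scope": "LABEL", "Key": EXPENSIVE_LABEL}}


def rate_rule(name, priority, limit, scope_down):
    """Build a per-IP rate-based rule restricted by a scope-down statement."""
    if limit < MIN_RATE_LIMIT:
        raise ValueError(f"limit {limit} is below the minimum {MIN_RATE_LIMIT}")
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",
                "ScopeDownStatement": scope_down,
            }
        },
        "Action": {"Block": {}},  # assumption: the existing rule blocks
        "VisibilityConfig": visibility(name),
    }


rules = [
    {
        # Counts POST requests and attaches the label; it does not block.
        "Name": "label_expensive",
        "Priority": 0,
        "Statement": {
            "ByteMatchStatement": {
                "FieldToMatch": {"Method": {}},
                "SearchString": b"POST",
                "PositionalConstraint": "EXACTLY",
                "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            }
        },
        "Action": {"Count": {}},
        "RuleLabels": [{"Name": EXPENSIVE_LABEL}],
        "VisibilityConfig": visibility("label_expensive"),
    },
    rate_rule("rate_expensive", 1, 100, label_match()),
    rate_rule("rate_default", 2, 1000,
              {"NotStatement": {"Statement": label_match()}}),
]
```

The labeling rule has to be evaluated before the two rate rules, which is why it gets the lowest priority number. The Lambda concurrency cap is not shown here; one option would be reserved concurrency on the manifest function, for example through the boto3 Lambda client's `put_function_concurrency` call.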