EZID users occasionally get 5XX errors when the system is overloaded. While discussing the 502 errors reported by the Merritt team, we came up with some ideas that may help improve EZID performance.
5XX error codes:
502 - Bad Gateway
503 - Service Unavailable
504 - Gateway Timeout
Outcomes from Mark, Ashley and Jing's meeting on 8/3:
Action items:
Write custom WAF rules to reduce malicious requests - Jing with Ashley's support #452
Replicate 502 error in the EZID stage environment - Mark & Jing (targeting the minting operation that requires read/write access to both the Berkeley DB and MySQL) #453
Refactor Merritt 502 error handling - Mark (retry the minting operation when a 502 error is received)
Migrate Berkeley DB to MySQL - Jing
Develop a testing/evaluation process before proceeding to the following options - Jing with Ashley's support. Status: performed load tests using Locust against ezid-dev/stg/prd and documented the test results (2023-08)
Adjust Apache/mod_wsgi rate limiting
Adjust mod_wsgi keep-alive and ALB timeout settings
Increase Apache concurrent requests limit
Upgrade EC2 instance #451
Explore other AWS tools/technologies for request limit control, such as API Gateway throttling settings
Refactor the EZID search function to limit result size and reduce memory usage on RDS (#446)
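
For the Merritt retry item above, here is a minimal sketch of retrying a mint request with exponential backoff when a 502/503/504 comes back. It is only an illustration, not Merritt's actual code: `mint_with_retry`, the `send` callable, and the delay values are all hypothetical. One caveat to evaluate before adopting anything like this: a 502 can be returned after EZID has already created the identifier, so a blind retry may mint a duplicate.

```python
import time
from typing import Callable, Tuple

# Gateway-level errors listed in this issue; treated as retryable here (assumption).
RETRYABLE_STATUSES = {502, 503, 504}


def mint_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs one mint request and
    returns (http_status, response_body). The last response is returned
    even if it is still an error, so the caller can decide what to do.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))

    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without real waiting. Adding random jitter to the delay would help avoid many clients retrying in lockstep against an already overloaded server.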
Originally posted by @jsjiang in https://github.com/CDLUC3/ezid/issues/161#issuecomment-1664690982