♻️ Increase FPM work capacity

petertgiles commented 4 months ago

♻️ Debt/Refactor

We have an FPM configuration in our deployment that specifies the number of FPM workers, the worker lifetime, and the watchdog timer cutoffs. Lately we've been running out of FPM workers pretty regularly, causing the site to become unresponsive. We're working to improve the efficiency of our site, but let's boost the work capacity in the meantime.

🙋‍♀️ Proposed Solution

- Increase pm.max_children from 10 to X
- Increase request_terminate_timeout from 60s to Y

✅ Acceptance Criteria

[ ] FPM work capacity boosted

petertgiles commented 4 months ago

I think my recommendation would be to increase pm.max_children to 20 and not change request_terminate_timeout. If we want, we can probably test these on the fly by SSH'ing into the app service, changing them, and then restarting FPM.

brindasasi commented 4 months ago

Generate a lot of data in Dev with Faker and try it out with the above suggestions. The Assessment Step tracker and the firing of notifications can be tested in Dev for this.

petertgiles commented 4 months ago

In Docker you can restart FPM with this command:

`pkill -o -USR2 php-fpm`

Essentially, this sends the "USR2" signal to the oldest process called php-fpm, which is the master process. If you run `ps ax` before and after, you'll see the master process PID stays the same while the four workers are new.

I've never tried this in Azure, though.
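The before/after `ps ax` check described above could be automated along these lines. This is only a sketch: the helper names and the sample `ps ax` captures are hypothetical, and it assumes the usual php-fpm process titles ("php-fpm: master process ..." and "php-fpm: pool <name>").

```python
import re

# Matches `ps ax` lines whose command is a php-fpm process title, e.g.
# "    1 ?  Ss  0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)"
PS_LINE = re.compile(r"^\s*(\d+)\s+.*?php-fpm: (master process|pool \S+)")

# Hypothetical captures taken before and after `pkill -o -USR2 php-fpm`.
BEFORE = """\
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
    7 ?        S      0:00 php-fpm: pool www
    8 ?        S      0:00 php-fpm: pool www
    9 ?        S      0:00 php-fpm: pool www
   10 ?        S      0:00 php-fpm: pool www
"""
AFTER = """\
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
   31 ?        S      0:00 php-fpm: pool www
   32 ?        S      0:00 php-fpm: pool www
   33 ?        S      0:00 php-fpm: pool www
   34 ?        S      0:00 php-fpm: pool www
"""


def fpm_pids(ps_output: str) -> tuple[set[int], set[int]]:
    """Return (master_pids, worker_pids) found in `ps ax` output."""
    masters, workers = set(), set()
    for line in ps_output.splitlines():
        match = PS_LINE.match(line)
        if not match:
            continue  # header line or an unrelated process
        pid = int(match.group(1))
        if match.group(2) == "master process":
            masters.add(pid)
        else:
            workers.add(pid)
    return masters, workers


def reload_looks_ok(before: str, after: str) -> bool:
    """True if the master PID is unchanged and no old worker PID survived."""
    masters_before, workers_before = fpm_pids(before)
    masters_after, workers_after = fpm_pids(after)
    return (
        masters_before == masters_after
        and bool(workers_after)
        and workers_before.isdisjoint(workers_after)
    )


print(reload_looks_ok(BEFORE, AFTER))
```

PID reuse could in principle make a new worker share an old worker's PID, so a `False` result is a prompt to look closer rather than proof that the reload failed.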

petertgiles commented 4 months ago

Hmm, someone should probably document this stuff? 😬

To check the FPM status while logged into the server:

`curl 127.0.0.1:8080/fpm-status`

This endpoint is not available from outside the server.
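If it helps with the documentation, here is a rough sketch of reading that status page from a script and flagging worker exhaustion. It assumes the default plain-text status format ("key: value" per line) with the standard "max children reached" and "listen queue" fields; the function names and the sample values are made up for illustration.

```python
from urllib.request import urlopen

STATUS_URL = "http://127.0.0.1:8080/fpm-status"  # endpoint from the comment above

# Hypothetical sample of the plain-text status page (values are invented).
SAMPLE_STATUS = """\
pool:                 www
process manager:      dynamic
listen queue:         0
idle processes:       0
active processes:     10
total processes:      10
max children reached: 3
"""


def parse_fpm_status(text: str) -> dict[str, str]:
    """Parse the plain-text status page ("key: value" per line) into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def workers_exhausted(status: dict[str, str]) -> bool:
    """True if FPM has hit pm.max_children or currently has queued requests."""
    return (
        int(status.get("max children reached", 0)) > 0
        or int(status.get("listen queue", 0)) > 0
    )


def fetch_status(url: str = STATUS_URL) -> dict[str, str]:
    """Fetch and parse the status page; only reachable from inside the server."""
    with urlopen(url, timeout=5) as response:
        return parse_fpm_status(response.read().decode())


print(workers_exhausted(parse_fpm_status(SAMPLE_STATUS)))
```

On the server itself, `workers_exhausted(fetch_status())` would run the same check against the live endpoint.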

brindasasi commented 3 months ago

So I calculated average memory per process and I got 45 MB

https://serverfault.com/questions/863238/check-an-average-memory-usage-by-single-php-fpm-process

Total available memory on Dev: 1215 MB

FPM workers that fit in memory = 1215 / 45 = 27

Recommended by Peter: 20

I don't mind going up to 20 and increasing if we need more.
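The arithmetic above can be written as a small helper so it is easy to redo for other environments. This is a sketch only: the function name is made up, and the `reserved_mb` parameter (memory held back for nginx, the OS, and so on) is an assumption that is not part of the original calculation.

```python
def max_children_for_memory(
    available_mb: float, per_worker_mb: float, reserved_mb: float = 0
) -> int:
    """Number of FPM workers that fit in memory, rounded down."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = max(0, available_mb - reserved_mb)
    return int(usable_mb // per_worker_mb)


# Figures from this thread: 1215 MB available on Dev, 45 MB per worker.
print(max_children_for_memory(1215, 45))  # 27

# Same figures with a hypothetical 200 MB held back for other processes.
print(max_children_for_memory(1215, 45, reserved_mb=200))  # 22
```

With any headroom reserved, the ceiling drops toward the 20 that Peter suggested.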

brindasasi commented 3 months ago

### pm.max_children = 10

Benchmarking localhost (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests

- Server Software: nginx/1.24.0
- Server Hostname: localhost
- Server Port: 8000
- Document Path: /
- Document Length: 4989 bytes
- Concurrency Level: 20
- Time taken for tests: 6.503 seconds
- Complete requests: 10000
- Failed requests: 0
- Total transferred: 52490000 bytes
- HTML transferred: 49890000 bytes
- Requests per second: 1537.83 [#/sec] (mean)
- Time per request: 13.005 [ms] (mean)
- Time per request: 0.650 [ms] (mean, across all concurrent requests)
- Transfer rate: 7882.87 [Kbytes/sec] received

Connection Times (ms)

| | min | mean | +/-sd | median | max |
|---|---|---|---|---|---|
| Connect | 0 | 0 | 0.2 | 0 | 10 |
| Processing | 3 | 13 | 7.3 | 11 | 106 |
| Waiting | 3 | 13 | 7.3 | 11 | 106 |
| Total | 3 | 13 | 7.4 | 11 | 107 |

Percentage of the requests served within a certain time (ms)

| 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | 100% (longest request) |
|---|---|---|---|---|---|---|---|---|
| 11 | 14 | 16 | 17 | 21 | 26 | 32 | 37 | 107 |

### pm.max_children = 20

Benchmarking localhost (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests

- Server Software: nginx/1.24.0
- Server Hostname: localhost
- Server Port: 8000
- Document Path: /
- Document Length: 4989 bytes
- Concurrency Level: 20
- Time taken for tests: 7.635 seconds
- Complete requests: 10000
- Failed requests: 0
- Total transferred: 52490000 bytes
- HTML transferred: 49890000 bytes
- Requests per second: 1309.70 [#/sec] (mean)
- Time per request: 15.271 [ms] (mean)
- Time per request: 0.764 [ms] (mean, across all concurrent requests)
- Transfer rate: 6713.48 [Kbytes/sec] received

Connection Times (ms)

| | min | mean | +/-sd | median | max |
|---|---|---|---|---|---|
| Connect | 0 | 0 | 0.4 | 0 | 25 |
| Processing | 3 | 15 | 45.5 | 11 | 1031 |
| Waiting | 3 | 15 | 45.5 | 11 | 1031 |
| Total | 3 | 15 | 45.5 | 12 | 1031 |

Percentage of the requests served within a certain time (ms)

| 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | 100% (longest request) |
|---|---|---|---|---|---|---|---|---|
| 12 | 14 | 16 | 18 | 22 | 26 | 31 | 35 | 1031 |

brindasasi commented 3 months ago

Results Comparison

| Metric | pm.max_children = 10 | pm.max_children = 20 |
|---|---|---|
| Time taken for tests | 6.503 seconds | 7.635 seconds |
| Requests per second (mean) | 1537.83 | 1309.70 |
| Time per request (mean) | 13.005 ms | 15.271 ms |
| Transfer rate | 7882.87 Kbytes/sec | 6713.48 Kbytes/sec |
| 50% of requests served within | 11 ms | 12 ms |
| 90% of requests served within | 21 ms | 22 ms |
| 99% of requests served within | 37 ms | 35 ms |
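To put the comparison in relative terms, here is a small sketch that computes the percentage change between the two runs. The numbers are copied from the ab output in this thread; the helper and dictionary names are made up.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Figures copied from the two ab runs in this thread.
RUN_10 = {"requests_per_second": 1537.83, "mean_ms": 13.005, "p99_ms": 37}
RUN_20 = {"requests_per_second": 1309.70, "mean_ms": 15.271, "p99_ms": 35}

for metric in RUN_10:
    change = percent_change(RUN_10[metric], RUN_20[metric])
    print(f"{metric}: {change:+.1f}%")
```

That works out to roughly 15% lower throughput and 17% higher mean time per request with 20 workers. Each configuration was only run once, and the 20-worker run includes a single 1031 ms outlier, so some of the difference may be noise.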

brindasasi commented 3 months ago

The result isn't very magical when just increasing FPM workers from 10 to 20. Let me play around with other settings as well.

brindasasi commented 3 months ago

- pm.max_children = 17
- pm.start_servers = 3
- pm.min_spare_servers = 2
- pm.max_spare_servers = 4

Benchmarking localhost (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests

- Server Software: nginx/1.24.0
- Server Hostname: localhost
- Server Port: 8000
- Document Path: /
- Document Length: 4989 bytes
- Concurrency Level: 20
- Time taken for tests: 7.600 seconds
- Complete requests: 10000
- Failed requests: 0
- Total transferred: 52490000 bytes
- HTML transferred: 49890000 bytes
- Requests per second: 1315.74 [#/sec] (mean)
- Time per request: 15.201 [ms] (mean)
- Time per request: 0.760 [ms] (mean, across all concurrent requests)
- Transfer rate: 6744.44 [Kbytes/sec] received

Connection Times (ms)

| | min | mean | +/-sd | median | max |
|---|---|---|---|---|---|
| Connect | 0 | 0 | 0.4 | 0 | 22 |
| Processing | 3 | 15 | 8.5 | 13 | 92 |
| Waiting | 3 | 15 | 8.5 | 13 | 91 |
| Total | 3 | 15 | 8.5 | 13 | 92 |

Percentage of the requests served within a certain time (ms)

| 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | 100% (longest request) |
|---|---|---|---|---|---|---|---|---|
| 13 | 16 | 19 | 21 | 26 | 31 | 39 | 45 | 92 |
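For repeating these runs with other settings, a sketch like the following could run the benchmark and pull out the headline numbers. The exact command used above was not posted, so the `ab -n 10000 -c 20 http://localhost:8000/` invocation is an assumption inferred from the reported request count, concurrency level, host, port, and path. The function names and the shortened sample report are made up.

```python
import re
import subprocess

# Shortened sample in ab's report layout, using figures from the run above.
SAMPLE_REPORT = """\
Requests per second:    1315.74 [#/sec] (mean)
Time per request:       15.201 [ms] (mean)
Time per request:       0.760 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
  50%     13
  90%     26
  99%     45
 100%     92 (longest request)
"""


def run_ab(url="http://localhost:8000/", total_requests=10000, concurrency=20):
    """Run ApacheBench and return its text report (needs `ab` on PATH)."""
    result = subprocess.run(
        ["ab", "-n", str(total_requests), "-c", str(concurrency), url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def summarize(report):
    """Pull throughput, mean latency, and percentiles out of an ab report."""
    rps = re.search(r"Requests per second:\s+([\d.]+)", report)
    # "(mean)" alone skips the "mean, across all concurrent requests" line.
    mean_ms = re.search(r"Time per request:\s+([\d.]+) \[ms\] \(mean\)", report)
    percentiles = {
        int(pct): int(ms)
        for pct, ms in re.findall(
            r"^[ \t]*(\d+)%[ \t]+(\d+)", report, re.MULTILINE
        )
    }
    return {
        "requests_per_second": float(rps.group(1)) if rps else None,
        "mean_ms": float(mean_ms.group(1)) if mean_ms else None,
        "percentiles_ms": percentiles,
    }


print(summarize(SAMPLE_REPORT))
```

Calling `summarize(run_ab())` a few times per configuration and comparing the medians would give a steadier picture than a single run of each.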
