fractal-analytics-platform / fractal-server

Fractal backend
https://fractal-analytics-platform.github.io/fractal-server/
BSD 3-Clause "New" or "Revised" License

Review gunicorn/OS load-balancing #1937

Open tcompa opened 4 days ago

tcompa commented 4 days ago

For the moment this is just a placeholder with relevant links:

mfranzon commented 3 days ago

Takeaways from a deeper review:

We tested a single trivial endpoint (/api/alive/) that returns the PID of the process handling the request, i.e. the PID of a gunicorn worker.
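For reference, an endpoint of this kind can be sketched in a few lines. This is a dependency-free WSGI stand-in written for illustration, not the actual fractal-server endpoint; the JSON shape (`{"pid": ...}`) and the module name used below are assumptions.

```python
import json
import os


def app(environ, start_response):
    """Minimal WSGI app: report the PID of the process handling the request."""
    if environ.get("PATH_INFO") != "/api/alive/":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    # os.getpid() is evaluated in whichever worker process accepted the request.
    body = json.dumps({"pid": os.getpid()}).encode()
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

If saved as a hypothetical `alive.py`, it could be served with `gunicorn --workers 12 alive:app`, so that repeated calls reveal which worker answered each request.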

As discussed in the issues linked in the previous message, gunicorn does NOT perform any load balancing itself; it leaves the distribution of incoming connections across workers to the operating system.

Testing 5000 calls with 12 workers on a local PC (Ubuntu 22), we observed that:

Further considerations must be made:

No Fix

PID Statistics:
PID: 124309, Count: 3
PID: 124305, Count: 9
PID: 124306, Count: 14
PID: 124304, Count: 9
PID: 124311, Count: 9
PID: 124314, Count: 16
PID: 124312, Count: 8
PID: 124308, Count: 8
PID: 124310, Count: 7
PID: 124313, Count: 10
PID: 124307, Count: 5
PID: 124303, Count: 2

With Fix

PID Statistics:
PID: 168880, Count: 14
PID: 168887, Count: 12
PID: 168882, Count: 5
PID: 168885, Count: 8
PID: 168888, Count: 10
PID: 168927, Count: 8
PID: 168890, Count: 6
PID: 168884, Count: 11
PID: 168889, Count: 6
PID: 168881, Count: 8
PID: 168883, Count: 6
PID: 168886, Count: 6
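
One way to compare the two listings is to measure how far each is from a perfectly even split. The sketch below uses Pearson's chi-square statistic as that measure; this choice is ours for illustration and was not part of the original test. The counts are copied from the "No Fix" and "With Fix" listings above.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against a perfectly even split."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Per-worker request counts, copied from the two "PID Statistics" listings.
no_fix = [3, 9, 14, 9, 9, 16, 8, 8, 7, 10, 5, 2]
with_fix = [14, 12, 5, 8, 10, 8, 6, 11, 6, 8, 6, 6]

for label, counts in (("No Fix", no_fix), ("With Fix", with_fix)):
    print(
        f"{label}: total={sum(counts)} min={min(counts)} max={max(counts)} "
        f"chi2={chi_square_uniform(counts):.2f}"
    )
```

A lower statistic means a more even distribution. On these numbers it is about 21.2 without the fix and about 10.6 with it. The listed counts sum to 100 per run, so this is suggestive rather than conclusive.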

mfranzon commented 3 days ago

More on this (@tcompa):

In the current state (no patch), all workers listen on one and the same socket, inherited from the gunicorn master (note the identical DEVICE value in the lsof output below). In this situation, the OS hands connections to the workers with non-homogeneous frequencies. With the gunicorn patch (see previous comment) and the SO_REUSEPORT option, each of the N workers has its own socket, and all N sockets are bound to the same port. In this situation, the OS distributes connections across the sockets in a seemingly random, and therefore more homogeneous, manner.
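
The behaviour of SO_REUSEPORT can be reproduced outside gunicorn. The sketch below opens several independent listening sockets on a single TCP port. It assumes a platform that exposes `socket.SO_REUSEPORT` (e.g. Linux), and the helper name `open_listeners` is made up for this example. It does not reproduce the gunicorn patch itself.

```python
import socket


def open_listeners(n, host="127.0.0.1", port=0):
    """Open n independent listening sockets that all share one TCP port."""
    listeners = []
    for _ in range(n):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # The option must be set on every socket before bind(), including the first.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen()
        # With port=0 the kernel picks a free port for the first socket;
        # the remaining sockets then bind to that same port.
        port = sock.getsockname()[1]
        listeners.append(sock)
    return listeners
```

Each returned socket has its own accept queue, and the kernel chooses which socket receives each incoming connection. Without the option, the second `bind()` on the same address and port would fail with "Address already in use".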

Example current state:

$ lsof -i | grep 8000
gunicorn  76496 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)   # master gunicorn
gunicorn  76500 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)
gunicorn  76501 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)
gunicorn  76502 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)
gunicorn  76503 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)

Example with patch and --reuse-port:

$ lsof -i | grep 8000
gunicorn  75823 tommaso    6u  IPv4 463247      0t0  TCP localhost:8000 (LISTEN)
gunicorn  75824 tommaso    5u  IPv4 459620      0t0  TCP localhost:8000 (LISTEN)
gunicorn  75825 tommaso    5u  IPv4 464099      0t0  TCP localhost:8000 (LISTEN)
gunicorn  75827 tommaso    5u  IPv4 457392      0t0  TCP localhost:8000 (LISTEN)
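
The difference between the two listings is in the DEVICE column (sixth field), which on Linux appears to identify the underlying socket. The sketch below counts distinct values in that column; the helper name is made up for this example, and it assumes the default `lsof -i` column layout.

```python
def distinct_listen_sockets(lsof_output):
    """Return the set of DEVICE values among LISTEN lines of `lsof -i` output."""
    devices = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE)
        if len(fields) >= 9 and "(LISTEN)" in fields:
            devices.add(fields[5])
    return devices


# Lines copied from the two lsof listings above.
current_state = """\
gunicorn  76496 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)
gunicorn  76500 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)
gunicorn  76501 tommaso    5u  IPv4 466962      0t0  TCP localhost:8000 (LISTEN)
"""
with_patch = """\
gunicorn  75823 tommaso    6u  IPv4 463247      0t0  TCP localhost:8000 (LISTEN)
gunicorn  75824 tommaso    5u  IPv4 459620      0t0  TCP localhost:8000 (LISTEN)
gunicorn  75825 tommaso    5u  IPv4 464099      0t0  TCP localhost:8000 (LISTEN)
"""

print(len(distinct_listen_sockets(current_state)))  # 1: one shared socket
print(len(distinct_listen_sockets(with_patch)))     # 3: one socket per process
```
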

tcompa commented 16 hours ago

Current TLDR: