circus-tent / circus

A Process & Socket Manager built with zmq
http://circus.readthedocs.org/

Setting numprocesses dynamically based on number of CPUs? #1145

Open fgimian opened 4 years ago

fgimian commented 4 years ago

Hey there again, sorry for posting another issue but this is the only thing I have left to understand 😄

I can't seem to see an easy way to set numprocesses dynamically based on number of CPUs available. I'm quite new to circus and even after reading the developer docs, I still don't seem to see a way to add this via a plugin or similar.

I'm really hoping for a way to say something like:

numprocesses = $(circus.numcpus * 1.5)

My alternative would be to compute this outside circus, store it in an environment variable and then use the env variable, but I was hoping for something built in.

Is there any way to do this currently or can you suggest the best approach?

Huge thanks! Fotis

biozz commented 4 years ago

Hi @fgimian!

As far as I know, there is no built-in feature to calculate the number of processes on the fly inside the configuration file. It is a configuration file after all.

Your second approach is on point; this is how people usually do it:

numprocesses = $(circus.env.MY_CPU_NUMBER)
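For illustration, the value could be computed in a small launcher before circusd starts. This is only a sketch under assumptions: the 1.5 multiplier comes from the question above, while the helper names `build_env` and `main` and the `circus.ini` path are hypothetical and not part of Circus.

```python
import os
from multiprocessing import cpu_count


def build_env(multiplier=1.5, base_env=None):
    """Return a copy of the environment with MY_CPU_NUMBER set.

    Hypothetical helper, not part of Circus.
    """
    env = dict(os.environ if base_env is None else base_env)
    # numprocesses must be a whole number, so round down and keep at least one.
    env["MY_CPU_NUMBER"] = str(max(1, int(cpu_count() * multiplier)))
    return env


def main(config_path="circus.ini"):
    """Replace this process with circusd (config path is an assumption)."""
    os.execvpe("circusd", ["circusd", config_path], build_env())
```

Calling `main()` from an entrypoint script would hand the computed value to circusd through the environment, where `$(circus.env.MY_CPU_NUMBER)` can pick it up.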

Another option would be to use circus programmatically and develop something similar to what is described in the docs.

I don't think there is a need to do calculations inside the configuration file, but maybe you have more examples and ideas that we could use as a feature proposal?

fgimian commented 4 years ago

Thank you so much for the reply and help. Setting the number of workers based on CPU count is a rather common practice with servers such as Gunicorn. This simply means that apps can move to different hardware and scale automatically to the required number of cores. Of course, this is especially true for WSGI and ASGI servers in Python land where a single process can only address one CPU due to the GIL.

My current use case is deployment of a FastAPI application. You can see an example Gunicorn config provided by the FastAPI team which dynamically configures workers.

Similarly, the official Gunicorn docs recommend a formula for the number of workers based on cores, in their case:

(2 x $num_cores) + 1
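As a quick sketch, that formula translates to Python as follows. The function name `gunicorn_style_workers` is made up for illustration and is not a Gunicorn or Circus API.

```python
from multiprocessing import cpu_count


def gunicorn_style_workers(num_cores=None):
    """Apply the (2 x num_cores) + 1 formula quoted above."""
    if num_cores is None:
        # Fall back to the number of CPUs reported by the system.
        num_cores = cpu_count()
    return 2 * num_cores + 1
```

On a 4-core machine this gives 9 workers.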

Having said all of this, Circus really does make it very easy to do this with a little Python entrypoint script. It just took me a while to figure out that I had to set up logging manually, so here's the final script for anyone interested:

from multiprocessing import cpu_count

from circus import get_arbiter, logger
from circus.sockets import CircusSocket
from circus.util import configure_logger

logging_config_path = "logging.conf.yaml"

web_workers = cpu_count()
web_watcher = {
    "name": "web",
    "cmd": "uvicorn",
    "args": ["--fd", "$(circus.sockets.web)", "--log-config", logging_config_path, "main:app"],
    "copy_env": True,
    "use_sockets": True,
    "numprocesses": web_workers,
    "graceful_timeout": 120,
}
web_socket = CircusSocket(name="web", host="0.0.0.0", port=6000)

# Create the arbiter
arbiter = get_arbiter(
    watchers=[web_watcher], sockets=[web_socket], loggerconfig=logging_config_path
)

# Configure the logger
loglevel = arbiter.loglevel or "info"
logoutput = arbiter.logoutput or "-"
loggerconfig = arbiter.loggerconfig or None
configure_logger(logger, loglevel, logoutput, loggerconfig)

try:
    arbiter.start()
finally:
    arbiter.stop()

Or a lower-level script that uses the Arbiter and Watcher classes directly:

from multiprocessing import cpu_count

from circus import logger
from circus.arbiter import Arbiter
from circus.watcher import Watcher
from circus.sockets import CircusSocket
from circus.util import (
    DEFAULT_ENDPOINT_DEALER,
    DEFAULT_ENDPOINT_SUB,
    DEFAULT_ENDPOINT_MULTICAST,
    DEFAULT_ENDPOINT_STATS,
    configure_logger,
)

logging_config_path = "logging.conf.yaml"

web_workers = cpu_count()
web_watcher = Watcher(
    name="web",
    cmd="uvicorn",
    args=["--fd", "$(circus.sockets.web)", "--log-config", logging_config_path, "main:app"],
    copy_env=True,
    use_sockets=True,
    numprocesses=web_workers,
    graceful_timeout=120,
)
web_socket = CircusSocket(name="web", host="0.0.0.0", port=6000)

# Create the arbiter
arbiter = Arbiter(
    watchers=[web_watcher],
    endpoint=DEFAULT_ENDPOINT_DEALER,
    pubsub_endpoint=DEFAULT_ENDPOINT_SUB,
    stats_endpoint=DEFAULT_ENDPOINT_STATS,
    multicast_endpoint=DEFAULT_ENDPOINT_MULTICAST,
    sockets=[web_socket],
    loggerconfig=logging_config_path,
)

# Configure the logger
loglevel = arbiter.loglevel or "info"
logoutput = arbiter.logoutput or "-"
loggerconfig = arbiter.loggerconfig or None
configure_logger(logger, loglevel, logoutput, loggerconfig)

try:
    arbiter.start()
finally:
    arbiter.stop()

Both of these accomplish exactly what I need personally. I'm not sure how viable it would be to add more advanced expressions and a few more pre-computed circus variables to make this possible via an INI file. Gunicorn ultimately has the advantage of supporting a Python script for configuration which may be a better approach.

e.g.

[circus]
loggerconfig = logging.conf.yaml

[watcher:web]
cmd = uvicorn --fd $(circus.sockets.web) --log-config logging.conf.yaml main:app
copy_env = True
use_sockets = True
numprocesses = 4
graceful_timeout = 120

[socket:web]
host = 0.0.0.0
port = 6000

Could be represented as:

loggerconfig = "logging.conf.yaml"

watchers = {
    "web": {
        "cmd": "uvicorn --fd $(circus.sockets.web) --log-config logging.conf.yaml main:app",
        "copy_env": True,
        "use_sockets": True,
        "numprocesses": 4,
        "graceful_timeout": 120,
    }
}

sockets = {
    "web": {
        "host": "0.0.0.0",
        "port": 6000,
    }
}

Circus would then support this as a more advanced configuration file format as compared to the simpler INI version. It would import this module and configure Circus based on defined variables in the module, similar to Gunicorn. This would allow us to do something like this:

from multiprocessing import cpu_count

loggerconfig = "logging.conf.yaml"

watchers = {
    "web": {
        "cmd": "uvicorn",
        "args": ["--fd", "$(circus.sockets.web)", "--log-config", loggerconfig, "main:app"],
        "copy_env": True,
        "use_sockets": True,
        "numprocesses": cpu_count() * 2 + 1,
        "graceful_timeout": 120,
    }
}

sockets = {
    "web": {
        "host": "0.0.0.0",
        "port": 6000,
    }
}
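To make the proposal a bit more concrete, here is a rough sketch of how such a module might be loaded. Circus does not provide anything like this today; `load_python_config` and `to_watcher_list` are hypothetical names, and the variable names `loggerconfig`, `watchers` and `sockets` simply follow the example above.

```python
import importlib.util


def load_python_config(path):
    """Import a Python config file and return its top-level settings.

    Hypothetical helper: not part of Circus.
    """
    spec = importlib.util.spec_from_file_location("circus_config", path)
    module = importlib.util.module_from_spec(spec)
    # Executing the module is what lets the config call cpu_count() and friends.
    spec.loader.exec_module(module)
    return {
        "loggerconfig": getattr(module, "loggerconfig", None),
        "watchers": getattr(module, "watchers", {}),
        "sockets": getattr(module, "sockets", {}),
    }


def to_watcher_list(watchers):
    """Turn {"web": {...}} into [{"name": "web", ...}]."""
    return [dict(options, name=name) for name, options in watchers.items()]
```

The resulting list of watcher dicts has the same shape as the `web_watcher` dict in the first entrypoint script, so it could in principle be handed to `get_arbiter` in the same way.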

Just some thoughts, what do you think?

Huge thanks again Fotis