camptocamp / c2cwsgiutils

BSD 2-Clause "Simplified" License

Camptocamp WSGI utilities

This is a Python 3 library providing common tools for Camptocamp WSGI applications:

It also provides tools for writing acceptance tests:

For an example of how to use it in an application provided by a Docker image, look at the test application in acceptance_tests/app. To see how to test such an application, look at acceptance_tests/tests.

Install

Custom Docker image (from the PyPI library)

This is not a minimal install of c2cwsgiutils: it puts in place everything needed to monitor the application in integration and production environments.

The library is available on PyPI: https://pypi.python.org/pypi/c2cwsgiutils

Copy and adapt these template configuration files into your project:

You should install c2cwsgiutils with the tool you use to manage your pip dependencies.

In the Dockerfile you should add the following lines:

# Generate the version file.
RUN c2cwsgiutils-genversion $(git rev-parse HEAD)

CMD ["gunicorn", "--paste=/app/production.ini"]

# Default values for the environment variables
ENV \
  DEVELOPMENT=0 \
  SQLALCHEMY_POOL_RECYCLE=30 \
  SQLALCHEMY_POOL_SIZE=5 \
  SQLALCHEMY_MAX_OVERFLOW=25 \
  SQLALCHEMY_SLAVE_POOL_RECYCLE=30 \
  SQLALCHEMY_SLAVE_POOL_SIZE=5 \
  SQLALCHEMY_SLAVE_MAX_OVERFLOW=25 \
  LOG_TYPE=console \
  OTHER_LOG_LEVEL=WARNING \
  GUNICORN_LOG_LEVEL=WARNING \
  SQL_LOG_LEVEL=WARNING \
  C2CWSGIUTILS_LOG_LEVEL=WARNING \
  LOG_LEVEL=INFO

Add the following to your main function:

import c2cwsgiutils.db
import c2cwsgiutils.health_check

config.include("c2cwsgiutils.pyramid")
dbsession = c2cwsgiutils.db.init(config, "sqlalchemy", "sqlalchemy_slave")

config.scan(...)

# Initialize the health checks
health_check = c2cwsgiutils.health_check.HealthCheck(config)
health_check.add_db_session_check(dbsession)
health_check.add_alembic_check(dbsession, "/app/alembic.ini", 1)

The related environment variables:

These environment variables can be useful when investigating issues in production environments.

Docker (deprecated)

The library is also available (deprecated) as a base Docker image: camptocamp/c2cwsgiutils:release_5 or ghcr.io/camptocamp/c2cwsgiutils:release_5

If you need an image with a smaller footprint, use the -light tags. They do not include GDAL or the build tools.

We deprecate the Docker image because:

General config

In general, configuration can be done either with environment variables (which take precedence) or with entries in the production.ini file.

You can configure the base URL for accessing the views provided by c2cwsgiutils with an environment variable named C2C_BASE_PATH or in the production.ini file with a property named c2c.base_path.
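To make the precedence rule concrete, here is a small sketch using C2C_BASE_PATH as the example. resolve_setting is a hypothetical helper written for this illustration, not part of the c2cwsgiutils API, and the "/c2c" value is only an example setting.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    settings: Mapping[str, str],
    env_name: str,
    prop_name: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: environment variable first, then .ini property, then default."""
    if env_name in os.environ:
        return os.environ[env_name]
    return settings.get(prop_name, default)


# Example settings, as they could be read from production.ini.
ini_settings = {"c2c.base_path": "/c2c"}
base_path = resolve_setting(ini_settings, "C2C_BASE_PATH", "c2c.base_path")
```

If C2C_BASE_PATH is set in the environment, its value wins; otherwise the c2c.base_path entry is used.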

A few REST APIs are added; they are listed at {C2C_BASE_PATH}.

Some APIs are protected by a secret. This secret is specified in the C2C_SECRET variable or the c2c.secret property. It is passed either as the secret query parameter or as the X-API-Key header. Once accessed with the correct secret, a cookie is stored and the secret can be omitted.
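As a sketch of the header variant, the following builds a request with only the standard library. protected_request is a helper written for this example; the host, the /c2c base path, the secret and the logger name are placeholders to adapt to your deployment.

```python
from urllib.parse import urlencode
from urllib.request import Request


def protected_request(base_url: str, endpoint: str, secret: str, **params: str) -> Request:
    """Build a GET request for a view protected by C2C_SECRET.

    The secret goes in the X-API-Key header instead of the ``secret``
    query parameter, which keeps it out of the URL.
    """
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"X-API-Key": secret}, method="GET")


# Placeholder host, base path and secret.
req = protected_request("http://localhost:8080/c2c", "logging/level", "my-secret", name="sqlalchemy.engine")
```

The request can then be sent with urllib.request.urlopen(req, timeout=10).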

Configure the JSON renderers with the C2C_JSON_PRETTY_PRINT and C2C_JSON_SORT_KEYS environment variables or the c2c.json.pretty_print and c2c.json.sort_keys properties. Both default to false.

An alternative to C2C_SECRET is to authenticate with GitHub; for that, create a GitHub application.

The application then redirects the user to the GitHub authentication form if not already authenticated (using C2C_AUTH_GITHUB_CLIENT_ID, C2C_AUTH_GITHUB_CLIENT_SECRET and C2C_AUTH_GITHUB_SCOPE).

Next, it checks whether the user is allowed to access the application, by checking that the user has sufficient rights on a GitHub repository (using C2C_AUTH_GITHUB_REPOSITORY and C2C_AUTH_GITHUB_ACCESS_TYPE).

Finally, the session information is stored in an encrypted cookie (using C2C_AUTH_GITHUB_SECRET and C2C_AUTH_GITHUB_COOKIE).

Configuration details:

Using the environment variable C2C_AUTH_GITHUB_REPOSITORY or the config key c2c.auth.github.repository to define the related GitHub repository (required).

Using the environment variable C2C_AUTH_GITHUB_ACCESS_TYPE or the config key c2c.auth.github.access_type to define the type of required access: pull, push or admin (default is push)

Using the environment variable C2C_AUTH_GITHUB_CLIENT_ID or the config key c2c.auth.github.client_id to define the GitHub application ID (required)

Using the environment variable C2C_AUTH_GITHUB_CLIENT_SECRET or the config key c2c.auth.github.client_secret to define the GitHub application secret (required)

Using the environment variable C2C_AUTH_GITHUB_SCOPE or the config key c2c.auth.github.scope to define the GitHub scope (default is repo), see GitHub documentation

Using the environment variable C2C_AUTH_GITHUB_SECRET or the config key c2c.auth.github.auth.secret to define the secret used for JWT encryption (required, at least 16 characters long)

Using the environment variable C2C_AUTH_GITHUB_COOKIE or the config key c2c.auth.github.auth.cookie to define the cookie name (default is c2c-auth-jwt)

Using the environment variable C2C_AUTH_GITHUB_AUTH_URL or the config key c2c.auth.github.auth_url to define the GitHub auth URL (default is https://github.com/login/oauth/authorize)

Using the environment variable C2C_AUTH_GITHUB_TOKEN_URL or the config key c2c.auth.github.token_url to define the GitHub token URL (default is https://github.com/login/oauth/access_token)

Using the environment variable C2C_AUTH_GITHUB_USER_URL or the config key c2c.auth.github.user_url to define the GitHub user URL (default is https://api.github.com/user)

Using the environment variable C2C_AUTH_GITHUB_REPO_URL or the config key c2c.auth.github.repo_url to define the GitHub repository URL (default is https://api.github.com/repo)

Using the environment variable C2C_AUTH_GITHUB_PROXY_URL or the config key c2c.auth.github.auth.proxy_url to define a redirect proxy between GitHub and our application to be able to share an OAuth2 application on GitHub (default is no proxy). Made to work with this proxy.

Using the environment variable C2C_USE_SESSION or the config key c2c.use_session to define whether a session is used. Currently, the session can be used to store a state, which prevents CSRF during the OAuth2 login (default is false)
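The rules above (required values, allowed access types, minimum secret length) can be checked before deploying. This is a hypothetical pre-deployment sketch that only looks at the environment variables, not the .ini keys; check_github_auth_env is not part of c2cwsgiutils.

```python
from typing import List, Mapping

VALID_ACCESS_TYPES = {"pull", "push", "admin"}
REQUIRED = (
    "C2C_AUTH_GITHUB_REPOSITORY",
    "C2C_AUTH_GITHUB_CLIENT_ID",
    "C2C_AUTH_GITHUB_CLIENT_SECRET",
    "C2C_AUTH_GITHUB_SECRET",
)


def check_github_auth_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the GitHub authentication variables."""
    errors = [f"{name} is required" for name in REQUIRED if not env.get(name)]
    # The access type defaults to push when it is not set.
    access_type = env.get("C2C_AUTH_GITHUB_ACCESS_TYPE", "push")
    if access_type not in VALID_ACCESS_TYPES:
        errors.append(f"C2C_AUTH_GITHUB_ACCESS_TYPE must be pull, push or admin, got {access_type!r}")
    secret = env.get("C2C_AUTH_GITHUB_SECRET", "")
    if secret and len(secret) < 16:
        errors.append("C2C_AUTH_GITHUB_SECRET must be at least 16 characters long")
    return errors
```

For example, check_github_auth_env(os.environ) returns an empty list when the configuration looks complete.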

Pyramid

All the environment variables can be used in the configuration file with the %(ENV_NAME)s syntax.
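The %(ENV_NAME)s syntax is the interpolation syntax of Python's configparser. The following is a minimal standard-library sketch of the idea, assuming the environment variables are injected as parser defaults; it is not the c2c:// loader itself, and load_ini and the example values are hypothetical.

```python
import configparser
from typing import Mapping


def load_ini(text: str, env: Mapping[str, str]) -> configparser.ConfigParser:
    """Parse .ini text so that every environment variable is usable as %(NAME)s."""
    # '%' in the values is doubled so it is not read as interpolation syntax.
    defaults = {name: value.replace("%", "%%") for name, value in env.items()}
    parser = configparser.ConfigParser(defaults=defaults)
    parser.read_string(text)
    return parser


INI = """
[app:main]
sqlalchemy.url = %(SQLALCHEMY_URL)s
sqlalchemy.pool_size = %(SQLALCHEMY_POOL_SIZE)s
"""

parser = load_ini(INI, {"SQLALCHEMY_URL": "postgresql://www-data:secret@db/app", "SQLALCHEMY_POOL_SIZE": "5"})
```

Variable names match case-insensitively here, because configparser lowercases option names.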

To enable most of the features of c2cwsgiutils, add these lines to your WSGI main:

import c2cwsgiutils.pyramid
config.include(c2cwsgiutils.pyramid.includeme)

Error catching views will be put in place to return errors as JSON.

A custom loader is provided to run pyramid scripts against configuration files containing environment variables:

proutes c2c://production.ini      # relative path
proutes c2c:///app/production.ini # absolute path

A filter is automatically installed to handle the HTTP headers set by common proxies, so that the request object has correct values (request.client_addr, for example). This filter is equivalent to what PasteDeploy#prefix does (minus the prefix part), but it also supports newer headers (Forwarded). If you need to prefix your routes, you can use the route_prefix parameter of the Configurator constructor.

Logging

Two new logging backends are provided:

Look at the logging configuration part of acceptance_tests/app/production.ini for paste and command-line usage.

The logging configuration is imported automatically by gunicorn. You can display the resulting dict config by setting the environment variable DEBUG_LOGCONFIG=1.

You can enable a view to configure the logging level on a live system using the C2C_LOG_VIEW_ENABLED environment variable. Then, the current status of a logger can be queried with a GET on {C2C_BASE_PATH}/logging/level?secret={C2C_SECRET}&name={logger_name} and can be changed with {C2C_BASE_PATH}/logging/level?secret={C2C_SECRET}&name={logger_name}&level={level}. Overrides are stored in Redis, if C2C_REDIS_URL (c2c.redis_url) or C2C_REDIS_SENTINELS is configured.
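The two URLs above differ only by the level parameter. A small sketch that builds them, this time passing the secret as a query parameter as shown above; logging_level_url is a hypothetical helper and the host and base path are placeholders.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def logging_level_url(base_url: str, secret: str, name: str, level: Optional[str] = None) -> str:
    """URL that queries a logger's level (no level) or changes it (with a level)."""
    params: Dict[str, str] = {"secret": secret, "name": name}
    if level is not None:
        params["level"] = level
    return f"{base_url.rstrip('/')}/logging/level?{urlencode(params)}"


query_url = logging_level_url("http://localhost:8080/c2c", "s3cr3t", "sqlalchemy.engine")
change_url = logging_level_url("http://localhost:8080/c2c", "s3cr3t", "sqlalchemy.engine", "DEBUG")
```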

Database maintenance

You can enable a view to force usage of the slave engine using the C2C_DB_MAINTENANCE_VIEW_ENABLED environment variable. Then, the database can be made "readonly" with {C2C_BASE_PATH}/db/maintenance?secret={C2C_SECRET}&readonly=true. The current state is stored in Redis, if C2C_REDIS_URL (c2c.redis_url) or C2C_REDIS_SENTINELS is configured.

Request tracking

In order to follow the logs generated by a request across all the services (think separate processes), c2cwsgiutils tries to flag everything with a request ID. This ID can come from the request headers (X-Request-ID, X-Correlation-ID, Request-ID or X-Varnish); otherwise it defaults to a UUID. You can add another request header as a source by defining the C2C_REQUEST_ID_HEADER environment variable (c2c.request_id_header).
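The lookup described above can be sketched as follows. This is an illustration, not the library's code: the order in which the headers are checked and the priority given to the extra header (the C2C_REQUEST_ID_HEADER one) are assumptions, and resolve_request_id is a hypothetical name.

```python
import uuid
from typing import Mapping, Optional

# The headers listed above, in the order this sketch checks them.
REQUEST_ID_HEADERS = ("X-Request-ID", "X-Correlation-ID", "Request-ID", "X-Varnish")


def resolve_request_id(headers: Mapping[str, str], extra_header: Optional[str] = None) -> str:
    """Return the first request ID found in the headers, or a new UUID."""
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    candidates = (extra_header, *REQUEST_ID_HEADERS) if extra_header else REQUEST_ID_HEADERS
    for name in candidates:
        value = lowered.get(name.lower())
        if value:
            return value
    return str(uuid.uuid4())
```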

In JSON logging formats, a request_id field is automatically added.

You can enable (disabled by default since it can have a cost) the flagging of the SQL requests as well by setting the C2C_SQL_REQUEST_ID environment variable (or c2c.sql_request_id in the .ini file). This will use the application name to pass along the request id. If you do that, you must include the application name in the PostgreSQL logs by setting log_line_prefix to something like "%a " (don't forget the space).

In your application, it is recommended to pass the request ID on to external REST APIs, using the X-Request-ID HTTP header for example. The value of the request ID is accessible through the c2c_request_id attribute added to the Pyramid Request objects. The requests module is patched to add this header automatically.
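With an HTTP client other than requests, the header has to be set by hand. A minimal sketch with urllib, assuming a Pyramid request carrying the c2c_request_id attribute mentioned above; backend_request, the backend URL and the SimpleNamespace stand-in are illustrative only.

```python
from types import SimpleNamespace
from urllib.request import Request


def backend_request(pyramid_request, url: str) -> Request:
    """Build a request to another service that carries the current request ID."""
    return Request(url, headers={"X-Request-ID": pyramid_request.c2c_request_id})


# Stand-in for a Pyramid request object, for illustration only.
fake_request = SimpleNamespace(c2c_request_id="1234")
outgoing = backend_request(fake_request, "http://backend.example/api/items")
```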

The requests module is also patched to detect requests made without a timeout. For those, you can configure a default timeout with the C2C_REQUESTS_DEFAULT_TIMEOUT environment variable (c2c.requests_default_timeout). If neither a timeout nor a default is specified, a warning is issued.

SQL profiler

The SQL profiler must be configured with the C2C_SQL_PROFILER_ENABLED environment variable. That enables a view to query the status of the profiler ({C2C_BASE_PATH}/sql_profiler?secret={C2C_SECRET}) or to enable/disable it ({C2C_BASE_PATH}/sql_profiler?secret={C2C_SECRET}&enable={1|0}).

If enabled, for each SELECT query sent by SQLAlchemy, another query is run with EXPLAIN ANALYZE prepended to it. The results are sent to the c2cwsgiutils.sql_profiler logger.

Don't enable it on a busy production system: it will severely degrade performance.

Profiler

c2cwsgiutils provides an easy way to profile an application:

With a decorator:

from c2cwsgiutils.profile import Profiler

@Profiler('/my_file.prof')
def my_function():
    ...

Or with the with statement:

from c2cwsgiutils.profile import Profiler

with Profiler('/my_file.prof'):
    ...

Then open your file with SnakeViz:

docker cp container_name:/my_file.prof .
pip install --user snakeviz
snakeviz my_file.prof
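SnakeViz reads profiles in the standard pstats format, so when a browser is not at hand the standard library can summarize the same file. A sketch, where summarize_profile is a hypothetical helper and a small profile is generated with cProfile only to keep the example self-contained.

```python
import cProfile
import io
import os
import pstats
import tempfile


def summarize_profile(path: str, limit: int = 10) -> str:
    """Return a text report of the `limit` entries with the highest cumulative time."""
    stream = io.StringIO()
    stats = pstats.Stats(path, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return stream.getvalue()


# Generate a small profile so the example runs on its own; in practice, pass
# the my_file.prof copied out of the container instead.
profiler = cProfile.Profile()
profiler.enable()
sum(i * i for i in range(100_000))
profiler.disable()
profile_path = os.path.join(tempfile.mkdtemp(), "my_file.prof")
profiler.dump_stats(profile_path)
print(summarize_profile(profile_path, limit=5))
```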

DB sessions

c2cwsgiutils.db.init allows you to set up a DB session that has two engines for accessing a master/slave PostgreSQL setup. The slave engine (read-only) is used automatically for GET and OPTIONS requests, and the master engine (read-write) is used for the other requests.

To use it, your production.ini must look like this:

sqlalchemy.url = %(SQLALCHEMY_URL)s
sqlalchemy.pool_recycle = %(SQLALCHEMY_POOL_RECYCLE)s
sqlalchemy.pool_size = %(SQLALCHEMY_POOL_SIZE)s
sqlalchemy.max_overflow = %(SQLALCHEMY_MAX_OVERFLOW)s

sqlalchemy_slave.url = %(SQLALCHEMY_SLAVE_URL)s
sqlalchemy_slave.pool_recycle = %(SQLALCHEMY_SLAVE_POOL_RECYCLE)s
sqlalchemy_slave.pool_size = %(SQLALCHEMY_SLAVE_POOL_SIZE)s
sqlalchemy_slave.max_overflow = %(SQLALCHEMY_SLAVE_MAX_OVERFLOW)s

And the code that initializes the DB connection must look like this:

import c2cwsgiutils.db

def main(config):
    # Force the "POST /api/hello" route to use the slave (read-only) engine
    dbsession = c2cwsgiutils.db.init(config, 'sqlalchemy', 'sqlalchemy_slave', force_slave=[
        "POST /api/hello"
    ])

You can use the force_slave and force_master parameters to override the defaults and force a route to use the master or the slave engine.
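The selection rule can be summarized in a few lines. This is an illustrative sketch, not c2cwsgiutils' implementation: choose_engine is hypothetical, it compares exact "METHOD path" strings, and giving force_master priority over force_slave is an assumption.

```python
from typing import Iterable


def choose_engine(
    method: str,
    path: str,
    force_master: Iterable[str] = (),
    force_slave: Iterable[str] = (),
) -> str:
    """Return which engine ("master" or "slave") a request would use."""
    key = f"{method.upper()} {path}"
    if key in set(force_master):
        return "master"
    if key in set(force_slave):
        return "slave"
    # Default: read-only methods go to the slave, everything else to the master.
    return "slave" if method.upper() in ("GET", "OPTIONS") else "master"
```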

Health checks

To enable health checks, you must add some setup in your WSGI main (usually after the DB connections are setup). For example:

from c2cwsgiutils.health_check import HealthCheck

# "models" stands for your application's module defining DBSession and the Hello model.

# Module-level flag, set elsewhere in the application to make the custom check fail.
not_happy = False

def custom_check(request):
    if not_happy:
        raise Exception("I'm not happy")
    return "happy"

health_check = HealthCheck(config)
health_check.add_db_session_check(models.DBSession, at_least_one_model=models.Hello)
health_check.add_url_check('http://localhost:8080/api/hello')
health_check.add_custom_check('custom', custom_check, 2)
health_check.add_alembic_check(models.DBSession, '/app/alembic.ini', 3)

Then, the URL {C2C_BASE_PATH}/health_check?max_level=3 can be used to run the health checks and get a report like this one (in case of error):

{
  "status": 500,
  "successes": {
    "db_engine_sqlalchemy": { "timing": 0.002 },
    "db_engine_sqlalchemy_slave": { "timing": 0.003 },
    "http://localhost/api/hello": { "timing": 0.01 },
    "alembic_app_alembic.ini_alembic": { "timing": 0.005, "result": "4a8c1bb4e775" }
  },
  "failures": {
    "custom": {
      "message": "I'm not happy",
      "timing": 0.001
    }
  }
}

The levels are:

The URL {C2C_BASE_PATH}/health_check?checks=<check_name> can be used to run only some of the health checks, given as a comma-separated list.
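A monitoring script can extract the failing checks from such a report. A sketch based on the report structure shown above (the "failures" object with a "message" per check); failed_checks is a hypothetical helper.

```python
import json
from typing import Dict


def failed_checks(report_json: str) -> Dict[str, str]:
    """Map each failed check name to its error message in a health check report."""
    report = json.loads(report_json)
    return {name: details.get("message", "") for name, details in report.get("failures", {}).items()}


SAMPLE = """
{
  "status": 500,
  "successes": {"db_engine_sqlalchemy": {"timing": 0.002}},
  "failures": {"custom": {"message": "I'm not happy", "timing": 0.001}}
}
"""
```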

When you instantiate the HealthCheck class, two checks may be automatically enabled:

Look at the documentation of the c2cwsgiutils.health_check.HealthCheck class for more information.

SQLAlchemy models graph

A command is provided that can generate Doxygen graphs of an SQLAlchemy ORM model. See acceptance_tests/app/models_graph.py for how it is used.

Version information

If /app/versions.json exists, a view is added ({C2C_BASE_PATH}/versions.json) to query the current version of the application. This file is generated by the c2cwsgiutils-genversion [$GIT_TAG] $GIT_HASH command, usually called in the Dockerfile of the WSGI application.

Prometheus

Prometheus client is integrated in c2cwsgiutils.

It works in multiprocess mode, with the limitations listed in the prometheus_client documentation.

To enable it, provide the C2C_PROMETHEUS_PORT environment variable. For security reasons, this port should not be exposed publicly.

It can be customized with the following environment variables:

You should also add the following to your gunicorn.conf.py:

from prometheus_client import multiprocess

def on_starting(server):
    from c2cwsgiutils import prometheus

    del server

    prometheus.start()

def post_fork(server, worker):
    from c2cwsgiutils import prometheus

    del server, worker

    prometheus.cleanup()

def child_exit(server, worker):
    del server

    multiprocess.mark_process_dead(worker.pid)

In your Dockerfile you should add:

RUN mkdir -p /prometheus-metrics \
    && chmod a+rwx /prometheus-metrics
ENV PROMETHEUS_MULTIPROC_DIR=/prometheus-metrics

Add custom metric collector

See the official prometheus_client documentation.

For metrics related to the Unix process:

from typing import List

from c2cwsgiutils import broadcast, prometheus

# MyCustomCollector is your own prometheus_client collector class
my_custom_collector_instance = MyCustomCollector()

def _broadcast_collector_custom() -> List[prometheus.SerializedGauge]:
    """Get the gauges collected by the custom collector."""

    return prometheus.serialize_collected_data(my_custom_collector_instance)

# Register the channel and subscribe to it with the same name
prometheus.MULTI_PROCESS_COLLECTOR_BROADCAST_CHANNELS.append("prometheus_collector_custom")
broadcast.subscribe("prometheus_collector_custom", _broadcast_collector_custom)

For metrics related to the host, use this in gunicorn.conf.py:

def on_starting(server):
    from prometheus_client import CollectorRegistry

    from c2cwsgiutils import prometheus

    del server

    registry = CollectorRegistry()
    registry.register(MyCollector())  # MyCollector is your own collector class
    prometheus.start(registry)

Database metrics

Look at the c2cwsgiutils-stats-db utility if you want to generate statistics (gauges) about the row counts.

Usage of metrics

With c2cwsgiutils, each instance (Pod) has its own metrics, so they need to be aggregated to get the metrics of the whole service: you will probably need to use sum by (<fields>) (<metric>), especially for counters.
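What sum by does with per-Pod series can be illustrated in plain Python: the Pod label disappears and the values sharing the remaining labels are added. This is only a model of the aggregation; sum_by and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

Sample = Tuple[Mapping[str, str], float]


def sum_by(samples: Iterable[Sample], *labels: str) -> Dict[Tuple[str, ...], float]:
    """Sum sample values grouped by the given labels, like `sum by (labels) (metric)`."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for sample_labels, value in samples:
        key = tuple(sample_labels.get(label, "") for label in labels)
        totals[key] += value
    return dict(totals)


# Hypothetical values of one counter, as exposed by each Pod.
samples = [
    ({"pod": "app-1", "route": "hello"}, 10.0),
    ({"pod": "app-2", "route": "hello"}, 5.0),
    ({"pod": "app-2", "route": "health"}, 3.0),
]
```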

Custom scripts

To have the application initialized in a script you should use the c2cwsgiutils.setup_process.bootstrap_application_from_options function.

Example of main function:

import argparse

import c2cwsgiutils.setup_process

def main() -> None:
    parser = argparse.ArgumentParser(description="My script.")
    # Add your arguments here
    c2cwsgiutils.setup_process.fill_arguments(parser)
    args = parser.parse_args()
    env = c2cwsgiutils.setup_process.bootstrap_application_from_options(args)
    settings = env["registry"].settings

    # Add your code here

If you need access to the database, add:

    engine = c2cwsgiutils.db.get_engine(settings)
    session_factory = c2cwsgiutils.db.get_session_factory(engine)
    with transaction.manager:
        # Add your code here

If you need the database connection without the application context, you can replace:

    env = c2cwsgiutils.setup_process.bootstrap_application_from_options(args)
    settings = env["registry"].settings

by:

    loader = pyramid.scripts.common.get_config_loader(args.config_uri)
    loader.setup_logging(parse_vars(args.config_vars) if args.config_vars else None)
    settings = loader.get_settings()

Debugging

To enable the debugging interface, you must set the C2C_DEBUG_VIEW_ENABLED environment variable. Then you can have dumps of a few things:

To ease local development, the views are automatically reloaded when files change. In addition, the filesystem is mounted by the docker-compose.override.yaml file. Make sure not to use this file or mechanism in production.

Broadcast

Some c2cwsgiutils APIs affect or query the state of the WSGI server. Since each server runs 5 worker processes by default and a query reaches only one process on one server, only that process is affected. To avoid that, you can configure c2cwsgiutils to use Redis pub/sub to broadcast those requests and collect the answers.

The impacted APIs are:

The configuration parameters are:

If not configured, only the process receiving the request is impacted.
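The pattern can be pictured with an in-memory stand-in for Redis pub/sub: every worker subscribed to a channel receives the broadcast and returns an answer, and all the answers are collected. This toy sketch is not the c2cwsgiutils.broadcast implementation; the class, the channel name and the worker states are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class InMemoryBroadcaster:
    """Toy stand-in for Redis pub/sub: every subscriber of a channel answers a broadcast."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[..., Any]) -> None:
        self._subscribers[channel].append(callback)

    def broadcast(self, channel: str, **kwargs: Any) -> List[Any]:
        # Call every subscriber and collect all the answers, not just the first one.
        return [callback(**kwargs) for callback in self._subscribers[channel]]


def make_worker(state: Dict[str, str]) -> Callable[..., str]:
    """Simulate one worker process holding its own logger level."""

    def set_level(level: str) -> str:
        state["level"] = level
        return f"{state['name']}={level}"

    return set_level


workers = [{"name": f"worker-{i}", "level": "INFO"} for i in range(3)]
bus = InMemoryBroadcaster()
for worker_state in workers:
    bus.subscribe("set_level", make_worker(worker_state))

answers = bus.broadcast("set_level", level="DEBUG")
```

Without the broadcast, only the single worker receiving the HTTP request would change its level.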

CORS

To have CORS-compliant views, define your views like this:

from c2cwsgiutils import services
hello_service = services.create("hello", "/hello", cors_credentials=True)

@hello_service.get()
def hello_get(request):
    return {'hello': True}

Exception handling

c2cwsgiutils can install exception handling views that catch any exception raised by the application views and transform it into a JSON response with an HTTP status corresponding to the error.

You can enable this by setting C2C_ENABLE_EXCEPTION_HANDLING (c2c.enable_exception_handling) to "1".

In development mode (DEVELOPMENT=1), all the details (SQL statement, stacktrace, ...) are sent to the client. In production mode, you can still get them by sending the secret defined in C2C_SECRET in the query.

If you want to use pyramid_debugtoolbar, you need to disable exception handling and configure it like this:

pyramid.includes =
    pyramid_debugtoolbar
debugtoolbar.enabled = true
debugtoolbar.hosts = 0.0.0.0/0
debugtoolbar.intercept_exc = debug
debugtoolbar.show_on_exc_only = true
c2c.enable_exception_handling = 0

JSON pretty print

Some JSON renderers are available:

Both pretty-print the rendered JSON. While this adds a significant amount of whitespace, the difference in bytes transmitted over the network is negligible thanks to gzip compression.
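The effect of gzip on pretty-printed JSON is easy to measure. This sketch compares compact and indented output for a hypothetical payload of similar records; the exact ratios depend on the data, so it prints them rather than asserting fixed values.

```python
import gzip
import json

# Hypothetical payload: a list of similar records, as an API typically returns.
payload = [{"id": i, "name": f"item {i}", "tags": ["a", "b"]} for i in range(1000)]

compact = json.dumps(payload, separators=(",", ":")).encode()
pretty = json.dumps(payload, indent=2).encode()

# Growth caused by pretty printing, before and after gzip compression.
raw_ratio = len(pretty) / len(compact)
gzip_ratio = len(gzip.compress(pretty)) / len(gzip.compress(compact))
print(f"uncompressed: x{raw_ratio:.2f}, gzipped: x{gzip_ratio:.2f}")
```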

The fast_json renderer uses ujson, which is faster but doesn't offer the ability to change the rendering of some types (the default parameter of json.dumps). This interacts badly with papyrus and similar libraries.

The cornice versions should be used to avoid the "'JSON' object has no attribute 'render_errors'" error.

Sentry integration

The stack traces can be sent to a sentry.io service for collection. To enable it, you must set SENTRY_URL (c2c.sentry_url) to point to the project's public DSN.

A few other environment variables can be used to tune the info sent with each report:

Developer info

You will need docker (>=1.12.0), docker compose and make installed on the machine to work on this project. Check the available versions of docker-engine with apt-cache policy docker-engine and, if needed, force the installation of an up-to-date version with a command similar to apt-get install docker-engine=1.12.3-0~xenial.

To lint and test everything, run the following command:

make

Make sure you are strict with the version numbers:

To make a release:

Pserve

Pserve does not set these headers in the environment, so if you are behind a reverse proxy, the client information will have wrong values. You can force them with the environment variables C2CWSGIUTILS_FORCE_PROTO, C2CWSGIUTILS_FORCE_HOST, C2CWSGIUTILS_FORCE_SERVER_NAME and C2CWSGIUTILS_FORCE_REMOTE_ADDR.

Testing

Screenshots

To test the screenshots, you need Node.js with npm; to install them, add the following lines to your Dockerfile:

RUN --mount=type=cache,target=/var/lib/apt/lists \
    --mount=type=cache,target=/var/cache,sharing=locked \
    apt-get update \
    && apt-get install --yes --no-install-recommends gnupg \
    && . /etc/os-release \
    && echo "deb https://deb.nodesource.com/node_18.x ${VERSION_CODENAME} main" > /etc/apt/sources.list.d/nodesource.list \
    && curl --silent https://deb.nodesource.com/gpgkey/nodesource.gpg.key | apt-key add - \
    && apt-get update \
    && apt-get install --assume-yes --no-install-recommends 'nodejs=18.*' \
        libx11-6 libx11-xcb1 libxcomposite1 libxcursor1 \
        libxdamage1 libxext6 libxi6 libxtst6 libnss3 libcups2 libxss1 libxrandr2 libasound2 libatk1.0-0 \
        libatk-bridge2.0-0 libpangocairo-1.0-0 libgtk-3-0 libxcb-dri3-0 libgbm1 libxshmfence1

To run the image test, call check_screenshot, e.g.:

import os

from c2cwsgiutils.acceptance import image

def test_screenshot(app_connection):
    image.check_screenshot(
        app_connection.base_url + "my-path",
        width=800,
        height=600,
        result_folder="results",
        expected_filename=os.path.join(os.path.dirname(__file__), "my-check.expected.png"),
    )

Contributing

Install the pre-commit hooks:

pip install pre-commit
pre-commit install --allow-missing-config