apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
61.77k stars 13.51k forks source link

logger `superset` in production mode writing logs to stderr confuses logging service in Google Cloud #30135

Open tigerinus opened 1 week ago

tigerinus commented 1 week ago

Bug description

https://github.com/apache/superset/blob/4.0.2/superset/utils/logging_configurator.py#L50

This line overrides any logging.basicConfig(...) and reset the stream to stderr.

This is causing logging service in cloud platforms such as Google Cloud, specifically GKE, to falsely identify its output as errors, causing confusions to operation team.

How to reproduce the bug

Just run superset and look for anything written to stderr - there are messages that are not actually errors.

Screenshots/recordings

No response

Superset version

4.0.2

Python version

3.11

Node version

Not applicable

Browser

Not applicable

Additional context

I need to apply following workaround to prevent it sending logs to stderr - but for people who don't know, each of them will be confused before finding this issue report:

import logging
import sys

from flask import Config
from superset.utils.logging_configurator import DefaultLoggingConfigurator

SECRET_KEY = "thisismysecretkey"

class StdoutLoggingConfigurator(DefaultLoggingConfigurator):
    def configure_logging(self, app_config: Config, debug_mode: bool) -> None:
        super().configure_logging(app_config, debug_mode)
        superset_logger = logging.getLogger("superset")
        for handler in superset_logger.handlers:
            superset_logger.removeHandler(handler)

LOGGING_CONFIGURATOR = StdoutLoggingConfigurator()

logging.basicConfig(
    stream=sys.stdout,
    format="whatever - %(asctime)s:%(levelname)s:%(name)s:%(message)s",
    force=True,
)

Checklist

dosubot[bot] commented 1 week ago

I found a similar unsolved discussion that might be relevant to your issue: Additional StreamHandler in the DefaultLoggingConfigurator causes duplicate logs [1].

To address the issue of Apache Superset version 4.0.2 writing logs to stderr in production mode, you can modify the logging configuration to redirect logs to stdout. Here is an example of how you can achieve this by subclassing DefaultLoggingConfigurator:

import logging
import sys
from flask import Config
from superset.utils.logging_configurator import DefaultLoggingConfigurator

class StdoutLoggingConfigurator(DefaultLoggingConfigurator):
    def configure_logging(self, app_config: Config, debug_mode: bool) -> None:
        # Call the parent method to set up the initial configuration
        super().configure_logging(app_config, debug_mode)

        # Get the 'superset' logger
        superset_logger = logging.getLogger("superset")

        # Remove all existing handlers
        for handler in superset_logger.handlers:
            superset_logger.removeHandler(handler)

        # Add a new handler that writes to stdout
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(app_config["LOG_FORMAT"]))
        superset_logger.addHandler(handler)

# Set the custom logging configurator
LOGGING_CONFIGURATOR = StdoutLoggingConfigurator()

# Apply the basic configuration to redirect logs to stdout
logging.basicConfig(
    stream=sys.stdout,
    format="%(asctime)s:%(levelname)s:%(name)s:%(message)s",
    force=True,
)

This code will ensure that logs are written to stdout instead of stderr, preventing the misclassification of log messages in cloud environments like Google Kubernetes Engine (GKE) [2].

To continue talking to Dosu, mention @dosu.

rusackas commented 1 week ago

Does this also affect AWS? CC @eschutho @dpgaspar

Dosu sounds like it's onto something, but I'm not sure the pitfalls of just moving to stdout everywhere.

eschutho commented 1 week ago

I think this may be fixed in master. See https://github.com/apache/superset/pull/27777 Looks like it's going into the 4.1.0 release.