getsentry / raven-python

Raven is the legacy Python client for Sentry (getsentry.com) — replaced by sentry-python
https://sentry.io
BSD 3-Clause "New" or "Revised" License

Rampant multiplication of events created from logging.error log messages; flooding stdout #1262

Open butla opened 6 years ago

butla commented 6 years ago

I was trying to use Raven in one of our AWS Lambdas, but 52 logged errors (all instances of the same error, which we currently see in a couple of percent of records) were turned into 991 events on the Sentry side, plus countless RateLimited API errors.

My Raven configuration

I want to have unhandled exceptions and error log messages sent to Sentry.


import logging

import raven.handlers.logging
from raven.contrib.awslambda import LambdaClient

import configuration  # our own module that reads settings from env vars


def lambda_main(event, context):
    config = configuration.get_config_from_env_vars()
    sentry_client = LambdaClient(
        config.sentry_url,
        environment=config.environment,
        site=config.app_name,
        release=config.version,
    )
    setup_logging(sentry_client)

    @sentry_client.capture_exceptions
    def _inner_main(event, context):
        ...  # ... the code ...
    _inner_main(event, context)

def setup_logging(sentry_client):
    logging.root.setLevel(logging.INFO)
    logging.getLogger('botocore').setLevel(logging.WARNING)
    _attach_sentry_to_logs(sentry_client)

def _attach_sentry_to_logs(sentry_client):
    sentry_handler = raven.handlers.logging.SentryHandler(sentry_client)
    sentry_handler.setLevel(logging.WARNING)
    root_logger = logging.getLogger()
    root_logger.addHandler(sentry_handler)
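
One thing I'm not sure about: lambda_main() runs on every invocation, and warm Lambda containers reuse the same Python process, so setup_logging() attaches a fresh SentryHandler to the root logger each time. Here's a minimal sketch of how I could make the attachment idempotent - this assumes that the stacked handlers contribute to the multiplication, which I haven't confirmed:

def _attach_sentry_to_logs(sentry_client):
    root_logger = logging.getLogger()
    # Don't attach a second (third, ...) SentryHandler on warm invocations.
    if any(isinstance(h, raven.handlers.logging.SentryHandler)
           for h in root_logger.handlers):
        return
    sentry_handler = raven.handlers.logging.SentryHandler(sentry_client)
    sentry_handler.setLevel(logging.WARNING)
    root_logger.addHandler(sentry_handler)

Multiple handlers on the root logger would mean multiple events per log record, which could at least partly explain the 52 errors turning into 991 events.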

How the incident went down

After I deployed the version with the Sentry integration, the first few Lambda runs (3, for one of the Lambda workers) that encountered and logged the error finished without any problems. Then all hell broke loose - there were log messages like this:

[ERROR]        2018-06-14T17:07:53.272Z        8c9909b4-522c-44ce-9bc6-8f4335a841a3    Sentry responded with an API error: RateLimited(Creation of this event was denied due to rate limiting)

followed by about 20 instances of the same message dumped to stdout, and about as many dumps of my original error message (you know, surrounded by b'[]'). It seemed to grow in magnitude on subsequent runs. The Lambdas practically stopped processing any Kinesis records after about 10 minutes, and I had to roll back the Sentry integration.

Also, the amount of output generated by Raven here was huge. Our CloudWatch logs for an average 40-minute period are about 6 MB; with Raven it was 600 MB.

My guess

Maybe this is connected to #377? Could there be some recursion in creating and sending events based on log messages?
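
If the recursion theory is right, one thing I'll try is keeping Raven's own error reporting away from the root logger (where the SentryHandler is attached). A minimal sketch, assuming the "Sentry responded with an API error" messages go through Raven's internal 'sentry.errors' logger:

def _silence_raven_internal_errors():
    # Assumption: raven reports client-side problems (like the RateLimited
    # responses above) via its 'sentry.errors' logger. Stop those records
    # from propagating up to the root logger, so they can't be picked up by
    # the SentryHandler and turned into yet more events.
    internal_logger = logging.getLogger('sentry.errors')
    internal_logger.propagate = False
    # Keep them visible in CloudWatch via a plain stream handler.
    if not internal_logger.handlers:
        internal_logger.addHandler(logging.StreamHandler())

I'd call this from setup_logging() right after _attach_sentry_to_logs(). If the assumption about the logger name is wrong, this won't change anything, but it would at least rule the feedback loop in or out.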