census-instrumentation / opencensus-python

A stats collection and distributed tracing framework
Apache License 2.0

Duplicate entries in app insights #1214

Open DSilvadev opened 1 year ago

DSilvadev commented 1 year ago

Hi everyone, I'm getting duplicate logs with the exact same timestamp in App Insights, as the screenshot below shows:

[screenshot: duplicate log entries with identical timestamps]

The code I'm using:

import logging
import sys

from opencensus.ext.azure.log_exporter import AzureLogHandler

_logger = logging.getLogger('test')

if not _logger.handlers:
    formatter = logging.Formatter('%(message)s')

    azure_handler = AzureLogHandler(
        connection_string="<my_key>")
    azure_handler.setFormatter(formatter)
    _logger.addHandler(azure_handler)

    std_out_handler = logging.StreamHandler(sys.stdout)
    std_out_handler.setLevel(logging.DEBUG)
    _logger.addHandler(std_out_handler)

_logger.setLevel(logging.INFO)

_logger.propagate = False
_logger.root.propagate = False  # note: no-op, the root logger has no parent

properties = {'custom_dimensions': {'group': 'app_info'}}
_logger.info('test message', extra=properties)

With std_out_handler I only see each message once, but every message sent through _logger shows up twice in App Insights.

Any ideas on the cause or how to solve this?

Python version: 3.10
Opencensus version: 1.1.9

Thanks !

lzchen commented 1 year ago

Are you perhaps using a databricks notebook to run this? This error usually occurs when you are adding multiple instances of the azure log handler to your logger.
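To illustrate the failure mode: loggers returned by `logging.getLogger` are process-wide singletons, so re-running setup code (as a notebook cell re-run does) keeps appending handlers, and each attached Azure handler exports the same record once. A minimal sketch of an idempotent guard, using a stand-in handler class instead of `AzureLogHandler` so it runs anywhere:

```python
import logging


class FakeAzureHandler(logging.Handler):
    """Stand-in for AzureLogHandler so this sketch runs without Azure."""
    def emit(self, record):
        pass


logger = logging.getLogger("dedup-demo")


def attach_handler_once():
    # Only add the handler if one of this class is not already attached,
    # so re-running this code (e.g. a notebook cell) cannot duplicate it.
    if not any(isinstance(h, FakeAzureHandler) for h in logger.handlers):
        logger.addHandler(FakeAzureHandler())


attach_handler_once()
attach_handler_once()  # second call is a no-op
assert len(logger.handlers) == 1
```

The same `isinstance` check works with the real `AzureLogHandler`; the class name and logger name here are made up for the sketch.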

mosopeogundipe commented 11 months ago

I have the exact same issue as @DSilvadev; see the image of duplicate entries below, which have the same timestamp. This happens whether I use the opencensus library locally or in a Databricks environment, and I am adding the Azure log handler to my logger just once.

[screenshot, 2023-10-31: duplicate entries with identical timestamps]

lzchen commented 11 months ago

@mosopeogundipe

Could you paste a code snippet of how you are using the azure log handler?

juniinacio commented 11 months ago

Hi, I also experience the same issue when exporting logs to Azure from a Python Azure Function; duplicate messages are logged in App Insights:

import os
import json
import logging
from typing import List

import azure.functions as func

from opencensus.ext.azure.log_exporter import AzureLogHandler

from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace import config_integration
from opencensus.trace.samplers import ProbabilitySampler
from opencensus.trace.tracer import Tracer
from opencensus.trace import execution_context
from opencensus.trace.propagation.trace_context_http_header_format import TraceContextPropagator

config_integration.trace_integrations(['requests'])
config_integration.trace_integrations(["logging"])

def callback_add_role_name(envelope):
    envelope.tags["ai.cloud.role"] = os.environ.get("WEBSITE_SITE_NAME")
    return True

exporter = AzureExporter()
exporter.add_telemetry_processor(callback_add_role_name)

handler = AzureLogHandler()
handler.add_telemetry_processor(callback_add_role_name)

logger = logging.getLogger(__name__)

logger.addHandler(handler)

from .additional_functions import *

def main(events: List[func.EventHubEvent], PSBStore: func.Out[bytes], ADXStore: func.Out[str], context: func.Context) -> None:

    args = get_args()

    span_context = TraceContextPropagator().from_headers({
        "traceparent": context.trace_context.Traceparent,
        "tracestate": context.trace_context.Tracestate,
    })
    tracer = Tracer(span_context=span_context, exporter=exporter, sampler=ProbabilitySampler(1.0))
    execution_context.set_opencensus_tracer(tracer)

    predictions = []
    for trajectory in args:
        logger.info(f"this message will be duplicated in trace logs {trajectory}...")
        prediction = inference(events, trajectory, args[trajectory], logger, tracer)

        extra = {'custom_dimensions': prediction}
        logger.info(
            f"this message will also be duplicated in trace logs {trajectory}: {json.dumps(prediction)}",
            extra=extra
        )

        predictions.append(prediction)

Also, the operation id is not being set on the trace message that has custom dimensions:

[screenshot: trace entry with custom dimensions, operation id missing]

Message without custom dimensions:

[screenshot: trace entry without custom dimensions]

I'm new to writing Python code, so my apologies for that.

lzchen commented 11 months ago

Try adding:

root_logger = logging.getLogger()
for handler in root_logger.handlers[:]:  # iterate over a copy while removing
    root_logger.removeHandler(handler)

before

logger = logging.getLogger(__name__)

in your code to remove duplicate logs.
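Why the root logger matters here: a record logged through any child logger also propagates to every handler attached to its ancestors, so an Azure handler on the root plus one on a child exports the same record twice. A self-contained illustration with a counting handler (a private parent/child pair stands in for root/child so the sketch doesn't disturb the real root logger):

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records instead of exporting them."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


parent = logging.getLogger("demo")
child = logging.getLogger("demo.child")
child.setLevel(logging.INFO)

parent_handler = CountingHandler()
child_handler = CountingHandler()
parent.addHandler(parent_handler)
child.addHandler(child_handler)

child.info("hello")  # handled by the child AND, via propagation, the parent
assert parent_handler.count + child_handler.count == 2

# Removing the ancestor's handlers (the fix suggested above) stops the duplication:
for h in parent.handlers[:]:
    parent.removeHandler(h)
child.info("again")
assert child_handler.count == 2 and parent_handler.count == 1
```

Note that propagation ignores the ancestor logger's own level and filters; only the handlers attached to the ancestors are invoked.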

juniinacio commented 11 months ago

@lzchen will try it and let you know the results, thanks.

juniinacio commented 11 months ago

Just did some testing, and I can confirm that removing the handlers from the root logger fixed the duplicate messages for me; correlation ids etc. also work now. The only downside I've noticed is that the custom dimensions are overridden: you lose the default custom properties set by Azure Functions as soon as you specify your own custom dimensions. Once again, thanks for your quick reply; really helpful.
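One hedged workaround for the override: merge your own shared dimensions into each call's `custom_dimensions` before the handler sees them, for example with a `logging.LoggerAdapter`. This keeps dimensions you control; it cannot restore properties that Azure Functions itself injects. `BASE_DIMENSIONS` and the logger name below are made up for the sketch:

```python
import logging

# Hypothetical shared defaults you want on every record.
BASE_DIMENSIONS = {"service": "my-func", "env": "prod"}


class MergingAdapter(logging.LoggerAdapter):
    """Merge BASE_DIMENSIONS into every call's custom_dimensions, so a
    per-call dict adds to, rather than replaces, the shared defaults."""
    def process(self, msg, kwargs):
        extra = kwargs.get("extra") or {}
        merged = {**BASE_DIMENSIONS, **extra.get("custom_dimensions", {})}
        kwargs["extra"] = {**extra, "custom_dimensions": merged}
        return msg, kwargs


log = MergingAdapter(logging.getLogger("merge-demo"), {})
log.logger.setLevel(logging.INFO)
```

Callers keep passing `extra={'custom_dimensions': {...}}` exactly as before; the adapter folds the defaults in underneath.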

mosopeogundipe commented 11 months ago

> @mosopeogundipe
>
> Could you paste a code snippet of how you are using the azure log handler?

@lzchen here's my code snippet below. The first code block contains the classes I defined; I call them in the second code block so that logs from my entire app are sent to App Insights.

Logger classes:

import logging
import os
import uuid
from logging import LogRecord
from typing import Any, Dict

from opencensus.ext.azure.common.protocol import Envelope
from opencensus.ext.azure.log_exporter import AzureLogHandler

# Env var names used below; the values here are assumptions (these constants
# were presumably defined elsewhere in the original code).
LOGGER_INSTRUMENTATION_KEY = 'LOGGER_INSTRUMENTATION_KEY'
LOGGER_LIVE_MODE = 'LOGGER_LIVE_MODE'

class AzureAppInsightsLogger:
    logger = None

    def __init__(
        self,
        app_name: str,
        operation_group_id: str = None,
        additional_base_properties: Dict[str, Any] = None,
        log_level: int = logging.INFO,
        logger: logging.Logger = None,
        instrumentation_key: str = None,
        **options: Dict[str, Any],
    ):
        self.add_handler(app_name, operation_group_id, additional_base_properties, log_level, logger, instrumentation_key, **options)

    @classmethod
    def get_logger(cls) -> logging.Logger:
        """Return the logger configured by add_handler()."""
        return cls.logger

    @classmethod
    def add_handler(
        cls,
        app_name: str,
        operation_group_id: str = None,
        additional_base_properties: Dict[str, Any] = None,
        log_level: int = logging.INFO,
        logger: logging.Logger = None,
        instrumentation_key: str = None,
        **options: Dict[str, Any],
    ):
        appinsight_handler = _AzureLogHandlerWithCtx(
            app_name=app_name,
            operation_group_id=operation_group_id,
            additional_base_properties=additional_base_properties,
            instrumentation_key=instrumentation_key,
            **options,
        )
        appinsight_handler.setLevel(log_level)
        cls.logger = logger if logger else logging.getLogger('')
        cls.logger.setLevel(log_level)
        if _AzureLogHandlerWithCtx.is_live_mode():
            cls.logger.addHandler(appinsight_handler)

class _AzureLogHandlerWithCtx(AzureLogHandler):
    def __init__(
        self,
        app_name: str,
        operation_group_id: str = None,
        additional_base_properties: Dict[str, Any] = None,
        instrumentation_key: str = None,
        **options: Dict[str, Any],
    ):
        self.app_name = app_name
        self.operation_group_id = operation_group_id or uuid.uuid4().urn
        self.additional_base_properties = additional_base_properties or {}
        self._options = options or {}
        self.add_instrumentation_key(self._options, instrumentation_key)
        super().__init__(**self._options)

    @staticmethod
    def add_instrumentation_key(
        options: Dict[str, Any],
        instrumentation_key: str = None,
    ):
        if 'connection_string' not in options or options['connection_string'] is None:
            instr_key = instrumentation_key or os.getenv(LOGGER_INSTRUMENTATION_KEY)
            options['connection_string'] = f'InstrumentationKey={instr_key}'

    @staticmethod
    def is_live_mode():
        return os.getenv(LOGGER_LIVE_MODE, 'true') == 'true'

    def log_record_to_envelope(self, record: LogRecord) -> Envelope:
        """
        :type record: logging.LogRecord
        :param record: LogRecord object required by the base class 'AzureLogHandler'
        :return: opencensus.ext.azure.common.protocol.Envelope
        """
        envelope = super().log_record_to_envelope(record)

        # set this filterable value
        envelope.tags['ai.cloud.role'] = self.app_name
        if not envelope.data.baseData.properties:
            envelope.data.baseData.properties = {}
        # Fill custom properties
        for k, v in self.additional_base_properties.items():
            envelope.data.baseData.properties[k] = v

        # This lets us combine all the logs of a particular run
        if self.operation_group_id:
            envelope.tags['ai.operation.id'] = self.operation_group_id

        return envelope

Adding the AzureAppInsightsLogger to the root logger is done here in my main.py. This sends logs from my whole app to App Insights:

AzureAppInsightsLogger.add_handler(
    app_name='my-app',
    logger=logging.getLogger(''),
    instrumentation_key='xxxxxx',
    log_level=logging.DEBUG,
)
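A diagnostic that may help in setups like this one: walk the logger hierarchy and list every attached handler. If an Azure handler appears more than once on the path from your logger up to the root, each record will be exported once per appearance. `dump_handler_tree` is a made-up helper, not part of opencensus:

```python
import logging


def dump_handler_tree(name: str = "") -> list:
    """List '<logger>: <HandlerClass>' for every handler on the path from
    the named logger up through its propagating ancestors."""
    lines = []
    logger = logging.getLogger(name)
    while logger is not None:
        for h in logger.handlers:
            lines.append(f"{logger.name}: {type(h).__name__}")
        # propagation stops where logger.propagate is False
        logger = logger.parent if logger.propagate else None
    return lines
```

Calling `print(dump_handler_tree(__name__))` right before logging makes the duplication visible: two `AzureLogHandler` lines mean every record is exported twice.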