census-instrumentation / opencensus-python

A stats collection and distributed tracing framework
Apache License 2.0

Tons of log traces with Data drop 206: Telemetry sampled out. #1220

Closed: Marvedog closed this issue 10 months ago

Marvedog commented 10 months ago

Steps to reproduce. I am not sure exactly how to reproduce this. Here is our tracing and logging config (from our Django settings):

"""
Tracing config
"""
OPENCENSUS = {
    "TRACE": {
        "EXCLUDELIST_PATHS": ["robots.txt"],
        "SAMPLER": "opencensus.trace.samplers.ProbabilitySampler(rate=1.0)",
    }
}

# Enable requests tracing
config_integration.trace_integrations(["requests"])

LOGGING["handlers"].update(
    {
        "azure": {
            "level": "INFO",  # Send logs to Azure starting from INFO level
            "class": "opencensus.ext.azure.log_exporter.AzureLogHandler",
            "connection_string": APP_INSIGHTS_CONNECTION_STRING,
            "timeout": 30,
            "formatter": "default",
        }
    }
)
LOGGING["loggers"][""] = {
    "handlers": ["azure", "console"],
    "level": "INFO",
    "filters": ["require_debug_false"],
}

OPENCENSUS["TRACE"][
    "EXPORTER"
] = f"""opencensus.ext.azure.trace_exporter.AzureExporter(
            connection_string='{APP_INSIGHTS_CONNECTION_STRING}',
            timeout=30,
        )"""

What is the expected behavior? We would expect to be able to limit these traces, but we are unsure of their purpose and why they exist at all, so we do not know how to proceed.

What is the actual behavior? We are seeing a lot of ERROR entries reading "Data drop 206: Telemetry sampled out.", and it is unclear what the issue actually is. A trace seems to be logged for every response from Azure Application Insights that returns 206. Is this actually the expected behaviour? The result on our end is that, alongside each request log, we also store several traces documenting that we received an HTTP 206 from Azure Application Insights, which seems quite excessive.


lzchen commented 10 months ago

@Marvedog

The OpenCensus Azure Monitor log exporter collects logging calls made through the Python logging module, depending on which logger you attach it to. The reason you are seeing trace telemetry for error messages generated by the exporter itself (which ideally you should not be) is that your log exporter (AzureLogHandler) is attached to the root logger. You should use a named logger within your application code and attach the AzureLogHandler to that logger in your Django settings, so that it tracks ONLY your application-level logs and not logs generated by the exporter itself.
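For illustration, here is a minimal sketch of that separation in a Django settings module. It assumes the same LOGGING dict, console handler, and APP_INSIGHTS_CONNECTION_STRING as in the config above; the logger name "myapp" is only an example and should match your project's package name.

# Keep the Azure handler, but attach it to a named application logger
# instead of the root logger.
LOGGING["handlers"]["azure"] = {
    "level": "INFO",
    "class": "opencensus.ext.azure.log_exporter.AzureLogHandler",
    "connection_string": APP_INSIGHTS_CONNECTION_STRING,
    "formatter": "default",
}

LOGGING["loggers"]["myapp"] = {
    "handlers": ["azure", "console"],
    "level": "INFO",
    "propagate": False,  # do not bubble records up to the root logger
}

# The root logger keeps only the console handler, so records emitted by the
# exporter itself are never routed back into the Azure handler.
LOGGING["loggers"][""] = {
    "handlers": ["console"],
    "level": "INFO",
}

Application code would then log through logging.getLogger("myapp") (or a child logger such as "myapp.views") so that only those records are exported.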

As a side note, the Azure Monitor exporters send telemetry to the backend in batches. A 206 response code is returned when some of the data fails to send. The "Telemetry sampled out" error message might mean you have hit some sort of quota or threshold; check your Application Insights resource for a daily volume cap or for any sampling you may have enabled.

Marvedog commented 10 months ago

@lzchen

Thank you for the response!

For anyone else observing this problem: the HTTP 206 responses occurred because we had enabled ingestion sampling. After setting the data sampling rate to 100% in the Application Insights resource, the error message no longer appears.