Closed · RobertoPrevato closed this issue 8 months ago
Note: I created the issue in this repository because the issue template of https://github.com/microsoft/ApplicationInsights-Python says to create issues here, with the tag monitor - exporter.
Thanks for the feedback, we’ll investigate asap.
@RobertoPrevato
Could you set the environment variable OTEL_EXPERIMENTAL_RESOURCE_DETECTORS to azure_app_service and see if that helps with the startup time?
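For a single run from a shell, something like this should work (main.py is just a placeholder for your own entry point):

```shell
# Enable only the App Service resource detector before starting the app.
export OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service"
echo "detectors: $OTEL_EXPERIMENTAL_RESOURCE_DETECTORS"
# Then launch as usual, e.g.:
#   python -m cProfile main.py
```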
Hi @RobertoPrevato. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
Hi @lzchen, thank you for following up on this issue.
I tried what you recommended three times, and the situation improves considerably, although there is still a ~5 second delay. By the way, since I can see web requests are happening, I confirm that I have a good internet connection.
Running in Bash with: OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service" python -m cProfile main.py
(I paste only the first lines of the output.)
Hello, World!
1122587 function calls (1073027 primitive calls) in 4.908 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
696/1 0.017 0.000 4.910 4.910 {built-in method builtins.exec}
1 0.000 0.000 4.910 4.910 main.py:1(<module>)
1 0.000 0.000 4.416 4.416 _configure.py:63(configure_azure_monitor)
1 0.000 0.000 4.065 4.065 _configure.py:113(_setup_tracing)
4/3 0.000 0.000 4.046 1.349 _base.py:74(__init__)
3 0.000 0.000 4.041 1.347 _statsbeat.py:41(collect_statsbeat_metrics)
1 0.000 0.000 4.023 4.023 __init__.py:384(force_flush)
1 0.000 0.000 4.023 4.023 __init__.py:551(force_flush)
1 0.000 0.000 4.023 4.023 __init__.py:357(force_flush)
1 0.000 0.000 4.023 4.023 __init__.py:306(collect)
2 0.000 0.000 3.995 1.997 sessions.py:502(request)
2 0.000 0.000 3.990 1.995 sessions.py:673(send)
2 0.000 0.000 3.989 1.995 adapters.py:434(send)
2 0.000 0.000 3.988 1.994 connectionpool.py:595(urlopen)
2 0.000 0.000 3.987 1.993 connectionpool.py:380(_make_request)
2 0.000 0.000 3.278 1.639 connection.py:197(_new_conn)
2 0.000 0.000 3.275 1.637 connection.py:27(create_connection)
2 3.260 1.630 3.260 1.630 {method 'connect' of '_socket.socket' objects}
1 0.000 0.000 3.072 3.072 measurement_consumer.py:93(collect)
3 0.000 0.000 3.071 1.024 instrument.py:127(callback)
1 0.000 0.000 3.071 3.071 _statsbeat_metrics.py:144(_get_attach_metric)
1 0.000 0.000 3.071 3.071 _statsbeat_metrics.py:183(_get_azure_compute_metadata)
1 0.000 0.000 3.071 3.071 api.py:62(get)
1 0.000 0.000 3.071 3.071 api.py:14(request)
2 0.000 0.000 3.067 1.534 connection.py:319(request)
2 0.000 0.000 3.067 1.533 client.py:1266(endheaders)
3 0.000 0.000 3.067 1.022 client.py:967(send)
2 0.000 0.000 3.067 1.533 client.py:1028(_send_output)
1 0.000 0.000 3.067 3.067 connection.py:242(connect)
110 0.003 0.000 2.162 0.020 __init__.py:1(<module>)
1 0.000 0.000 0.951 0.951 __init__.py:516(_receive_metrics)
1 0.000 0.000 0.937 0.937 _exporter.py:70(export)
1 0.000 0.000 0.937 0.937 _base.py:168(_transmit)
1 0.000 0.000 0.937 0.937 _azure_monitor_client_operations.py:57(track)
1 0.000 0.000 0.925 0.925 _base.py:218(run)
8/1 0.000 0.000 0.925 0.925 _base.py:76(send)
1 0.000 0.000 0.925 0.925 _redirect.py:183(send)
1 0.000 0.000 0.925 0.925 _retry.py:502(send)
1 0.000 0.000 0.924 0.924 _base.py:108(send)
1 0.000 0.000 0.924 0.924 _requests_basic.py:316(send)
1 0.000 0.000 0.692 0.692 connectionpool.py:1088(_validate_conn)
1 0.000 0.000 0.692 0.692 connection.py:609(connect)
641/16 0.004 0.000 0.503 0.031 <frozen importlib._bootstrap>:1167(_find_and_load)
633/9 0.003 0.000 0.503 0.056 <frozen importlib._bootstrap>:1122(_find_and_load_unlocked)
594/14 0.003 0.000 0.500 0.036 <frozen importlib._bootstrap>:666(_load_unlocked)
1333/20 0.001 0.000 0.500 0.025 <frozen importlib._bootstrap>:233(_call_with_frames_removed)
541/9 0.001 0.000 0.500 0.056 <frozen importlib._bootstrap_external>:934(exec_module)
1 0.000 0.000 0.492 0.492 _configure.py:1(<module>)
1 0.000 0.000 0.481 0.481 connection.py:708(_ssl_wrap_socket_and_match_hostname)
Can you please try this too, to confirm the issue with the startup time?
@RobertoPrevato
A couple of things to note here. The distro package you are using, azure-monitor-opentelemetry, is a one-stop-shop telemetry solution that provides automatic collection of logs, distributed traces, and metrics for AppInsights. It is meant to be comprehensive rather than a configurable, "specific set of features" SDK: it covers the most common telemetry needs, but may run components that you do not need in your specific use case, which in turn may make the increased startup time feel excessive. If you would prefer a more piece-wise approach where you choose which components to use, I recommend instrumenting with azure-monitor-opentelemetry-exporter, which the distro is built on top of and which gives you more fine-grained control over components.
On a side note, the components that both SDKs are built on are designed to be non-blocking. Telemetry exporting occurs on a separate thread from the main thread, so it is interesting that the force_flush calls are stopping the application from starting up. Are you able to run the same diagnostics on a standalone Python application, not hosted in App Service, and see if the behavior is the same?
@lzchen thank you for the additional context. I am running that code on my local development machine, Ubuntu 22.04. If you could try running the same example from this learn.microsoft.com page, you should get the same results.
I tried the second way you recommended and ran the Export Hello World Log example here. Unfortunately I get the same result: a ~4.6 second overhead at start-up.
I had to slightly modify the example's imports because of this, so it now looks like this:
"""
An example to show an application using Opentelemetry logging sdk. Logging calls to the
standard Python logging library are tracked and telemetry is exported to application
insights with the AzureMonitorLogExporter.
"""
import os
import logging
from opentelemetry.sdk._logs import (
LoggerProvider,
LoggingHandler,
)
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorLogExporter
logger_provider = LoggerProvider()
set_logger_provider(logger_provider)
exporter = AzureMonitorLogExporter(
connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler()
# Attach LoggingHandler to root logger
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.NOTSET)
logger = logging.getLogger(__name__)
logger.warning("Hello World!")
# Telemetry records are flushed automatically upon application exit
# If you would like to flush records manually yourself, you can call force_flush()
logger_provider.force_flush()
I will try to investigate a bit more.
I verified this (also) depends on the Azure datacenter location. I am located close to Warsaw, Poland. For some reason, I get a ~5 second delay when I use a connection string for an Azure Application Insights instance provisioned in the new datacenter in Poland. The delay is ~3.5 seconds when I use an Azure Application Insights instance provisioned in the West Europe datacenter (in the Netherlands).
I also verified that the same happens when I use the older library. I erroneously thought the bigger delay was caused by using a different Python library; instead, it was related to using a different Azure datacenter (paradoxically, one geographically much closer to me).
I still think something is wrong in the design of the library, though, as I initially got a 16 second delay with default settings and without using experimental features, while following the instructions on learn.microsoft.com.
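A quick way to check whether the gap between regions is plain network round-trip cost is to time a raw TCP connection to each ingestion endpoint. A minimal sketch; the hostname in the comment is only illustrative, the real ones come from the IngestionEndpoint field of each connection string:

```python
import socket
import time


def connect_time(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Seconds spent opening (and immediately closing) one TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


# Usage, against the IngestionEndpoint hosts from your connection strings, e.g.:
#   print(connect_time("westeurope-5.in.applicationinsights.azure.com"))
```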
@lzchen I am facing the same issue of very slow initialization.
I have tried both the "one-stop-shop" solution in azure-monitor-opentelemetry
and the lower level azure-monitor-opentelemetry-exporter
.
In both cases, the delay is unrelated to force_flush and occurs at exporter initialization, i.e.:
azure.monitor.opentelemetry.configure_azure_monitor(connection_string="...")
for azure-monitor-opentelemetry
azure.monitor.opentelemetry.exporter.AzureMonitorLogExporter(connection_string='...')
for azure-monitor-opentelemetry-exporter
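To attribute the delay to the constructor rather than the flush, each call can be timed directly. A generic sketch; the sleep is only a stand-in workload, in practice you would wrap configure_azure_monitor or the AzureMonitorLogExporter constructor:

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with e.g.
#   timed(configure_azure_monitor, connection_string="...")
_, elapsed = timed(time.sleep, 0.1)
print(f"call took {elapsed:.3f}s")
```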
For reference, the timings I'm getting (from Greece to the West Europe region) are:
azure-monitor-opentelemetry
without OTEL_EXPERIMENTAL_RESOURCE_DETECTORS
azure-monitor-opentelemetry
with OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=azure_app_service
azure-monitor-opentelemetry-exporter
Are these delays normal and expected?
Describe the bug
The configure_azure_monitor function to configure OpenTelemetry with an instance of Azure Application Insights is surprisingly slow. I am trying to upgrade my code that still uses the old OpenCensus libraries: the old OpenCensus libraries for Python and Azure App Insights still work perfectly fine, but the new one for OpenTelemetry adds a ~16 second delay to application startup.
To Reproduce
Steps to reproduce the behavior: follow the instructions on learn.microsoft.com for using OpenTelemetry with Python, with a valid connection string for an Application Insights resource, then run python -m cProfile main.py.
Expected behavior
Configuring OpenTelemetry logging should not add a delay of several seconds to startup; it should work as fast as the older OpenCensus library (opencensus-ext-azure==1.1.12).
Additional context
These are the first lines of the cProfile output. Calls to lock.acquire alone account for an 11 second delay, with 149 calls:
149 11.102 0.075 11.102 0.075 {method 'acquire' of '_thread.lock' objects}
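For anyone reproducing this, the same hotspot view can also be captured in-process without profiling the whole interpreter startup, which makes it easier to isolate the lock.acquire time to a single call. A self-contained sketch, with a stand-in workload in place of configure_azure_monitor:

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
total = sum(range(1_000_000))  # stand-in for configure_azure_monitor(...)
profiler.disable()

# Sort by internal time ("tottime") to surface blocking calls such as lock.acquire.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats(5)
print(buffer.getvalue())
```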