census-instrumentation / opencensus-python

A stats collection and distributed tracing framework
Apache License 2.0
669 stars 250 forks source link

issue in httplib and requests #627

Open reyang opened 5 years ago

reyang commented 5 years ago

The following code would generate 5 spans, which seems to be confusing:

import requests
from opencensus.trace import config_integration
from opencensus.trace.tracer import Tracer
config_integration.trace_integrations(['httplib', 'requests'])
tracer = Tracer()
with tracer.span(name='parent'):
    response = requests.get('https://www.wikipedia.org/wiki/Rabbit')

Expected:

  1. Probably we only want 4 spans, need to dig out more:
    parent span
    requests span
        httplib span, which returned HTTP 301 (redirection)
        httplib span, which returned HTTP 200
  2. The httplib span attribute http.url should be full URL, current it is /wiki/Rabbit.
[SpanData(name='[httplib]request', context=SpanContext(trace_id=56b691b1c448a80bfd21455d1e3916ac, span_id=21a6c4871e00d227, trace_options=TraceOptions(enabled=True), tracestate=None), span_id='b0cd26aa762e72c5', parent_span_id='21a6c4871e00d227', attributes={'http.url': '/wiki/Rabbit', 'http.method': 'GET', 'http.status_code': '301'}, start_time='2019-04-23T00:46:40.566298Z', end_time='2019-04-23T00:46:40.587506Z', child_span_count=0, stack_trace=None, time_events=[], links=[], status=None, same_process_as_parent_span=None, span_kind=2)]
[SpanData(name='[httplib]request', context=SpanContext(trace_id=56b691b1c448a80bfd21455d1e3916ac, span_id=21a6c4871e00d227, trace_options=TraceOptions(enabled=True), tracestate=None), span_id='9099cb62a56c5910', parent_span_id='21a6c4871e00d227', attributes={'http.url': '/wiki/Rabbit', 'http.method': 'GET', 'http.status_code': '200'}, start_time='2019-04-23T00:46:40.703073Z', end_time='2019-04-23T00:46:40.718689Z', child_span_count=0, stack_trace=None, time_events=[], links=[], status=None, same_process_as_parent_span=None, span_kind=2)]
[SpanData(name='[requests]get', context=SpanContext(trace_id=56b691b1c448a80bfd21455d1e3916ac, span_id=f240885e8fcd8866, trace_options=TraceOptions(enabled=True), tracestate=None), span_id='21a6c4871e00d227', parent_span_id='f240885e8fcd8866', attributes={'http.url': 'https://www.wikipedia.org/wiki/Rabbit', 'http.status_code': '200'}, start_time='2019-04-23T00:46:40.433738Z', end_time='2019-04-23T00:46:40.770006Z', child_span_count=0, stack_trace=None, time_events=[], links=[], status=None, same_process_as_parent_span=None, span_kind=2)]
[SpanData(name='[requests]get', context=SpanContext(trace_id=56b691b1c448a80bfd21455d1e3916ac, span_id=f9b3b7a04c993de9, trace_options=TraceOptions(enabled=True), tracestate=None), span_id='f240885e8fcd8866', parent_span_id='f9b3b7a04c993de9', attributes={'http.url': 'https://www.wikipedia.org/wiki/Rabbit', 'http.status_code': '200'}, start_time='2019-04-23T00:46:40.433738Z', end_time='2019-04-23T00:46:40.776972Z', child_span_count=0, stack_trace=None, time_events=[], links=[], status=None, same_process_as_parent_span=None, span_kind=2)]
[SpanData(name='parent', context=SpanContext(trace_id=56b691b1c448a80bfd21455d1e3916ac, span_id=None, trace_options=TraceOptions(enabled=True), tracestate=None), span_id='f9b3b7a04c993de9', parent_span_id=None, attributes={}, start_time='2019-04-23T00:46:40.433738Z', end_time='2019-04-23T00:46:40.778972Z', child_span_count=0, stack_trace=None, time_events=[], links=[], status=None, same_process_as_parent_span=None, span_kind=0)]
c24t commented 5 years ago

For (1), the problem is that the requests integration is creating two nearly identical spans, one the child of the other?

reyang commented 5 years ago

Yes. Two spans from requests, and they are parent-child relation, and end_time are different.

victoraugustolls commented 5 years ago

Any news here? Would like to collaborate if possible

victoraugustolls commented 5 years ago

It seems to me, by looking at the requests package code, that every time a user calls requests.{method}, internally the library creates a sessions, triggering the session wrapper. Wouldn't it be the case to only wraps session calls, as internally they are always made?

I might be missing something though.

reyang commented 5 years ago

@lzchen FYI.

@victoraugustolls Thanks for bringing it up. We haven't worked on this issue.

Here goes my current thinking:

  1. We need to track the actual underlying HTTP operation (which has a strict 1:1 mapping for the actual HTTP outbound request).
  2. It would be helpful to track the high level HTTP operation, which is the logical layer thing. For example, if someone is trying to HTTP GET http://non.existing.domain/, we expect the logical operation to fail, with no actual physical HTTP call (since DNS resolution failure will trigger exception on the HTTP client).
  3. For metrics, we want to use 1) for the actual HTTP requests/second and latency (as counting invalid DNS or URL for average latency does not make sense). We can use 2) for logical operations, similar like how people would use metrics for local spans.

We can discuss more at https://github.com/open-telemetry/opentelemetry-specification/issues/110.

With this, I would expect the result to be:

parent span
    requests span
        httplib span, which returned HTTP 301 (redirection)
        httplib span, which returned HTTP 200

metrics:
    number of HTTP request: 2
    number of spans from requests library: 1
victoraugustolls commented 5 years ago

@reyang thanks for the clarification!

Sorry, but I still don't understand why we need to create a span when we use requests.{method}. If we just trace the session requests, wouldn't we have the expected results? All the validation and redirect logic is inside the session request, and not in the .{method}.

victoraugustolls commented 5 years ago

Just for clarification, you're saying that if I change this like so:

def trace_integration(tracer=None):
    """Wrap the requests library to trace it."""
    log.info('Integrated module: {}'.format(MODULE_NAME))

    if tracer is not None:
        # The execution_context tracer should never be None - if it has not
        # been set it returns a no-op tracer. Most code in this library does
        # not handle None being used in the execution context.
        execution_context.set_opencensus_tracer(tracer)

    # Wrap the requests functions
    for func in REQUESTS_WRAP_METHODS:
        requests_func = getattr(requests, func)
        wrapped = wrap_requests(requests_func)
        setattr(requests, requests_func.__name__, wrapped)

    # Wrap Session class
    wrapt.wrap_function_wrapper(
        MODULE_NAME, 'Session.request', wrap_session_request)

Maintain:

 # Wrap Session class
    wrapt.wrap_function_wrapper(
        MODULE_NAME, 'Session.request', wrap_session_request)

Remove:

 # Wrap the requests functions
    for func in REQUESTS_WRAP_METHODS:
        requests_func = getattr(requests, func)
        wrapped = wrap_requests(requests_func)
        setattr(requests, requests_func.__name__, wrapped)

I would have:

parent span
    httplib span, which returned HTTP 301 (redirection)
    httplib span, which returned HTTP 200

metrics:
    number of HTTP request: 2
    number of spans from requests library: 1

Is that the case? It looks for me that removing the specified code would give your expected result.

It looks for me that we now have this:

parent span
    requests span (methods.get)
        requests span (Session)
            httplib span, which returned HTTP 301 (redirection)
            httplib span, which returned HTTP 200

metrics:
    number of HTTP request: 2
    number of spans from requests library: 2

And if we removed the span creation from inside requests.get, we would have this:

parent span
    requests span (Session)
        httplib span, which returned HTTP 301 (redirection)
        httplib span, which returned HTTP 200

metrics:
    number of HTTP request: 2
    number of spans from requests library: 1
reyang commented 5 years ago

Yes, I think this is what we want. Let me know if it makes sense to you.

jeffddix commented 4 years ago

This issue impact the Azure Application Insights Application Map. I am trying to model two Python Flask services communicating with each other using the Requests library. The two spans show up as two dependencies in Azure Application Insights, with the second dependent on the first. Both dependencies show correct service as the "target". Azure doesn't know how to represent two dependencies going to the same target, but are linked together (the second span is a child of the first), so I creates a new component on the map instead of linking it to the other microservice. If I implement the same logic that this requests API is doing, and use a single span, the Application Map is correctly drawn.

Here is what the map looks like with using the requests library. Flask service "app.py" makes a request to "dependent_service.py", but the line for the HTTP request is drawn to a new component. map-before

When I implement the same requests logic using a single span, here is what it looks like. Note I changed the name of the services. map-after

NoutieH commented 4 years ago

I think @victoraugustolls is right. for func in REQUESTS_WRAP_METHODS: requests_func = getattr(requests, func) wrapped = wrap_requests(requests_func) setattr(requests, requests_func.__name__, wrapped) is redundant since the specific requests methods all rely on the generic Session.request method, which is also wrapped.

Should I make a PR for removing the requests functions?

lzchen commented 4 years ago

@NoutieH If you would like to contribute this part, go for it! I can assign the task to you.