dbishop opened this issue 4 years ago
Re: the `do_context_push=False` hack I did, e.g.:
```python
span_ctx = api.ezipkin_client_span(
    api.default_service_name(), span_name=self._method,
    binary_annotations={'http.uri': self.path},
    do_context_push=False,  # <==== XXX HACK
)
```
I think PR #145 lays the groundwork for us to be able to create an eventlet-aware version of the code that works right in our use-case without that nasty `do_context_push=False` hack.
At Yelp we don't have this problem because we don't generate client HTTP spans from the application code: all requests have to go through either haproxy or envoy, and we generate the client span from there.
In the places where we have encountered it, we usually just save the start timestamp in the future and then generate the span afterwards. When you create a span you can pass in a custom timestamp and duration, which lets you generate the span when you resolve the future.
Ah, and that explains why the API allows passing in a custom timestamp and duration; I'd been wondering what use-case that supported.
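A minimal sketch of that pattern with stock py_zipkin (assuming an enclosing trace context is already set up; the `fire_off_backend_request` helper and the span names are made up for illustration):

```python
import time

from py_zipkin.zipkin import zipkin_span

start_ts = time.time()
future = fire_off_backend_request()  # hypothetical async call

# ... later, when the future resolves:
result = future.wait()
with zipkin_span(
    service_name='swift-proxy-server',
    span_name='backend GET',
    timestamp=start_ts,               # explicit start time, in seconds
    duration=time.time() - start_ts,  # explicit duration, in seconds
):
    pass  # reported with the supplied timing, not this block's runtime
```

The span's reported timing is thus decoupled from when the context manager actually runs.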
Hi! I will explain the general situation, how we want the Trace's data in Zipkin to look, then detail the horrible hacks we had to do in order to achieve what we wanted. Finally, I'll ask for advice; is there a better technique to get what we want? Should we want something different?
When instrumenting an eventlet WSGI application (OpenStack Swift), I ran across a funny problem. (I'll only be addressing Python 2.7 here, but Swift, and thus any tracing solution, also needs to support Py3.)
We have a proxy-server that fires off multiple concurrent client requests to backend servers. At a low enough level, each one is using the eventlet-greened stdlib `httplib.HTTPConnection`, `httplib.HTTPResponse`, etc. To catch the beginning of outbound HTTP requests, we monkey-patch `httplib.HTTPConnection.endheaders()` like so:
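A minimal sketch of the idea, reconstructed around the `api.ezipkin_client_span()` helper quoted earlier in this thread (exact details are assumptions; the fd-keyed dict is the lookup mechanism described further down):

```python
import httplib  # eventlet-greened under Swift (Python 2.7)

from swift_zipkin import api  # opinionated zipkin_span subclasses (see below)

_span_ctx_by_fd = {}  # socket fileno -> open client span context

_orig_endheaders = httplib.HTTPConnection.endheaders

def _traced_endheaders(self, *args, **kwargs):
    # Open a client span just as the request headers hit the wire.
    span_ctx = api.ezipkin_client_span(
        api.default_service_name(), span_name=self._method,
        binary_annotations={'http.uri': self.path},
    )
    span_ctx.start()
    result = _orig_endheaders(self, *args, **kwargs)  # connects lazily
    _span_ctx_by_fd[self.sock.fileno()] = span_ctx    # sock exists now
    return result

httplib.HTTPConnection.endheaders = _traced_endheaders
```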
We catch the point where the remote service has delivered response headers by monkey-patching `httplib.HTTPResponse.begin()` like so:
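Again a sketch under the same assumptions, using py_zipkin's `zipkin_span.add_annotation()` for the mid-span annotation:

```python
_orig_begin = httplib.HTTPResponse.begin

def _traced_begin(self):
    result = _orig_begin(self)  # parses the status line and headers
    span_ctx = _span_ctx_by_fd.get(self.fp.fileno())
    if span_ctx is not None:
        # Mark the moment the remote service delivered response headers.
        span_ctx.add_annotation('Response headers received')
    return result

httplib.HTTPResponse.begin = _traced_begin
```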
Finally, the client span for each request is closed via a monkey-patched `httplib.HTTPResponse.close()` that looks like this:
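A matching sketch for the close side, under the same assumptions:

```python
_orig_close = httplib.HTTPResponse.close

def _traced_close(self):
    span_ctx = None
    if self.fp is not None:  # close() may be called more than once
        span_ctx = _span_ctx_by_fd.pop(self.fp.fileno(), None)
    result = _orig_close(self)
    if span_ctx is not None:
        span_ctx.stop()  # logs Client Finish and reports the span
    return result

httplib.HTTPResponse.close = _traced_close
```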
As an example, here's what a trace I like looks like in the Zipkin Lens UI, for the GET of an erasure-coded object that requires data from 4 different backend storage servers:
Here's detail of the first subrequest (`SpanID=6874`) with all the Annotations as I expect them: a wrapping Client Start/Client Finish with `local_endpoint` saying `swift-proxy-server`, a Server Start/Server Finish with `local_endpoint` saying `swift-object-server`, and a "Response headers received" annotation logged on the `swift-proxy-server`. Tags (not shown) for the whole span (`SpanID=6874`) include `Server Address = IP:PORT (swift-object-server)`, a `client.pid = [proxy-server PID]`, a `worker.pid = [swift-object-server PID]`, and `Client Address = IP:PORT (swift-proxy-server)`.

Finally, here's the JSON for the whole erasure-coded object GET trace, just to cover all the details for which screenshots would be too unwieldy: 5e516c09f4985b19706ef268aee15ed1.json.txt
Okay, so that's how I want these subrequests to get tracked and to look. Thanks for reading this far! For reference, the `api` module you will see below is from a `swift_zipkin` package, and contains an opinionated set of subclasses of `zipkin_span` and helper functions. The quoted code below has been edited for brevity.

Now, the problem is the horrible hacks I had to go through to get the data above to look how I want. Here's what we actually had to do in the monkey-patched `httplib.HTTPConnection.endheaders()`, `httplib.HTTPResponse.begin()`, and `httplib.HTTPResponse.close()` functions.

`endheaders()`:
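Per the hack quoted at the top of this thread, the essential difference from the straightforward version sketched above is the `do_context_push=False` flag (again a reconstruction, not verbatim):

```python
def _traced_endheaders(self, *args, **kwargs):
    span_ctx = api.ezipkin_client_span(
        api.default_service_name(), span_name=self._method,
        binary_annotations={'http.uri': self.path},
        do_context_push=False,  # <==== XXX HACK: keep attrs off the stack
    )
    span_ctx.start()
    result = _orig_endheaders(self, *args, **kwargs)
    _span_ctx_by_fd[self.sock.fileno()] = span_ctx
    return result
```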
When it's time for the monkey-patched `HTTPResponse.begin()` and `close()` methods to operate on their correct `span_ctx` instance, they retrieve it like this:
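In sketch form, using the fd-keyed dict from above (the response's `fp` wraps the same socket the connection used):

```python
span_ctx = _span_ctx_by_fd[self.fp.fileno()]
```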
So there are two problems we had to overcome (the solution to the first created a third problem).

First, if you fire off concurrent spans like this:
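(A sketch of the shape of the problem; the span names are illustrative:)

```python
# Both client spans are open at the same time, on the same greenlet:
span_ctx1 = api.ezipkin_client_span(api.default_service_name(), 'GET obj-a')
span_ctx1.start()
span_ctx2 = api.ezipkin_client_span(api.default_service_name(), 'GET obj-b')
span_ctx2.start()  # ParentID == span_ctx1's SpanID -- not what we want
```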
then the 2nd span will have a ParentID of `span_ctx1` instead of the parent of `span_ctx1`. In other words, `span_ctx1` and `span_ctx2` should be sibling leaves in the span tree and have the same ParentID. In the first screenshot, above, you can see all the SWIFT-OBJECT-SERVER spans sharing the ParentID of the root span.

So we "solved" that by allowing `do_context_push=False` to be passed when creating a span,
which keeps each created span context object's zipkin_attrs from getting pushed onto the tracer's `self._context_stack`. That's how `span_ctx2` ends up as a child of the same span that `span_ctx1` is a child of. However, this creates the problem of how to get at `span_ctx1` and `span_ctx2` later. There can be arbitrarily many of these concurrent spans, and the order of their creation differs from the order of their later access, so something like a stack is inadequate to store them. We "solved" that problem by storing the created span context objects in a process-global dictionary keyed by socket file descriptor.

The second problem is that the `create_http_headers_for_this_span()` API finds the zipkin_attrs on the tracer's stack, and because we suppress the pushing of those attrs onto the stack, they're not there. So we had to cook up a fake Stack, push the right attrs onto it, and hand that stack into `create_http_headers_for_this_span()`.
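The workaround looks roughly like this (assuming py_zipkin's `Stack` from `py_zipkin.storage`, and that `span_ctx.zipkin_attrs` holds the attrs the suppressed push would have put on the stack):

```python
from py_zipkin.storage import Stack
from py_zipkin.zipkin import create_http_headers_for_this_span

fake_stack = Stack()
fake_stack.push(span_ctx.zipkin_attrs)  # the attrs we never pushed
headers = create_http_headers_for_this_span(context_stack=fake_stack)
```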
Gross. Maybe this implies PR #142 isn't done, and a `create_http_headers_for_this_span()` function should just be a method on `zipkin_span`.

Finally... help? Is this really the right thing to be doing? We're getting the results we want, as far as I can tell, but the mechanisms to get there feel pretty gross. Am I missing something?