Support plugin for Vert.x

BFergerson commented 5 years ago

Please answer these questions before submitting your issue.

Why do you submit this issue?
[x] Question or discussion
[ ] Bug
[ ] Requirement
[ ] Feature or performance improvement

Question

What do you want to know?

I am creating a plugin for supporting Vert.x powered applications using SkyWalking. Everything appears to work when I spread out the events so they each happen one after the other but once I start letting the messages flow naturally SkyWalking won't finish a single span.

For example:

I call GET on URL /products/test (this creates an ExitSpan)
I receive GET on URL /products/test (this creates an EntrySpan)
I respond to GET on URL /products/test (this closes the EntrySpan)
I receive response for GET on URL /products/test (this closes the ExitSpan)

The above works correctly and the SkyWalking UI appears to agree. It's when the flow of events changes that SkyWalking will not finish a trace.

For example:

I call GET on URL /products/test (this creates an ExitSpan)
I call GET on URL /products/test (this creates an ExitSpan, child of previous)
I receive GET on URL /products/test (this creates an EntrySpan, child of previous)
I receive GET on URL /products/test (this creates an EntrySpan, child of previous)
I respond to GET on URL /products/test (this closes the EntrySpan, child of previous)
I respond to GET on URL /products/test (this closes the EntrySpan, child of previous)
I receive response for GET on URL /products/test (this closes the ExitSpan, child of previous)
I receive response for GET on URL /products/test (this closes the ExitSpan)

In the above example, SkyWalking would eventually finish a trace but it would be the single trace instead of the expected two traces. The reason I say that SkyWalking never finishes a trace is that I have the above occurring every second and since SkyWalking keeps adding spans as the child of the previous it keeps a non-empty span stack and continues without finishing any traces.

From my best understanding of what's happening, the TracingContext class requires either of two scenarios to create a new trace:

TracingContext.activeSpanStack is empty
The current thread hasn't started a trace

If that's correct then I don't see how I can get this to work with Vert.x. In this case, Vert.x is using the same thread to run all of the events on. Both of the GET requests, both of the GET receives, and both of the GET responses happen on the same thread and I can't simply use the InstanceMethodsAroundInterceptor.afterMethod method to close spans because of the asynchronous nature of Vert.x.

If my understanding is correct what would you guys recommend to tackle this issue?

Which version of SkyWalking, OS and JRE?

SkyWalking 6.0.0 GA, Ubuntu 18.04, Java 8

wu-sheng commented 5 years ago

If my understanding is correct what would you guys recommend to tackle this issue?

I think most of your understanding is correct, the thread model of vert.x is a typical model in RPC framework. Also, SkyWalking did request the TracingContext working in that way, most of tracing system asked for this too.

The key is, you created the span in a shared thread(that single thread, I am guessing this is nio/vertx event loop thread). SkyWalking is focusing on business logic, the span could be created at the entrance of business codes, such as we did for SpringMVC. We create span at Controller edge, then, no matter which web container is used inside, the span will work as expected.

Like in the codes

public class MyServer extends AbstractVerticle {

    @Override
    public void start() {

        Router router = Router.router(vertx);

        router.route().handler(this::queryHandler);
        vertx.createHttpServer().requestHandler(router::accept)
                .listen(8080);
        System.out.println("MyServer startup");
    }

    private void queryHandler(RoutingContext routingContext) {
        routingContext.response()
                .putHeader("content_type", "application/json")
                .end("hello world");
    }

Span should be created when #requestHandler called, there should not be a shared thread at this point, because it can't in Java thread model, the requestHandler will hold the thread until it is done or start a child thread(that is context capture/continue designed for)

BFergerson commented 5 years ago

I had a light bulb moment when you said the key was I was creating spans on a shared thread. I'm pretty sure you're absolutely right. Unfortunately, I think that's the only way to use Vert.x. Everything that runs in the verticle is run on the event loop and in my situation, there is only a single event loop. That's what I'm seeing locally as I debug the creation of the spans. The thread is always vert.x-eventloop-thread-0.

I should have included some Vert.x code to begin with to open this discussion. I've included some below and described the steps the code takes and the steps I take to trace it.

public class Sender extends AbstractVerticle {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        vertx.deployVerticle(new Sender());
    }

    @Override
    public void start() {
        Router router = Router.router(vertx);
        router.route().handler(BodyHandler.create());
        router.get("/products/:productID").handler(this::handleGetProduct);
        vertx.createHttpServer().requestHandler(router).listen(8080);

        //vertx.setPeriodic(1000, v -> callGetProduct());
        callGetProduct();
    }

    private void callGetProduct() {
        HttpClient httpClient = vertx.createHttpClient();
        httpClient.getNow(8080, "localhost", "/products/test",
                new Handler<HttpClientResponse>() {

                    @Override
                    public void handle(HttpClientResponse httpClientResponse) {
                        System.out.println("Got response");
                    }
                });
    }

    private void handleGetProduct(RoutingContext routingContext) {
        HttpServerResponse response = routingContext.response();
        vertx.setTimer(4000, it -> {
            response.setStatusCode(200).end("Done");
        });
    }
}

I've tried to keep it as simple as possible and I think I'm creating the right spans at the right places.

I have an HttpClient which calls a GET request on /products/test. This opens a ExitSpan.
This request is received by handleGetProduct. This opens a EntrySpan.
The response is delayed by 4 seconds and then responds with "Done". This closes the EntrySpan.
The response is received by handle. This closes the ExitSpan.

The locations I'm doing the opening and closing of spans seems correct to me as well.

The HttpClient sends the GET request by calling HttpClientRequestImpl.end(). I take the ContextCarrier and inject it into the header and open the ExitSpan. I also assign the ContextSnapshot to the request (via EnhancedInstance) for closing later.
The GET request is receieved and sent to the target method via HttpServerRequestImpl.doEnd(). I remove the headers to rebuild the ContextCarrier and open a EntrySpan. I also assign the ContextSnapshot to the response (via EnhancedInstance) for closing later.
4 seconds pass
I pull the ContextSnapshot from the response (via EnhancedInstance) and close the EntrySpan once HttpServerResponseImpl.end() is called as this signifies the response has been written.
I pull the ContextSnapshot from the request (via EnhancedInstance) and close the ExitSpan once HttpClientRequestImpl.doHandleResponse() is called as this signifies a request has been replied to.

Again, this all works when I just go through this flow once. The single call in the code to callGetProduct(); fulfills this scenario. It's once //vertx.setPeriodic(1000, v -> callGetProduct()); is uncommented that it starts acting up. As you can see, setPeriodic makes the call to callGetProduct every second. This means that I won't have the previous trace closed by the time the next one starts.

Given that Vert.x is running everything on a shared thread, and the goal is to not create spans on a shared thread, what's my move here?

wu-sheng commented 5 years ago

The HttpClient sends the GET request by calling HttpClientRequestImpl.end(). I take the ContextCarrier and inject it into the header and open the ExitSpan. I also assign the ContextSnapshot to the request (via EnhancedInstance) for closing later.

You should finish the span, and hold the ContextSnapshot. When callbacke happens, create a new local span(should be new context, because other closed their exit span too), and continue the ContextSnapshot.

I pull the ContextSnapshot from the response (via EnhancedInstance) and close the EntrySpan once HttpServerResponseImpl.end() is called as this signifies the response has been written.

From you description, I am a little confused about how you use ContextSnapshot and ContextCarrier. How does I pull the ContextSnapshot from the response happen? The ContextCarrier is designed for across processes(RPC, MQ), ContextSnapshot is used across thread in a single process only. Is this just typo or do I miss somthing?

I pull the ContextSnapshot from the request (via EnhancedInstance) and close the ExitSpan once HttpClientRequestImpl.doHandleResponse() is called as this signifies a request has been replied to.

In pure async scenario, like vertx, response is just a new local span, entry span should be closed as soon as an async jos assignment happens. Don't wait until this reponse called. Because if the thread model you provided, the thread could do anything because another thread to process the logic and do the response.

It's once //vertx.setPeriodic(1000, v -> callGetProduct()); is uncommented that it starts acting up. As you can see, setPeriodic makes the call to callGetProduct every second. This means that I won't have the previous trace closed by the time the next one starts.

I think my prev explanation about when is the right time to close span, should make this easier. Create snapshot and finish the ongoing span when do setPeriodic, then create a local span when v -> callGetProduct happens.

I think my most answers focus on one thing, close/finish span before async job happens. :) Hope this would help you.

BFergerson commented 5 years ago

@wu-sheng, your advice appears to have gotten me closer but I think there is something I'm fundamentally not understanding. I'm able to produce the following trace stack:

{
  "traceSpans" : [ {
    "traceId" : "7.39.15523638435290001",
    "segmentId" : "7.39.15523638435290000",
    "spanId" : 0,
    "parentSpanId" : -1,
    "refs" : [ ],
    "serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
    "startTime" : 1552363843533,
    "endTime" : 1552363843595,
    "endpointName" : "/products/test",
    "type" : "Exit",
    "peer" : "/products/test",
    "component" : "Vertx",
    "layer" : "RPCFramework",
    "tags" : { },
    "logs" : [ ],
    "error" : false
  }, {
    "traceId" : "7.39.15523638435290001",
    "segmentId" : "7.39.15523638478920006",
    "spanId" : 0,
    "parentSpanId" : -1,
    "refs" : [ {
      "traceId" : "7.39.15523638435290001",
      "parentSegmentId" : "7.39.15523638435290000",
      "parentSpanId" : 0,
      "type" : "CROSS_THREAD"
    } ],
    "serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
    "startTime" : 1552363847892,
    "endTime" : 1552363847892,
    "endpointName" : "HttpClientRequestImplDoHandleResponseInterceptor-#/products/test",
    "type" : "Local",
    "peer" : "",
    "component" : "",
    "layer" : "Unknown",
    "tags" : { },
    "logs" : [ ],
    "error" : false
  }, {
    "traceId" : "7.39.15523638435290001",
    "segmentId" : "7.39.15523638438680002",
    "spanId" : 0,
    "parentSpanId" : -1,
    "refs" : [ {
      "traceId" : "7.39.15523638435290001",
      "parentSegmentId" : "7.39.15523638435290000",
      "parentSpanId" : 0,
      "type" : "CROSS_PROCESS"
    } ],
    "serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
    "startTime" : 1552363843869,
    "endTime" : 1552363843870,
    "endpointName" : "/products/test",
    "type" : "Entry",
    "peer" : "",
    "component" : "Vertx",
    "layer" : "RPCFramework",
    "tags" : { },
    "logs" : [ ],
    "error" : false
  }, {
    "traceId" : "7.39.15523638435290001",
    "segmentId" : "7.39.15523638478710004",
    "spanId" : 0,
    "parentSpanId" : -1,
    "refs" : [ {
      "traceId" : "7.39.15523638435290001",
      "parentSegmentId" : "7.39.15523638438680002",
      "parentSpanId" : 0,
      "type" : "CROSS_THREAD"
    } ],
    "serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
    "startTime" : 1552363847871,
    "endTime" : 1552363847882,
    "endpointName" : "HttpServerResponseImplEndInterceptor-#/products/test",
    "type" : "Local",
    "peer" : "",
    "component" : "Vertx",
    "layer" : "Unknown",
    "tags" : { },
    "logs" : [ ],
    "error" : false
  } ],
  "total" : 4
}

This produces the diagram: Screenshot from 2019-03-11 22-12-00

Which is sort of odd when it considers the same trace as: Screenshot from 2019-03-11 22-19-55

It seems to somewhat understand that it took 4 seconds to complete. Essentially I followed your advice and made the process as such:

When the GET request is called the ExitSpan is opened on beforeMethod and the ContextCarrier is added to the header. The ContextSnapshot is saved to the SkyWalkingDynamicField for the request. The ExitSpan is closed on afterMethod.
The GET call is received and a EntrySpan is opened with the ContextCarrier extracted from the request's headers during beforeMethod. The ContextSnapshot is set on the SkyWalkingDynamicField of the response object. The EntrySpan is closed on the afterMethod.
4 seconds pass
HttpServerResponseImplEndInterceptor.beforeMethod is triggered and creates a LocalSpan and calls ContextManager.continued with the ContextSnapshot saved to the response object. The LocalSpan is closed on the afterMethod.
HttpClientRequestImplDoHandleResponseInterceptor.beforeMethod is triggered and creates a LocalSpan and calls ContextManager.continued with the ContextSnapshot saved to the request object. The LocalSpan is closed on the afterMethod.

I don't exactly have a question here so I hope you see what I'm doing wrong. The code I'm testing with is located here: https://github.com/apache/incubator-skywalking/compare/master...BFergerson:master

wu-sheng commented 5 years ago

Essentially I followed your advice and made the process as such:

The processes you mentioned make sense to me, it should be processed in this way when you are facing async scenarios. I assume right now, the tracing and metric should be working well even in concurrency cases, right?

wu-sheng commented 5 years ago

Just one suggestion, the local span you created, the operation name should be related to target framework behaviours, like a description, rather than which interceptor of SkyWalking plugin is doing this. Because in most cases, interceptor is just being known by plugin developer.

Of course, maybe this is just because, you are testing. :) If so, ignore me.

BFergerson commented 5 years ago

@wu-sheng, yeah those operation names are for testing right now. I've deleted code to try to keep what I'm showing you as straight forward as possible. I'll make sure to address that once everything is working.

And yes, your suggestions do fix the problem I was originally having with the traces becoming children of each other and never sending any to SkyWalking, but now I'm dealing with this: Screenshot from 2019-03-13 07-57-30

As you can see, it doesn't correctly understand that /products/test took 4 seconds to respond. It understands it when you open the trace up but on the endpoint dashboard it doesn't. What causes this discrepancy? Do the trace spans I'm producing in the JSON above seem correct?

wu-sheng commented 5 years ago

As you can see, it doesn't correctly understand that /products/test took 4 seconds to respond. It understands it when you open the trace up but on the endpoint dashboard it doesn't. What causes this discrepancy? Do the trace spans I'm producing in the JSON above seem correct?

Actually, this is what I expect. Because the segment duration is shorter, the endpoint duration is based on that.

Also, if you want to show the segment perfectly, we need to build a new TracingContext implementation, with clear explicit declare to use it. In my previous experiences, that style of APIs is not very friendly to plugin developer, so I haven't implemented it. If you want to take part in more, I could start a draft and let us work on that.

BFergerson commented 5 years ago

@wu-sheng, I would definitely like to implement support for Vert.x so if it's necessary to create a new TracingContext implementation to do that then yes, please start a draft.

Because the segment duration is shorter, the endpoint duration is based on that.

Before you do, would you mind breaking down a bit why a new implementation is necessary? Specifically the logic quoted above. I understand a span to be a single "unit of work". From the SkyWalking documentation a segment is "a concept in SkyWalking, it should include all span for per request in a single OS process, usually single thread based on language". Based on these definitions, wouldn't everything involved with finishing the request for /products/test be included in the segment? Meaning why not add logic which indicates that a segment is always equal to or longer than the duration of the spans which it consists of?

wu-sheng commented 5 years ago

Sorry for don't make this clear to me.

a segment is "a concept in SkyWalking, it should include all span for per request in a single OS process, usually single thread based on language.

The Key in Java for plugin developer convenience, segment always works in a single thread at the same time. And this fits most of scenarios in Java codes. But in vertx(react style), things changed. Your case is very typical.

Thread accept the request, then

Create span, meaning, create a segment.
Try to assign the work to another thread
Right now, the thread still holds the segment/tracingcontext in thread local.

That is why if we don't finish that, the new incoming request uses the existing trace context, which triggers your issue at the beginning.

But we finish the span, then because tracing context is a stack-based structure, right? The tracing context finished, and segment finished too. So at this moment, the thread local is available for new request. That is why you traces look like working well in concurrency requests.

The side-effect of doing this, is segment finished when your work assignment is finished, not works are done because that is in another thread, right?

SkyWalking backend analysis the response time based on segment and entry span, that is why you can't see 4s(in your case) here, but you can see 4s in trace visualization. The trace visualization shows 4s because UI did that.

SkyWalking analysis 100% trust the entry span and segment, that is why we are high performance and don't need distributed cache in any case. So when request entry span says ~0s duration, the endpoint metric show in the same way.

I am proposal the complex tracing context implementation is

TracingContext and Segment could go across threads in Java
Span finished time could be reset in another thread, because of (1), you could get that, and context would not finish before you say so.

But in this case, API of context is more complex, and lock mechanism must be involved.

BFergerson commented 5 years ago

@wu-sheng, I think I'm almost on the same page with you. You're saying the JSON trace stack I included above is correct in all aspects except for the endTime on the ExitSpan and EntrySpan? And it's because of the fact that the EntrySpan has an incorrect endTime that the response time shows up as it does in the UI? And finally, this can't be done with the existing TracingContext because it will finish the current Segment and send that to the collector before the callback happens (so no possibility for modifying the endTime).

TracingContext and Segment could go across threads in Java

If my understanding is correct then how does this proposal work? The TracingContext would have already finished the Segment even if I persist them till the time of the callback? Would there be additional logic in this TracingContext implementation which prevented it from finishing a Segment of just one EntrySpan?

Thanks for all the help BTW. I'm learning a lot.

wu-sheng commented 5 years ago

Yes. In my mind, for new tracing context mechanism(only optional), if you do capture, it will hold the segment untill continue happens. Only finish the whole segment and tracing context, if and only if all related threads are finished and no capture without continued.

Does this make sense?

BFergerson commented 5 years ago

@wu-sheng, I think I have a firm grasp of the problem now lol. Instead of modifying the endTime why not just wait to close the span? I understand that waiting to close the span won't work with TracingContext but that's because it deals with one segment at a time. A new TracingContext implementation could work with multiple segments at a time requiring a key to choose which segment the spans belong to. I think with an implementation like this I could delay closing the EntrySpan until the response is sent (how I was doing it before). That also lets the process for all the TracingContext implementations remain essentially the same and this multiple segments/asynchronous one doesn't have extra look back logic.

Edit: Also, I could see how the same could be implemented with multiple TracingContext instead of an implementation with multiple segments. I'm thinking something like AsyncContextManager.

wu-sheng commented 5 years ago

In theory yes, just delay to close the span, in practice way, it is hard. Because the span should still in stack-based structure, as a plugin developer, you can't tell whether there is one/more parent span(s) need to close by other plugins, right? So you must follow the stack order, to close the span. That mean, entry span must be finished first and reset finish time later.

Edit: Also, I could see how the same could be implemented with multiple TracingContext instead of an implementation with multiple segments. I'm thinking something like AsyncContextManager.

Are you saying to use AsyncContextManager replace ContextManager? If so, I think that will not work. There are plenty of plugins work out there using ContextManager as core API, you change that, means propagation context in the thread with other plugins will fail. I think AbstractTracerContext implementation makes more sense.

BFergerson commented 5 years ago

@wu-sheng, okay the ContextManager implementation might be too far. How is this not an issue for other async plugins you have though (eg: Apache httpcomponent AsyncClient 4.x)? Also, what kind of draft would you be starting?

Edit: You're probably going to say you don't start/close spans in the shared thread :P

wu-sheng commented 5 years ago

Edit: You're probably going to say you don't start/close spans in the shared thread :P

Yes and there is a little different for them. Because they just do async call once for a time, and for their developer, they think, it just blocks short time, then the short duration makes sense. So they put callback in another thread and another segment/tracingcontext just like you did.

The ContextManager implementation is not hard, just it must count on continue must be executed after you call capture. Otherwise, the context would never finish and span in first context will be totally lost.

Does this make sense to you?

wu-sheng commented 5 years ago

AcrossThreadContextTracing should be the right term for this.

wu-sheng commented 5 years ago

If it works, we could make asynchttp, grpc-streaming work in this way too. And in .net too.

cc @liuhaoyang @ascrutae @hanahmily I think you may be interested in this term and design too.

BFergerson commented 5 years ago

@wu-sheng, I guess I'm just not entirely sure where you see the difficulty in holding the EntrySpan open until the response. Even in a completely asynchronous environment, the response will be the last event to occur. The EntrySpan is opened, all the various callbacks involved are handled through LocalSpan objects which add to the appropriate segment (stack-based all the same), and then when the response is finally ready to be written the EntrySpan is retrieved, closed, and the segment is flushed to the collector.

Why would I need to know what the parent span is? Each span is dealt with individually. If I open a span I have to make sure to close it. If that happens then by the time the response is ready the active span will be the correct span to close.

Where is the difficulty that you're seeing that I'm not?

wu-sheng commented 5 years ago

Why would I need to know what the parent span is? Each span is dealt with individually. If I open a span I have to make sure to close it. If that happens then by the time the response is ready the active span will be the correct span to close.

First of all, there isn't about you need to know the parent span. It is more about when tracing context could finish automatically and in the right time. In SkyWalking design and its API, you could notice that we don't have an API called TracingContext#finished from any plugin, just inside TracingContext triggered automatically, WHY?, because no plugin knows he is the last/first span.

In codes level, read TracingContext#activeSpanStack and EntrySpan(also its parent StackBasedTracingSpan), you will find out the span finish operation need to do in the exact same order as open it. Also TracingContext is in the ThreadLocal, which ones, one thread could only have one TracingContext.

Based on this background, how do you don't finish the entry span but switch the TracingContext to another request?

So, things like EntrySpan is retrieved only could happen when tracing context doesn't finish even all spans finished(removed from the active stack). That is key I mean for new TracingContext implementation, hold the context and span, make you have the chance to change/adjust the span.

Does this make more sense to you now?

This is a very complex scenario. Argue and the trade-off between development convenient and API capabilities. Most tracing systems ignore this to do based on Dapper paper says, so backend will face heavy payload. But they, like Zipkin and Jaeger, are not APM solution, so this doesn't matter for them. In another way, this design is common in APM vendor core, which they keep in the shadow :)

wu-sheng commented 5 years ago

Another way to do a thing is by providing a new Entry/Exit Span feature to support EntrySpan is retrieved. Such as, Span#finishWithFurtherNotitify, which remove it from the active span, but still hold the tracing context. Only when Span#finishcalled(in another thread), tracing context finished and sent.

BFergerson commented 5 years ago

@wu-sheng, I think I understand. I assumed that the Vert.x EntrySpan would be the root one. I wasn't thinking bigger picture. I agree about modifying the endTime after the fact.

wu-sheng commented 5 years ago

I will try to start a draft of my last proposal, provide span a feature for further explicit finish notification, which could be done in another thread. If I find other conflicts, I will inform you too.

wu-sheng commented 5 years ago

PR submit. A draft version

BFergerson commented 5 years ago

@wu-sheng, using the new asynchronous functionality I'm able to get the following results: Screenshot from 2019-03-15 18-15-13

This appears correct to me but I'm able to achieve these results without creating local spans and without using ContextSnapshot. I simply call prepareForAsync when the GET request is sent and received and asyncFinish when the response is sent and received. This produces the results above with the single exit span and single entry span.

Is that how you meant for it to be used? Previously you said that the callback was supposed to be a new local span. Is that still the case when using the asynchronous functionality? If so, what is the order of events I should be taking? First open the local span, then close the entry span via asyncFinish, then close the local span?

wu-sheng commented 5 years ago

Is that still the case when using the asynchronous functionality? If so, what is the order of events I should be taking? First open the local span, then close the entry span via asyncFinish then close the local span?

From the trace, you should be using APIs in the right way. The other steps are about, you should create new context(including local span) in another thread(callback) in case there is other further plugin working.

Such as in your codes

    private void callGetProduct() {
        HttpClient httpClient = vertx.createHttpClient();
        httpClient.getNow(8080, "localhost", "/products/test",
                new Handler<HttpClientResponse>() {

                    @Override
                    public void handle(HttpClientResponse httpClientResponse) {
                        /////////////////////////////
                        // Add more real codes here,
                        // including call another RPC or access Database
                        /////////////////////////////
                    }

You would like the trace could continue in #handle method, right? If you don't create new context in handle thread, the trace will break, because async span doesn't include the capability to propagate context. Async span is just for making sure duration/tag/log/etc. fields right/accurate.

wu-sheng commented 5 years ago

I am glad we are almost done:)

BFergerson commented 5 years ago

@wu-sheng, everything makes sense when you say it :P. I've added the local span back. I think the last question I have before I repeat this pattern for other Vert.x classes is:

does the order of ContextManager.continued and asyncFinish make any difference?

I've tried both and they both work and seem to produce the same traces.

The way I have it currently coded for the callback method is:

@Override
public void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes,
                         MethodInterceptResult result) throws Throwable {
    VertxContext context = (VertxContext) objInst.getSkyWalkingDynamicField();
    ContextManager.createLocalSpan(context.getContextSnapshot().getParentOperationName());
    ContextManager.continued(context.getContextSnapshot());
    context.getSpan().asyncFinish();
}

@Override
public Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes,
                          Object ret) throws Throwable {
    ContextManager.stopSpan();
    return ret;
}

The above code is the interceptor for the method:

@Override
public void handle(HttpClientResponse httpClientResponse) {
    /////////////////////////////
    // Add more real codes here,
    // including call another RPC or access Database
    /////////////////////////////
}

wu-sheng commented 5 years ago

does the order of ContextManager.continued and asyncFinish make any difference?

Nothing related.

wu-sheng commented 5 years ago

PR merged. When you are ready, you could submit the vert.x PR :)

vietj commented 5 years ago

there is work for tracing done in Vert.x for 4.0 , it is described in this spec: https://github.com/vert-x3/wiki/wiki/RFC:-context-tracing

also there are two implementations with OpenTracing and Zipkin here that can be used as examples https://github.com/vietj/vertx-tracing

wu-sheng commented 5 years ago

This is an auto instrumentation plugin for skywalking native.

BFergerson commented 5 years ago

@vietj, the new tracing features in Vert.x 4.0 look easy and simple to use. As far as I understand though, there isn't a simple way to do it in Vert.x 3.x and that's the version we're adding native SkyWalking tracing support to.

SkyWalking already supports OpenTracing so it sounds like Vert.x 4.0 will be automatically supported without a native plugin, so that's awesome.

wu-sheng commented 5 years ago

@BFergerson I wouldn't expect OpenTracing works very well. Sadly to say so. But we could take advances of vertx core to make auto instrument better and easier.

BFergerson commented 5 years ago

@wu-sheng, could you explain that please? OpenTracing support is mentioned on this project several times. It's my understanding that if you support OpenTracing then you support anything that uses OpenTracing compatible tracers?

vietj also mentions that they will have Zipkin support for Vert.x 4.0. Even with OpenTracing and Zipkin native support for Vert.x we would still need an auto-instrumentation plugin?

wu-sheng commented 5 years ago

ould you explain that please? OpenTracing support is mentioned on this project several times. It's my understanding that if you support OpenTracing then you support anything that uses OpenTracing compatible tracers?

That is the typical issue of OpenTracing. OpenTracing is just a set of API, it is no guarantee. And for SkyWalking, it is just a bridge API to agent core, which means, sync call most likely works, but requires user manually active SkyWalking tracer, based on OT API, but in the real world, most users would not because we have the agent, people want really auto, right?

For vertx, as an async lib, I really doubt OT's compatible issue, especially when it works with agent core.

vietj also mentions that they will have Zipkin support for Vert.x 4.0. Even with OpenTracing and Zipkin native support for Vert.x we would still need an auto-instrumentation plugin?

If we just want Zipkin trace, and work with other zipkin plugin, then you don't need auto-instrument agent. If you want SkyWalking format and work with other our auto-instrument plugins, then YES, we still need the plugin.

Zipkin trace and format supported in SkyWalking OAP, but only when all traces collected by Zipkin lib. We can't support hybrid.

Basically, the auto-instrument magic means, in almost all cases, plugins are required. In SkyWalking most known users, they only use OT-SkyWalking API to set some tag and local span.

wu-sheng commented 5 years ago

I mostly work around auto-instrument, especially in Java. @adriancole knows much more about the OpenTracing issues in manual instrumentation library compatible. It is far from your expected right now.

wu-sheng commented 5 years ago

Two more things.

OpenTracing Java 0.32 breaks 0.31 compatible again, and our community has no one wants to do upgrade right now. Sadly.
OpenTracing and OpenCensus are going to be merged, so API will most likely be broken again.

You see, there are a lot of things happening, and I and most SkyWalking users/contributors want 100% stable and reliable, we recommend our native format and plugins always. We are tracing and monitoring a lot of systems, we can't break them.

codefromthecrypt commented 5 years ago

@wu-sheng you can try to ignore the opentracing thing and look at the other one in vert.x 4. I think @vietj did bake in some OT semantics.. some are easy to fix. however OT and brave apis are similar anyway. it might be more difficult to take something not largely a clone of brave and integrate it https://github.com/vietj/vertx-tracing/pull/1/files#r274201967

BFergerson commented 5 years ago

We can close this now with https://github.com/apache/incubator-skywalking/pull/2386 being accepted. This gives Skywalking basic support for Vert.x. As I continue to monitor my Vert.x applications with Skywalking I'll expand on this support. Thanks for all the help @wu-sheng. Looking forward to the release of 6.1 ;).

wu-sheng commented 5 years ago

@BFergerson Thanks. Welcome to contribute other plugins. The 6.1 release should happen next month.

apache / skywalking

Support plugin for Vert.x #2341

Question

Which version of SkyWalking, OS and JRE?