Closed BFergerson closed 5 years ago
If my understanding is correct what would you guys recommend to tackle this issue?
I think most of your understanding is correct, the thread model of vert.x is a typical model in RPC framework. Also, SkyWalking did request the TracingContext working in that way, most of tracing system asked for this too.
The key is, you created the span in a shared thread(that single thread, I am guessing this is nio/vertx event loop thread). SkyWalking is focusing on business logic, the span could be created at the entrance of business codes, such as we did for SpringMVC. We create span at Controller edge, then, no matter which web container is used inside, the span will work as expected.
Like in the codes
public class MyServer extends AbstractVerticle {
@Override
public void start() {
Router router = Router.router(vertx);
router.route().handler(this::queryHandler);
vertx.createHttpServer().requestHandler(router::accept)
.listen(8080);
System.out.println("MyServer startup");
}
private void queryHandler(RoutingContext routingContext) {
routingContext.response()
.putHeader("content_type", "application/json")
.end("hello world");
}
Span should be created when #requestHandler
called, there should not be a shared thread at this point, because it can't in Java thread model, the requestHandler
will hold the thread until it is done or start a child thread(that is context capture/continue designed for)
I had a light bulb moment when you said the key was I was creating spans on a shared thread. I'm pretty sure you're absolutely right. Unfortunately, I think that's the only way to use Vert.x. Everything that runs in the verticle is run on the event loop and in my situation, there is only a single event loop. That's what I'm seeing locally as I debug the creation of the spans. The thread is always vert.x-eventloop-thread-0
.
I should have included some Vert.x code to begin with to open this discussion. I've included some below and described the steps the code takes and the steps I take to trace it.
public class Sender extends AbstractVerticle {
public static void main(String[] args) {
Vertx vertx = Vertx.vertx();
vertx.deployVerticle(new Sender());
}
@Override
public void start() {
Router router = Router.router(vertx);
router.route().handler(BodyHandler.create());
router.get("/products/:productID").handler(this::handleGetProduct);
vertx.createHttpServer().requestHandler(router).listen(8080);
//vertx.setPeriodic(1000, v -> callGetProduct());
callGetProduct();
}
private void callGetProduct() {
HttpClient httpClient = vertx.createHttpClient();
httpClient.getNow(8080, "localhost", "/products/test",
new Handler<HttpClientResponse>() {
@Override
public void handle(HttpClientResponse httpClientResponse) {
System.out.println("Got response");
}
});
}
private void handleGetProduct(RoutingContext routingContext) {
HttpServerResponse response = routingContext.response();
vertx.setTimer(4000, it -> {
response.setStatusCode(200).end("Done");
});
}
}
I've tried to keep it as simple as possible and I think I'm creating the right spans at the right places.
HttpClient
which calls a GET
request on /products/test
. This opens a ExitSpan
.handleGetProduct
. This opens a EntrySpan
.EntrySpan
.handle
. This closes the ExitSpan
.The locations I'm doing the opening and closing of spans seems correct to me as well.
HttpClient
sends the GET
request by calling HttpClientRequestImpl.end()
. I take the ContextCarrier
and inject it into the header and open the ExitSpan
. I also assign the ContextSnapshot
to the request (via EnhancedInstance
) for closing later.GET
request is receieved and sent to the target method via HttpServerRequestImpl.doEnd()
. I remove the headers to rebuild the ContextCarrier
and open a EntrySpan
. I also assign the ContextSnapshot
to the response (via EnhancedInstance
) for closing later.ContextSnapshot
from the response (via EnhancedInstance
) and close the EntrySpan
once HttpServerResponseImpl.end()
is called as this signifies the response has been written.ContextSnapshot
from the request (via EnhancedInstance
) and close the ExitSpan
once HttpClientRequestImpl.doHandleResponse()
is called as this signifies a request has been replied to.Again, this all works when I just go through this flow once. The single call in the code to callGetProduct();
fulfills this scenario. It's once //vertx.setPeriodic(1000, v -> callGetProduct());
is uncommented that it starts acting up. As you can see, setPeriodic
makes the call to callGetProduct
every second. This means that I won't have the previous trace closed by the time the next one starts.
Given that Vert.x is running everything on a shared thread, and the goal is to not create spans on a shared thread, what's my move here?
The HttpClient sends the GET request by calling HttpClientRequestImpl.end(). I take the ContextCarrier and inject it into the header and open the ExitSpan. I also assign the ContextSnapshot to the request (via EnhancedInstance) for closing later.
You should finish the span, and hold the ContextSnapshot. When callbacke happens, create a new local span(should be new context, because other closed their exit span too), and continue the ContextSnapshot.
I pull the ContextSnapshot from the response (via EnhancedInstance) and close the EntrySpan once HttpServerResponseImpl.end() is called as this signifies the response has been written.
From you description, I am a little confused about how you use ContextSnapshot
and ContextCarrier
. How does I pull the ContextSnapshot from the response
happen? The ContextCarrier is designed for across processes(RPC, MQ), ContextSnapshot is used across thread in a single process only. Is this just typo or do I miss somthing?
I pull the ContextSnapshot from the request (via EnhancedInstance) and close the ExitSpan once HttpClientRequestImpl.doHandleResponse() is called as this signifies a request has been replied to.
In pure async scenario, like vertx, response is just a new local span, entry span should be closed as soon as an async jos assignment happens. Don't wait until this reponse called. Because if the thread model you provided, the thread could do anything because another thread to process the logic and do the response.
It's once //vertx.setPeriodic(1000, v -> callGetProduct()); is uncommented that it starts acting up. As you can see, setPeriodic makes the call to callGetProduct every second. This means that I won't have the previous trace closed by the time the next one starts.
I think my prev explanation about when is the right time to close span, should make this easier. Create snapshot and finish the ongoing span when do setPeriodic
, then create a local span when v -> callGetProduct
happens.
I think my most answers focus on one thing, close/finish span
before async job happens. :) Hope this would help you.
@wu-sheng, your advice appears to have gotten me closer but I think there is something I'm fundamentally not understanding. I'm able to produce the following trace stack:
{
"traceSpans" : [ {
"traceId" : "7.39.15523638435290001",
"segmentId" : "7.39.15523638435290000",
"spanId" : 0,
"parentSpanId" : -1,
"refs" : [ ],
"serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
"startTime" : 1552363843533,
"endTime" : 1552363843595,
"endpointName" : "/products/test",
"type" : "Exit",
"peer" : "/products/test",
"component" : "Vertx",
"layer" : "RPCFramework",
"tags" : { },
"logs" : [ ],
"error" : false
}, {
"traceId" : "7.39.15523638435290001",
"segmentId" : "7.39.15523638478920006",
"spanId" : 0,
"parentSpanId" : -1,
"refs" : [ {
"traceId" : "7.39.15523638435290001",
"parentSegmentId" : "7.39.15523638435290000",
"parentSpanId" : 0,
"type" : "CROSS_THREAD"
} ],
"serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
"startTime" : 1552363847892,
"endTime" : 1552363847892,
"endpointName" : "HttpClientRequestImplDoHandleResponseInterceptor-#/products/test",
"type" : "Local",
"peer" : "",
"component" : "",
"layer" : "Unknown",
"tags" : { },
"logs" : [ ],
"error" : false
}, {
"traceId" : "7.39.15523638435290001",
"segmentId" : "7.39.15523638438680002",
"spanId" : 0,
"parentSpanId" : -1,
"refs" : [ {
"traceId" : "7.39.15523638435290001",
"parentSegmentId" : "7.39.15523638435290000",
"parentSpanId" : 0,
"type" : "CROSS_PROCESS"
} ],
"serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
"startTime" : 1552363843869,
"endTime" : 1552363843870,
"endpointName" : "/products/test",
"type" : "Entry",
"peer" : "",
"component" : "Vertx",
"layer" : "RPCFramework",
"tags" : { },
"logs" : [ ],
"error" : false
}, {
"traceId" : "7.39.15523638435290001",
"segmentId" : "7.39.15523638478710004",
"spanId" : 0,
"parentSpanId" : -1,
"refs" : [ {
"traceId" : "7.39.15523638435290001",
"parentSegmentId" : "7.39.15523638438680002",
"parentSpanId" : 0,
"type" : "CROSS_THREAD"
} ],
"serviceCode" : "8fd2ffff-8d5b-46ed-a681-951fb18ed1d1",
"startTime" : 1552363847871,
"endTime" : 1552363847882,
"endpointName" : "HttpServerResponseImplEndInterceptor-#/products/test",
"type" : "Local",
"peer" : "",
"component" : "Vertx",
"layer" : "Unknown",
"tags" : { },
"logs" : [ ],
"error" : false
} ],
"total" : 4
}
This produces the diagram:
Which is sort of odd when it considers the same trace as:
It seems to somewhat understand that it took 4 seconds to complete. Essentially I followed your advice and made the process as such:
GET
request is called the ExitSpan
is opened on beforeMethod
and the ContextCarrier
is added to the header. The ContextSnapshot
is saved to the SkyWalkingDynamicField
for the request. The ExitSpan
is closed on afterMethod
.GET
call is received and a EntrySpan
is opened with the ContextCarrier
extracted from the request's headers during beforeMethod
. The ContextSnapshot
is set on the SkyWalkingDynamicField
of the response object. The EntrySpan
is closed on the afterMethod
.HttpServerResponseImplEndInterceptor.beforeMethod
is triggered and creates a LocalSpan
and calls ContextManager.continued
with the ContextSnapshot
saved to the response object. The LocalSpan
is closed on the afterMethod
.HttpClientRequestImplDoHandleResponseInterceptor.beforeMethod
is triggered and creates a LocalSpan
and calls ContextManager.continued
with the ContextSnapshot
saved to the request object. The LocalSpan
is closed on the afterMethod
.I don't exactly have a question here so I hope you see what I'm doing wrong. The code I'm testing with is located here: https://github.com/apache/incubator-skywalking/compare/master...BFergerson:master
Essentially I followed your advice and made the process as such:
The processes you mentioned make sense to me, it should be processed in this way when you are facing async scenarios. I assume right now, the tracing and metric should be working well even in concurrency cases, right?
Just one suggestion, the local span you created, the operation name should be related to target framework behaviours, like a description, rather than which interceptor of SkyWalking plugin is doing this. Because in most cases, interceptor is just being known by plugin developer.
Of course, maybe this is just because, you are testing. :) If so, ignore me.
@wu-sheng, yeah those operation names are for testing right now. I've deleted code to try to keep what I'm showing you as straight forward as possible. I'll make sure to address that once everything is working.
And yes, your suggestions do fix the problem I was originally having with the traces becoming children of each other and never sending any to SkyWalking, but now I'm dealing with this:
As you can see, it doesn't correctly understand that /products/test
took 4 seconds to respond. It understands it when you open the trace up but on the endpoint dashboard it doesn't. What causes this discrepancy? Do the trace spans I'm producing in the JSON above seem correct?
As you can see, it doesn't correctly understand that /products/test took 4 seconds to respond. It understands it when you open the trace up but on the endpoint dashboard it doesn't. What causes this discrepancy? Do the trace spans I'm producing in the JSON above seem correct?
Actually, this is what I expect. Because the segment duration is shorter, the endpoint duration is based on that.
Also, if you want to show the segment perfectly, we need to build a new TracingContext implementation, with clear explicit declare to use it. In my previous experiences, that style of APIs is not very friendly to plugin developer, so I haven't implemented it. If you want to take part in more, I could start a draft and let us work on that.
@wu-sheng, I would definitely like to implement support for Vert.x so if it's necessary to create a new TracingContext implementation to do that then yes, please start a draft.
Because the segment duration is shorter, the endpoint duration is based on that.
Before you do, would you mind breaking down a bit why a new implementation is necessary? Specifically the logic quoted above. I understand a span to be a single "unit of work". From the SkyWalking documentation a segment is "a concept in SkyWalking, it should include all span for per request in a single OS process, usually single thread based on language". Based on these definitions, wouldn't everything involved with finishing the request for /products/test
be included in the segment? Meaning why not add logic which indicates that a segment is always equal to or longer than the duration of the spans which it consists of?
Sorry for don't make this clear to me.
a segment is "a concept in SkyWalking, it should include all span for per request in a single OS process, usually single thread based on language.
The Key in Java for plugin developer convenience, segment always works in a single thread at the same time. And this fits most of scenarios in Java codes. But in vertx(react style), things changed. Your case is very typical.
Thread accept the request, then
That is why if we don't finish that, the new incoming request uses the existing trace context, which triggers your issue at the beginning.
But we finish the span, then because tracing context is a stack-based structure, right? The tracing context finished, and segment finished too. So at this moment, the thread local is available for new request. That is why you traces look like working well in concurrency requests.
The side-effect of doing this, is segment finished when your work assignment is finished, not works are done because that is in another thread, right?
SkyWalking backend analysis the response time based on segment and entry span, that is why you can't see 4s(in your case) here, but you can see 4s in trace visualization. The trace visualization shows 4s because UI did that.
SkyWalking analysis 100% trust the entry span and segment, that is why we are high performance and don't need distributed cache in any case. So when request entry span says ~0s duration, the endpoint metric show in the same way.
I am proposal the complex tracing context implementation is
TracingContext
and Segment
could go across threads in JavaBut in this case, API of context is more complex, and lock mechanism must be involved.
@wu-sheng, I think I'm almost on the same page with you. You're saying the JSON trace stack I included above is correct in all aspects except for the endTime
on the ExitSpan
and EntrySpan
? And it's because of the fact that the EntrySpan
has an incorrect endTime
that the response time shows up as it does in the UI? And finally, this can't be done with the existing TracingContext
because it will finish the current Segment
and send that to the collector before the callback happens (so no possibility for modifying the endTime
).
TracingContext and Segment could go across threads in Java
If my understanding is correct then how does this proposal work? The TracingContext
would have already finished the Segment
even if I persist them till the time of the callback? Would there be additional logic in this TracingContext
implementation which prevented it from finishing a Segment
of just one EntrySpan
?
Thanks for all the help BTW. I'm learning a lot.
Yes. In my mind, for new tracing context mechanism(only optional), if you do capture, it will hold the segment untill continue happens. Only finish the whole segment and tracing context, if and only if all related threads are finished and no capture without continued.
Does this make sense?
@wu-sheng, I think I have a firm grasp of the problem now lol. Instead of modifying the endTime
why not just wait to close the span? I understand that waiting to close the span won't work with TracingContext
but that's because it deals with one segment at a time. A new TracingContext
implementation could work with multiple segments at a time requiring a key to choose which segment the spans belong to. I think with an implementation like this I could delay closing the EntrySpan
until the response is sent (how I was doing it before). That also lets the process for all the TracingContext
implementations remain essentially the same and this multiple segments/asynchronous one doesn't have extra look back logic.
Edit: Also, I could see how the same could be implemented with multiple TracingContext
instead of an implementation with multiple segments. I'm thinking something like AsyncContextManager
.
In theory yes, just delay to close the span, in practice way, it is hard. Because the span should still in stack-based structure, as a plugin developer, you can't tell whether there is one/more parent span(s) need to close by other plugins, right? So you must follow the stack order, to close the span. That mean, entry span must be finished first and reset finish time later.
Edit: Also, I could see how the same could be implemented with multiple TracingContext instead of an implementation with multiple segments. I'm thinking something like AsyncContextManager.
Are you saying to use AsyncContextManager replace ContextManager? If so, I think that will not work. There are plenty of plugins work out there using ContextManager as core API, you change that, means propagation context in the thread with other plugins will fail. I think AbstractTracerContext
implementation makes more sense.
@wu-sheng, okay the ContextManager
implementation might be too far. How is this not an issue for other async plugins you have though (eg: Apache httpcomponent AsyncClient 4.x)? Also, what kind of draft would you be starting?
Edit: You're probably going to say you don't start/close spans in the shared thread :P
Edit: You're probably going to say you don't start/close spans in the shared thread :P
Yes and there is a little different for them. Because they just do async call once for a time, and for their developer, they think, it just blocks short time, then the short duration makes sense. So they put callback in another thread and another segment/tracingcontext just like you did.
The ContextManager implementation is not hard, just it must count on continue
must be executed after you call capture
. Otherwise, the context would never finish and span in first context will be totally lost.
Does this make sense to you?
AcrossThreadContextTracing
should be the right term for this.
If it works, we could make asynchttp, grpc-streaming work in this way too. And in .net too.
cc @liuhaoyang @ascrutae @hanahmily I think you may be interested in this term and design too.
@wu-sheng, I guess I'm just not entirely sure where you see the difficulty in holding the EntrySpan
open until the response. Even in a completely asynchronous environment, the response will be the last event to occur. The EntrySpan
is opened, all the various callbacks involved are handled through LocalSpan
objects which add to the appropriate segment (stack-based all the same), and then when the response is finally ready to be written the EntrySpan
is retrieved, closed, and the segment is flushed to the collector.
Why would I need to know what the parent span is? Each span is dealt with individually. If I open a span I have to make sure to close it. If that happens then by the time the response is ready the active span will be the correct span to close.
Where is the difficulty that you're seeing that I'm not?
Why would I need to know what the parent span is? Each span is dealt with individually. If I open a span I have to make sure to close it. If that happens then by the time the response is ready the active span will be the correct span to close.
First of all, there isn't about you need to know the parent span. It is more about when tracing context could finish automatically and in the right time. In SkyWalking design and its API, you could notice that we don't have an API called TracingContext#finished
from any plugin, just inside TracingContext triggered automatically, WHY?, because no plugin knows he is the last/first span.
In codes level, read TracingContext#activeSpanStack
and EntrySpan
(also its parent StackBasedTracingSpan
), you will find out the span finish operation need to do in the exact same order as open it. Also TracingContext
is in the ThreadLocal
, which ones, one thread could only have one TracingContext.
Based on this background, how do you don't finish the entry span
but switch the TracingContext
to another request?
So, things like EntrySpan is retrieved
only could happen when tracing context doesn't finish even all spans finished(removed from the active stack). That is key I mean for new TracingContext
implementation, hold the context and span, make you have the chance to change/adjust the span.
Does this make more sense to you now?
This is a very complex scenario. Argue and the trade-off between development convenient and API capabilities. Most tracing systems ignore this to do based on Dapper paper says, so backend will face heavy payload. But they, like Zipkin and Jaeger, are not APM solution, so this doesn't matter for them. In another way, this design is common in APM vendor core, which they keep in the shadow :)
Another way to do a thing is by providing a new Entry/Exit Span
feature to support EntrySpan is retrieved
. Such as, Span#finishWithFurtherNotitify
, which remove it from the active span, but still hold the tracing context. Only when Span#finish
called(in another thread), tracing context finished and sent.
@wu-sheng, I think I understand. I assumed that the Vert.x EntrySpan
would be the root one. I wasn't thinking bigger picture. I agree about modifying the endTime
after the fact.
I will try to start a draft of my last proposal, provide span a feature for further explicit finish notification, which could be done in another thread. If I find other conflicts, I will inform you too.
PR submit. A draft version
@wu-sheng, using the new asynchronous functionality I'm able to get the following results:
This appears correct to me but I'm able to achieve these results without creating local spans and without using ContextSnapshot
. I simply call prepareForAsync
when the GET
request is sent and received and asyncFinish
when the response is sent and received. This produces the results above with the single exit span and single entry span.
Is that how you meant for it to be used? Previously you said that the callback was supposed to be a new local span. Is that still the case when using the asynchronous functionality? If so, what is the order of events I should be taking? First open the local span, then close the entry span via asyncFinish
, then close the local span?
Is that still the case when using the asynchronous functionality? If so, what is the order of events I should be taking? First open the local span, then close the entry span via asyncFinish then close the local span?
From the trace, you should be using APIs in the right way. The other steps are about, you should create new context(including local span) in another thread(callback) in case there is other further plugin working.
Such as in your codes
private void callGetProduct() {
HttpClient httpClient = vertx.createHttpClient();
httpClient.getNow(8080, "localhost", "/products/test",
new Handler<HttpClientResponse>() {
@Override
public void handle(HttpClientResponse httpClientResponse) {
/////////////////////////////
// Add more real codes here,
// including call another RPC or access Database
/////////////////////////////
}
You would like the trace could continue in #handle
method, right? If you don't create new context in handle thread, the trace will break, because async span doesn't include the capability to propagate context. Async span is just for making sure duration/tag/log/etc. fields right/accurate.
I am glad we are almost done:)
@wu-sheng, everything makes sense when you say it :P. I've added the local span back. I think the last question I have before I repeat this pattern for other Vert.x classes is:
ContextManager.continued
and asyncFinish
make any difference?I've tried both and they both work and seem to produce the same traces.
The way I have it currently coded for the callback method is:
@Override
public void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes,
MethodInterceptResult result) throws Throwable {
VertxContext context = (VertxContext) objInst.getSkyWalkingDynamicField();
ContextManager.createLocalSpan(context.getContextSnapshot().getParentOperationName());
ContextManager.continued(context.getContextSnapshot());
context.getSpan().asyncFinish();
}
@Override
public Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes,
Object ret) throws Throwable {
ContextManager.stopSpan();
return ret;
}
The above code is the interceptor for the method:
@Override
public void handle(HttpClientResponse httpClientResponse) {
/////////////////////////////
// Add more real codes here,
// including call another RPC or access Database
/////////////////////////////
}
does the order of ContextManager.continued and asyncFinish make any difference?
Nothing related.
PR merged. When you are ready, you could submit the vert.x PR :)
there is work for tracing done in Vert.x for 4.0 , it is described in this spec: https://github.com/vert-x3/wiki/wiki/RFC:-context-tracing
also there are two implementations with OpenTracing and Zipkin here that can be used as examples https://github.com/vietj/vertx-tracing
This is an auto instrumentation plugin for skywalking native.
@vietj, the new tracing features in Vert.x 4.0 look easy and simple to use. As far as I understand though, there isn't a simple way to do it in Vert.x 3.x and that's the version we're adding native SkyWalking tracing support to.
SkyWalking already supports OpenTracing so it sounds like Vert.x 4.0 will be automatically supported without a native plugin, so that's awesome.
@BFergerson I wouldn't expect OpenTracing works very well. Sadly to say so. But we could take advances of vertx core to make auto instrument better and easier.
@wu-sheng, could you explain that please? OpenTracing support is mentioned on this project several times. It's my understanding that if you support OpenTracing then you support anything that uses OpenTracing compatible tracers?
vietj also mentions that they will have Zipkin support for Vert.x 4.0. Even with OpenTracing and Zipkin native support for Vert.x we would still need an auto-instrumentation plugin?
ould you explain that please? OpenTracing support is mentioned on this project several times. It's my understanding that if you support OpenTracing then you support anything that uses OpenTracing compatible tracers?
That is the typical issue of OpenTracing. OpenTracing is just a set of API, it is no guarantee. And for SkyWalking, it is just a bridge API to agent core, which means, sync call most likely works, but requires user manually active SkyWalking tracer, based on OT API, but in the real world, most users would not because we have the agent, people want really auto, right?
For vertx, as an async lib, I really doubt OT's compatible issue, especially when it works with agent core.
vietj also mentions that they will have Zipkin support for Vert.x 4.0. Even with OpenTracing and Zipkin native support for Vert.x we would still need an auto-instrumentation plugin?
If we just want Zipkin trace, and work with other zipkin plugin, then you don't need auto-instrument agent. If you want SkyWalking format and work with other our auto-instrument plugins, then YES, we still need the plugin.
Zipkin trace and format supported in SkyWalking OAP, but only when all traces collected by Zipkin lib. We can't support hybrid.
Basically, the auto-instrument magic means, in almost all cases, plugins are required. In SkyWalking most known users, they only use OT-SkyWalking API to set some tag and local span.
I mostly work around auto-instrument, especially in Java. @adriancole knows much more about the OpenTracing issues in manual instrumentation library compatible. It is far from your expected right now.
Two more things.
You see, there are a lot of things happening, and I and most SkyWalking users/contributors want 100% stable and reliable, we recommend our native format and plugins always. We are tracing and monitoring a lot of systems, we can't break them.
@wu-sheng you can try to ignore the opentracing thing and look at the other one in vert.x 4. I think @vietj did bake in some OT semantics.. some are easy to fix. however OT and brave apis are similar anyway. it might be more difficult to take something not largely a clone of brave and integrate it https://github.com/vietj/vertx-tracing/pull/1/files#r274201967
We can close this now with https://github.com/apache/incubator-skywalking/pull/2386 being accepted. This gives Skywalking basic support for Vert.x. As I continue to monitor my Vert.x applications with Skywalking I'll expand on this support. Thanks for all the help @wu-sheng. Looking forward to the release of 6.1 ;).
@BFergerson Thanks. Welcome to contribute other plugins. The 6.1 release should happen next month.
Please answer these questions before submitting your issue.
Question
I am creating a plugin for supporting Vert.x powered applications using SkyWalking. Everything appears to work when I spread out the events so they each happen one after the other but once I start letting the messages flow naturally SkyWalking won't finish a single span.
For example:
/products/test
(this creates an ExitSpan)/products/test
(this creates an EntrySpan)/products/test
(this closes the EntrySpan)/products/test
(this closes the ExitSpan)The above works correctly and the SkyWalking UI appears to agree. It's when the flow of events changes that SkyWalking will not finish a trace.
For example:
/products/test
(this creates an ExitSpan)/products/test
(this creates an ExitSpan, child of previous)/products/test
(this creates an EntrySpan, child of previous)/products/test
(this creates an EntrySpan, child of previous)/products/test
(this closes the EntrySpan, child of previous)/products/test
(this closes the EntrySpan, child of previous)/products/test
(this closes the ExitSpan, child of previous)/products/test
(this closes the ExitSpan)In the above example, SkyWalking would eventually finish a trace but it would be the single trace instead of the expected two traces. The reason I say that SkyWalking never finishes a trace is that I have the above occurring every second and since SkyWalking keeps adding spans as the child of the previous it keeps a non-empty span stack and continues without finishing any traces.
From my best understanding of what's happening, the
TracingContext
class requires either of two scenarios to create a new trace:TracingContext.activeSpanStack
is emptyIf that's correct then I don't see how I can get this to work with Vert.x. In this case, Vert.x is using the same thread to run all of the events on. Both of the GET requests, both of the GET receives, and both of the GET responses happen on the same thread and I can't simply use the
InstanceMethodsAroundInterceptor.afterMethod
method to close spans because of the asynchronous nature of Vert.x.If my understanding is correct what would you guys recommend to tackle this issue?
Which version of SkyWalking, OS and JRE?
SkyWalking 6.0.0 GA, Ubuntu 18.04, Java 8