arcus-azure / arcus-service-to-service-correlation-poc

POC to have end-to-end correlation stitching operations across services together in the Azure Application Insights Application Map.

What should be the final outcome of service-to-service correlation tracking #7

Open fgheysels opened 2 years ago

fgheysels commented 2 years ago

When running the PoC, we currently have this end-to-end transaction overview:

image

In the PoC, we're already close to the desired end-result, but we're just not there yet.

In the above image, we can distinguish 2 major blocks:

Now, I think there are still some issues in this visual representation:

So, I think the entries in the brown box should be displayed like this:

v  localhost:787 POST api/v1/market                   (request)
    arcus.api.bacon:999 GET /api/v1/bacon             (dependency)
    v arcus.api.bacon:999 GET /api/v1/bacon           (request)
       example-server SQL:bacon-db/flavors            (dependency)
    arcus-s2s-bus  orders                             (dependency)

Also here, there are some issues with the representation imho:

I think, the dependencies & requests in the green box should be visualized like this:

v  Order Worker Process                               (request)
    arcus.api.bacon:999 GET /api/v1/bacon             (dependency)
    v arcus.api.bacon:999 GET /api/v1/bacon           (request)
       example-server SQL:bacondb/flavors             (dependency)

At least, this is my opinion and what I would expect. What are your opinions @stijnmoreels @tomkerkhove @gverstraete ?
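To make the proposed nesting concrete, here is a minimal sketch of how such an indented view can be derived from telemetry items that each carry an ID and a parent ID. The `Telemetry` class, the IDs and the `build_tree`/`render` helpers are made up for illustration and are not part of the PoC or of Application Insights. The sketch uses strict parent/child nesting, so the incoming request ends up one level below the dependency that triggered it, which is slightly deeper than in the drawings above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Telemetry:
    # Hypothetical, simplified telemetry item: only the fields needed for nesting.
    id: str
    parent_id: Optional[str]
    kind: str  # "request" or "dependency"
    name: str
    children: List["Telemetry"] = field(default_factory=list)


def build_tree(items):
    """Attach every item to its parent; items without a known parent become roots."""
    by_id = {item.id: item for item in items}
    roots = []
    for item in items:
        parent = by_id.get(item.parent_id) if item.parent_id else None
        if parent is None:
            roots.append(item)
        else:
            parent.children.append(item)
    return roots


def render(nodes, depth=0):
    """Return the tree as indented text lines, two spaces per nesting level."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + f"{node.name} ({node.kind})")
        lines.extend(render(node.children, depth + 1))
    return lines


# Illustrative data only: the IDs are invented, the names come from the drawing above.
items = [
    Telemetry("req-market", None, "request", "POST api/v1/market"),
    Telemetry("dep-bacon", "req-market", "dependency", "GET /api/v1/bacon"),
    Telemetry("req-bacon", "dep-bacon", "request", "GET /api/v1/bacon"),
    Telemetry("dep-sql", "req-bacon", "dependency", "SQL:bacon-db/flavors"),
    Telemetry("dep-bus", "req-market", "dependency", "arcus-s2s-bus orders"),
]
tree_lines = render(build_tree(items))
print("\n".join(tree_lines))
```

The point of the sketch is that the indentation only works when every dependency points at the request that issued it and every request points at the dependency that called it. An item with a missing or unknown parent ID falls back to the top level.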

fgheysels commented 2 years ago

The outcome in this issue has been achieved in the service-to-service poc, with the changes that have been made in this PR.

fgheysels commented 2 years ago

I just don't know how we're going to make sure that the visualization in the Application Map also displays that the Order Worker retrieves messages from arcus-s2s-bus.

image

stijnmoreels commented 2 years ago

I just don't know how we're going to make sure that the visualization in the Application Map also displays that the Order Worker retrieves messages from arcus-s2s-bus.

image

That is to be expected. As said, the order worker needs some minutes before every call is made and tracked. This can take 5 to 10 minutes, or maybe even more.

stijnmoreels commented 2 years ago

The outcome in this issue has been achieved in the service-to-service poc, with the changes that have been made in this PR.

You mean that this is already done? That the proposed indentation of dependencies based on requests is already made possible with these changes? I would say no 😄.

I think it's valuable input on this very complex topic. I can definitely see the added value if the entire flow is displayed like this, in one fluent motion. I was wondering whether this is also how it's 'intended' to be, as in: is this also the result when people use the TelemetryClient instead of logging? 🤔 If not, then I think we're even better 😄.

I think there are also some guidelines/requirements on correlation. Is this something we should take into account in this? HTTP correlation especially may need some work.

fgheysels commented 2 years ago

You mean that this is already done? That the proposed indentation of dependencies based on requests is already made possible with these changes? I would say no 😄.

No, I mean that the screenshots that are displayed here, are done via the modifications made in that PR :)

stijnmoreels commented 2 years ago

You mean that this is already done? That the proposed indentation of dependencies based on requests is already made possible with these changes? I would say no 😄.

No, I mean that the screenshots that are displayed here, are done via the modifications made in that PR :)

Ah, ok, and is the order worker still referenced? I assume it would be, as it's not related to that.

fgheysels commented 2 years ago

I'd expect that the dependency between the Order worker & Service Bus is visible in the Application map, as the Order worker logs a ServiceBus Request when it handles a message, but that is currently not the case.

stijnmoreels commented 2 years ago

I'd expect that the dependency between the Order worker & Service Bus is visible in the Application map, as the Order worker logs a ServiceBus Request when it handles a message, but that is currently not the case.

😕 Hmm, I remember running the PoC and seeing the connection with the order worker. It is, though, the one reference that takes the longest time to come up (we're talking minutes here).

stijnmoreels commented 2 years ago

@fgheysels, can you give an update on this, given the updates made to the dependent libraries?

fgheysels commented 2 years ago

This is what it looks like now:

image

I would assume that:

stijnmoreels commented 2 years ago
  1. I also think it is strange that the duration of the POST market operation includes the duration that the message has spent in the queue.

The POST market request isn't done until the Worker and the other API are done, so it seems logical that this is included, no?

  1. the dependency to the bacon API should be indented relative to the request to the market API. When I click on that dependency, I can see that the parentId is set to 000000000. It is also not clear that the POST market operation has a dependency on ServiceBus. Here too, I believe the ParentId must be set correctly, but it is also set to '0000000000000000'.

This is indeed not implemented. We have only made sure that the request is linked to the dependency call, not the other way around. It also has something to do with the initial POST market request: if that one doesn't specify an operation parent ID, there's nothing to link.

fgheysels commented 2 years ago
  1. I also think it is strange that the duration of the POST market operation includes the duration that the message has spent in the queue.

The POST market request isn't done until the Worker and the other API are done, so it seems logical that this is included, no?

That is not correct. The POST operation is done as soon as the API operation has delivered the message to the queue. The Worker that processes the messages on the queue is a separate process. In theory, the worker might only pick up and process the message after several hours or even days. The POST market operation ends as soon as the API operation has returned a status code.

fgheysels commented 2 years ago
  1. the dependency to the bacon API should be indented relative to the request to the market API. When I click on that dependency, I can see that the parentId is set to 000000000. It is also not clear that the POST market operation has a dependency on ServiceBus. Here too, I believe the ParentId must be set correctly, but it is also set to '0000000000000000'.

This is indeed not implemented. We have only made sure that the request is linked to the dependency call, not the other way around. It also has something to do with the initial POST market request: if that one doesn't specify an operation parent ID, there's nothing to link.

But how is this done in other libraries like Entity Framework, then? Every trace has an ID, even if you do not explicitly assign one. Maybe they're making use of the Activity class, idk.

stijnmoreels commented 2 years ago

Hmm, I think it is implemented, but we need to specify it in the operation ID header. That's also why we have the ExtractFromRequest option in the HTTP correlation middleware component.

We could, however, generate the operation parent ID ourselves if that setting doesn't yield one, but we would have to check whether we are allowed to do that within the HTTP correlation protocol.
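As a rough sketch of that fallback, assuming W3C Trace Context-style `traceparent` headers (which may not be what the PoC or the Arcus middleware actually uses): take the operation ID and parent ID from the incoming header when it is valid, otherwise start a new operation without a parent. Either way, the request gets a fresh ID of its own, so that outgoing dependencies always have something to point at. The function name `resolve_correlation` and the lower-cased header dictionary are hypothetical.

```python
import re
import secrets

# W3C Trace Context layout: version-traceid-parentid-flags, all lower-case hex.
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def resolve_correlation(headers):
    """Return (operation_id, parent_id, request_id) for an incoming request.

    `headers` is assumed to be a plain dict with lower-cased header names.
    """
    match = TRACEPARENT.match(headers.get("traceparent", "").strip().lower())
    # All-zero trace or parent IDs are invalid in W3C Trace Context, so treat them as missing.
    if match and set(match.group(2)) != {"0"} and set(match.group(3)) != {"0"}:
        operation_id, parent_id = match.group(2), match.group(3)
    else:
        # Nothing usable was sent: start a new operation with no parent.
        operation_id, parent_id = secrets.token_hex(16), None
    # The request always gets its own ID; dependencies it issues use this as their parent ID.
    request_id = secrets.token_hex(8)
    return operation_id, parent_id, request_id


print(resolve_correlation({}))
print(resolve_correlation(
    {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
))
```

With something like this, a dependency tracked during the POST market request would never carry an all-zero parent ID, even when the caller sent no correlation header at all.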

stijnmoreels commented 2 years ago
  1. I also think it is strange that the duration of the POST market operation includes the duration that the message has spent in the queue.

That is not correct. The POST operation is done as soon as the API operation has delivered the message to the queue. The Worker that processes the messages on the queue is a separate process. In theory, the worker might only pick up and process the message after several hours or even days. The POST market operation ends as soon as the API operation has returned a status code.

Aha! Maybe this is a result of the service-to-service correlation: the Duration column clearly states a duration, but it doesn't correspond with the graphic that is shown. So this is probably the 'entire operation duration'. So it's correct, no?
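To illustrate the difference between the two readings, here is a small sketch with invented timings. It is only a guess in this thread that the Duration column shows the span of the whole correlated operation, so the `operation_duration` helper below is an assumption about what that number could be, not a description of how the portal computes it.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical telemetry item; times are in seconds relative to the first item.
    name: str
    start: float
    duration: float

    @property
    def end(self):
        return self.start + self.duration


def operation_duration(spans):
    """Time between the earliest start and the latest end of all correlated items."""
    return max(s.end for s in spans) - min(s.start for s in spans)


# Invented numbers: the worker picks the message up five minutes after the POST returned.
spans = [
    Span("POST api/v1/market", 0.0, 0.2),
    Span("arcus-s2s-bus orders", 0.1, 0.05),
    Span("Order Worker Process", 300.0, 1.5),
]
request_duration = spans[0].duration
total = operation_duration(spans)
print(f"request itself: {request_duration}s, whole operation: {total}s")
```

In this example the POST request itself takes 0.2 seconds, while the correlated operation as a whole spans 301.5 seconds because it includes the time the message sat in the queue. Both numbers are valid, but they answer different questions.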