[DiagnosticsSource] http correlation protocol should optionally support intrinsic sort

jacpull commented 7 years ago

Related to issue dotnet/runtime#22777

The current version of the Hierarchical Request Format mandates uniqueness. But analytics on tracing formats which are vector clock based, also rely on the format providing a sort without depending on other event properties such as Time. To provide such a sort as well as provide uniqueness, the elements in the vector have to support a monotonic sequence.

The current protocol is cited as follows:

Suffix MUST be unique for every outgoing HTTP request sent while processing the incoming request; monotonically incremented number of outgoing request within the scope of this incoming operation, is a good candidate.

But there is insufficient information if the value is actually a monotonic sequence. Also, a benefit of explicitly supporting a numeric format incremented from 0 is improved wire cost when compared to a randomly generated suffix.

The proposed changes are follows:

Distinguish vector elements to be either a number (on which a sort is possible) or an entropy element, using a prefix character: "#". Number can be typed to unsigned 64 bit max.
Provide a way to enable "atomic increments" mode in the built-in .NET implementation. If enabled, this would create atomic increments of the current vector element on the outgoing path, as opposed to simply generating randomized suffixes and the prefix character described above would be applied on the latter case.
If the "atomic increments" mode is turned on for a component that is initializing a root request id, it should generate X.0.
Similarly, if the "atomic increments" mode is turned on, and if the incoming value of the Request Id is "X.", then the value should be "X.0" (if the "trusted caller" mode is on as per dotnet/runtime#22777) or "X.[Entropy_0]" otherwise.

@vancem, @lmolkova, @SergeyKanzhelev

lmolkova commented 7 years ago

But there is insufficient information if the value is actually a monotonic sequence.

The standard does not require monotonic sequence, just provides guidance.

The .NET implementation uses

a monotonically incremented within the instance (and randomly seeded at startup) number for the Activities that have an external parent (such as incoming request). In 'atomic increment' mode we would always append '.0'
a monotonically incremented number starting from 1 for Activities that have an internal parent (such as outgoing requests). We can easily change it to start from 0 if needed.

As atomic increment could not be done across the cluster for the incoming request, in this mode we would lose uniqueness in certain scenarios (#22284).

The current implementation allows generating unique ids in these scenarios and makes Request-Id as sortable as it could be in the 'atomic increment mode': It is possible to sort traces for the same operation, but different operations could not be sorted by id instead of timestamp.

Is my understanding correct that 'atomic increment' mode is needed to make Id generation compatible with correlationVector?

jacpull commented 7 years ago

Yes - to make Id compatible with cV, we would need atomic increment mode. But this is on the outgoing path. If we have atomic increment mode on the outgoing path of a caller, then we don't really need to add entropy suffix on the incoming path of the corresponding target, in order to preserve uniqueness.

So typically for a caller/callee, if the caller is in atomic increment mode, the callee can be in "trusted caller" mode and this is basically the cV. If the caller cannot be trusted or in other situations called out in dotnet/runtime#22777, then the callee should not be in the "trusted caller" mode (we should rename this mode - as trusted caller is not the only scenario). If the callee is misconfigured, then the uniqueness rule will be violated.

On the incoming path, the cV simply extends the vector: X => X.0.

The current cV's vector protocol will be compatible with Request Id only if both are enabled:

atomic increment on the outgoing
trusted caller on the incoming

We can rev the cV to support the entropy suffixes on both incoming and outgoing in the next version for full alignment.

jacpull commented 7 years ago

To elaborate with an example, assume the incoming value to a component is "X.", and the component makes an outgoing call in the same context, then the Request ID on the outgoing value as per current spec will be "X.AB." where "A" is a randomly generated suffix that was appended on the incoming path and "B" is a randomly generated suffix emitted on the outgoing path.

The proposed changes will instead behave as follows: a. Trusted_Caller -and Atomic_Increment: X.”{0-9}+”. b. Trusted_Caller -and -not Atomic_Increment: X.”#{Base64}[8]”. c. -not Trusted_Caller -and AtomicIncrement: X.”{Base64}[8]””{0-9}+”. d. -not Trusted_Caller -and -not AtomicIncrement: X.”{Base64}[8]””#{Base64}[8]”.

This way, case a is a compliant cV, while case d should be identical to the current Request Id except for the prefix character "#" that distinguishes the subsequent value as a randomly generated value. We could also do something like this: X.”{Base64}[8]””{Base64}[8]”. i.e. we could add a tailing "_" instead of a preceding "#" to convey the same semantics.

vancem commented 7 years ago

First, let me say that I am VERY supportive of making the ID sortable. We should definitely make that happen.

Second, I have a VERY STRONG preference for a 'modeless design'. Only if we try HARD and fail should we be even considering have modes.

Third, lets first talk about our current IMPLEMENTATION, once we have a good decision about how to do that, writing up the spec language (which mostly just provided wiggle room when possible), should be easy. The code for this is in [GenerateID](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs#L358

Forth: we are VERY CLOSE today. As @lmolkova indicates we already do atomic increment when generating child IDs for cases where the parent is internal to a machine. In the case where the ID comes from off the host, we append a 64 bit number that is incremented and is randomly seeded with a 32 bit value (thus it will almost always be a 32 bit or 33 bit number). Note that things are frankly already 'good enough' because it is ALREADY the case that on outgoing HTTP we create an normal, atomic increment ID for the outgoing request. Thus we expect ONLY ONE such request and baring redirects, there will be only one child of that outgoing request (the incoming request, that generates the new (large) sub ID. Because there is only one child, and it is a number, IT SORTS PROPERLY.

Thus it is not clear we even have a semantic issue here. Our current IDs should be sortable today. The only possible issue is that the IDs are bigger than you might want, but that is probably not a huge deal.

Also note that we DO distinguish between 'internal' child IDs and these 'incoming' child IDs. The former are suffixed with a '.' and the later with a '_'. So that capability is already present (for example you could ignore these IDs from a sort perspective, but I must admit I don't really see the point of doing that)

So in particular, going through the proposed changes:

Distinguish vector elements to be either a number (on which a sort is possible) or an entropy element, using a prefix character: "#". Number can be typed to unsigned 64 bit max.

We already do this (with a suffix rather than a prefix).

Provide a way to enable "atomic increments" mode in the built-in .NET implementation. If enabled, this would create atomic increments of the current vector element on the outgoing path, as opposed to simply generating randomized suffixes and the prefix character described above would be applied on the latter case.

As mentioned I don't want modes, but I think we already do this all the time. In particular we do NOT add anything on the OUTGOING path. We ALREADY give each outgoing HTTP request its own ID (generated with atomic increments, so it is nicely sortable) It is only on the INCOMMING path that we add things to try to make things unique. That is if the user is doing activity X. then if that activity sends and HTTP request that ID in the header will be X.1 (we happen to start are new children at 1, but it would be trivial make them start at 0).

If the "atomic increments" mode is turned on for a component that is initializing a root request id, it should generate X.0.

Our root IDs look like |3C39A343-34ABC343423232432. Where the number after the - is a random 64 bit number representing the machine/process, and the first number is a 64 bit number (initialized to a random 32 bit number) that represents the 'next' Root ID in that process. (thus it increments). Note that this ID was specifically generated 'backwards' (machine ID after request ID), so that loggers that sample based on poor hashing (sadly surprisingly common), always see the first bits (which are the ones that change rapidly).

Thus we don't generate a .0, but that seem like something that is an unnecessary detail (in particular we sort (you could even sort so that all request from the same 'machine' are together but you would have to 'know' to reverse the components of the root ID).

Similarly, if the "atomic increments" mode is turned on, and if the incoming value of the Request Id is "X.", then the value should be "X.0" (if the "trusted caller" mode is on as per dotnet/runtime#22777) or "X.[Entropy_0]" otherwise.

Today, every incoming request X. would be given X.Y_ to insure it is unique. But as mentioned X is the ID OF A SINGLE HTTP request, and so under almost all conditions it has only the one child Y. Thus sorting is irrelevant.

So it is not clear we have to do anything. We seem to already be sortable. (see also the discussion in dotnet/runtime#22777

jacpull commented 7 years ago

Thanks for the pointer to the code and the explanation. Couple of points:

As currently implemented in .NET, the generation of 'internal' child ids is very much aligned to the cV's implementation and protocol on the outgoing path.
One key issue though is that the language in the current specification for the hierarchical request id does not make the atomic increment mandatory. The current language is:

Suffix MUST be unique for every outgoing HTTP request sent while processing the incoming request; monotonically incremented number of outgoing request within the scope of this incoming operation, is a good candidate.

This can cause an issue where alternative implementations may choose to maintain uniqueness, but may choose to randomly generate the suffix. So can we consider updating the language in the spec to insist on the monotonicity? If not, then would we consider a mode/switch?

In the case of the incoming child IDs, the ask was to consider the generation of the suffix optional. The presence of the suffix does not impact the sort and the plan is to rev the cV format to align with this behavior. So we can discuss this further in dotnet/runtime#22777.

vancem commented 7 years ago

One key issue though is that the language in the current specification for the hierarchical request id does not make the atomic increment mandatory.

I think changing the language to make it monotonic is reasonable. I will propose a PR.

Also making the suffix optional is fine as well, I will add that to the PR.

dotnet-policy-service[bot] commented 1 month ago

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

dotnet-policy-service[bot] commented 2 weeks ago

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

dotnet / runtime

[DiagnosticsSource] http correlation protocol should optionally support intrinsic sort #22794