Open daravep opened 2 years ago
I spun off #14258 for the Application Runtime team. Please keep in mind that we need to prioritize according to our available resources.
@stefanobaghino-da Thanks. I opened this based on a request by a client in order to capture the current state, not to suggest immediate action. Please discuss the priority of this with the product manager in charge. Oh, that's you! ;-)
@daravep I am working on #12798 where I will forward the trace_id from json API to ledger API. As I understand there is no doc at the moment, may I ask if you can point me to the code that is related to passing trace context in grpc? I would like to understand how to pass in trace context when interacting with the ledger. Thanks
@chunlokling-da Thanks for digging into this. Please don't forget to make your conclusions available to others too.
I've left the pointer that I know in the original description of this issue: https://github.com/DACH-NY/canton/blob/main/community/participant/src/main/scala/com/digitalasset/canton/participant/ledger/api/client/CommandSubmitterWithRetry.scala#L75
You can also observe that it happens here: https://github.com/digital-asset/daml/blob/main/ledger/ledger-api-client/src/main/scala/com/digitalasset/ledger/client/services/commands/CommandSubmissionFlow.scala#L25
On the Canton side, @danilofaria did all the work to get this working. He understands it much better, so I would suggest you talk to him (or use git blame to figure out person who wrote the command submission flow ...)
@daravep Thank you so much for the Links!
I had a look at the code here: https://github.com/digital-asset/daml/tree/4064e992f369f7f8e52dfd8803388d7fa56bb6f9/ledger/participant-integration-api/src/main/scala/platform/apiserver/services
I can only find there is a telemetryContext: TelemetryContext
in the ApiSubmissionService but not in other services (eg ApiCommandService.scala) Does it mean that only ApiSubmissionService has the trace context support?
I think so. @mziolekda might know better how far tracing is supported in the ledger API server.
Yes, that is correct, only a bunch of services support it:
To cover additional services would take a mid size project. The problem with the entire tracing solution is that it is a bit ad hoc at the moment. Bits of it have been implemented to suit a particular immediate testing need of a particular DA team. It has not been addressed comprehensively from the client perspective. I feel like that would be required here. Only then would we avoid the trap of implementing it partially or omitting some components all-together for example the json-api etc.
Yes, that is correct, only a bunch of services support it:
- CommandSubmissionService
- ConfigManagementService
- PackageManagementService
- PartyManagementService
To cover additional services would take a mid size project. The problem with the entire tracing solution is that it is a bit ad hoc at the moment. Bits of it have been implemented to suit a particular immediate testing need of a particular DA team. It has not been addressed comprehensively from the client perspective. I feel like that would be required here. Only then would we avoid the trap of implementing it partially or omitting some components all-together for example the json-api etc.
Just to confirm, based on your list: do the CommandCompletionService
and the CommandService
not support tracing?
@mziolekda Ping on the comment above since it blocks #12798.
@skisel-da confirmed that neither service supports tracing right now. #12798 remains blocked for the time being.
@mziolekda How do you prefer to keep track of the work that would be required to expand the tracing capabilities of the Ledger API server? This issue has just been created about it.
I have created an epic for it on our backlog: https://digitalasset.atlassian.net/browse/DPP-1275 It is not planned for Q4. If you feel it should be addressed urgently, a priority call must be made by Bernhard and Ratko.
Just to add my two cents here, I am currently improving documentation around tracing in Canton and describing how we trace a Canton ping.
The ping is a series of 3 commands:
Ping
contract Ping
contract by exercising the Respond
consuming choice, which results in the creation of a Pong
contractPong
contract by exercising the Ack
consuming choice on it, which finalizes the process.Ideally, we would like the 3 steps to be part of the same trace but even though command submission does support passing a telemetryContext, the transaction object we get from the transaction source has no telemetry information propagated. We end up with the traces broken and with part of the ping trace missing.
I have to add a caveat in the Canton documentation that this is a current limitation.
Although this is not incredibly urgent, it would be great to get it fixed.
FYI @mziolekda @gerolf-da @stefanobaghino-da
Canton deployments support trace context propagation: https://docs.daml.com/canton/usermanual/monitoring.html#tracing
As a result, every command will be attached to a trace-id that is propagated between all distributed components to simplify debugging.
In fact, the trace contexts can even be generated in applications and propagated through GRPC, the Ledger API server into the Canton synchronization process.
Also, the not-officially-supported scala ledger bindings support context propagation, as used by Canton internally: https://github.com/digital-asset/canton/blob/main/community/common/src/main/scala/com/digitalasset/canton/ledger/api/client/CommandSubmitterWithRetry.scala#L75
However, there are several gaps:
The current work-around is to find a "rosetta stone like" log message that mentions the "command-id" as well as a trace-id and then to filter for both. However, that only works for command submissions, not for e.g. party allocations and is sub-optimal.