digital-asset / daml

The Daml smart contract language
https://www.digitalasset.com/developers
797 stars 199 forks source link

Improve public accessibility to our trace tooling #14256

Open daravep opened 2 years ago

daravep commented 2 years ago

Canton deployments support trace context propagation: https://docs.daml.com/canton/usermanual/monitoring.html#tracing

As a result, every command will be attached to a trace-id that is propagated between all distributed components to simplify debugging.

In fact, the trace contexts can even be generated in applications and propagated through GRPC, the Ledger API server into the Canton synchronization process.

Also, the not-officially-supported scala ledger bindings support context propagation, as used by Canton internally: https://github.com/digital-asset/canton/blob/main/community/common/src/main/scala/com/digitalasset/canton/ledger/api/client/CommandSubmitterWithRetry.scala#L75

However, there are several gaps:

The current work-around is to find a "rosetta stone like" log message that mentions the "command-id" as well as a trace-id and then to filter for both. However, that only works for command submissions, not for e.g. party allocations and is sub-optimal.

stefanobaghino-da commented 2 years ago

I spun off #14258 for the Application Runtime team. Please keep in mind that we need to prioritize according to our available resources.

daravep commented 2 years ago

@stefanobaghino-da Thanks. I opened this based on a request by a client in order to capture the current state, not to suggest immediate action. Please discuss the priority of this with the product manager in charge. Oh, that's you! ;-)

chunlokling-da commented 2 years ago

@daravep I am working on #12798 where I will forward the trace_id from json API to ledger API. As I understand there is no doc at the moment, may I ask if you can point me to the code that is related to passing trace context in grpc? I would like to understand how to pass in trace context when interacting with the ledger. Thanks

daravep commented 2 years ago

@chunlokling-da Thanks for digging into this. Please don't forget to make your conclusions available to others too.

I've left the pointer that I know in the original description of this issue: https://github.com/DACH-NY/canton/blob/main/community/participant/src/main/scala/com/digitalasset/canton/participant/ledger/api/client/CommandSubmitterWithRetry.scala#L75

You can also observe that it happens here: https://github.com/digital-asset/daml/blob/main/ledger/ledger-api-client/src/main/scala/com/digitalasset/ledger/client/services/commands/CommandSubmissionFlow.scala#L25

On the Canton side, @danilofaria did all the work to get this working. He understands it much better, so I would suggest you talk to him (or use git blame to figure out person who wrote the command submission flow ...)

chunlokling-da commented 2 years ago

@daravep Thank you so much for the Links!

I had a look at the code here: https://github.com/digital-asset/daml/tree/4064e992f369f7f8e52dfd8803388d7fa56bb6f9/ledger/participant-integration-api/src/main/scala/platform/apiserver/services

I can only find there is a telemetryContext: TelemetryContext in the ApiSubmissionService but not in other services (eg ApiCommandService.scala) Does it mean that only ApiSubmissionService has the trace context support?

ApiSubmissionService.scala

daravep commented 2 years ago

I think so. @mziolekda might know better how far tracing is supported in the ledger API server.

mziolekda commented 1 year ago

Yes, that is correct, only a bunch of services support it:

To cover additional services would take a mid size project. The problem with the entire tracing solution is that it is a bit ad hoc at the moment. Bits of it have been implemented to suit a particular immediate testing need of a particular DA team. It has not been addressed comprehensively from the client perspective. I feel like that would be required here. Only then would we avoid the trap of implementing it partially or omitting some components all-together for example the json-api etc.

stefanobaghino-da commented 1 year ago

Yes, that is correct, only a bunch of services support it:

  • CommandSubmissionService
  • ConfigManagementService
  • PackageManagementService
  • PartyManagementService

To cover additional services would take a mid size project. The problem with the entire tracing solution is that it is a bit ad hoc at the moment. Bits of it have been implemented to suit a particular immediate testing need of a particular DA team. It has not been addressed comprehensively from the client perspective. I feel like that would be required here. Only then would we avoid the trap of implementing it partially or omitting some components all-together for example the json-api etc.

Just to confirm, based on your list: do the CommandCompletionService and the CommandService not support tracing?

stefanobaghino-da commented 1 year ago

@mziolekda Ping on the comment above since it blocks #12798.

stefanobaghino-da commented 1 year ago

@skisel-da confirmed that neither service supports tracing right now. #12798 remains blocked for the time being.

stefanobaghino-da commented 1 year ago

@mziolekda How do you prefer to keep track of the work that would be required to expand the tracing capabilities of the Ledger API server? This issue has just been created about it.

mziolekda commented 1 year ago

I have created an epic for it on our backlog: https://digitalasset.atlassian.net/browse/DPP-1275 It is not planned for Q4. If you feel it should be addressed urgently, a priority call must be made by Bernhard and Ratko.

danilofaria commented 1 year ago

Just to add my two cents here, I am currently improving documentation around tracing in Canton and describing how we trace a Canton ping.

The ping is a series of 3 commands:

Ideally, we would like the 3 steps to be part of the same trace but even though command submission does support passing a telemetryContext, the transaction object we get from the transaction source has no telemetry information propagated. We end up with the traces broken and with part of the ping trace missing.

I have to add a caveat in the Canton documentation that this is a current limitation.

Although this is not incredibly urgent, it would be great to get it fixed.

FYI @mziolekda @gerolf-da @stefanobaghino-da