Sifchain / sifnode

SifNode - The future of Defi

Add APM tracing to sifnoded #2118

Open brianosaurus opened 3 years ago

brianosaurus commented 3 years ago

So that we can enable tracing and find node halts and other sifnoded problems:

https://docs.datadoghq.com/tracing/setup_overview/setup/go/?tab=containers

The goal is to help chainops diagnose problems by increasing observability.

@intl-man requested this on 10/31/2012.

jon-sif commented 3 years ago

@brianosaurus what do you need me to do here? Just say it's OK to integrate the Datadog agent?

AustinoBombino commented 3 years ago

Old work on tracing can be found in this branch: https://github.com/Sifchain/sifnode/tree/feature/tracing-test

A lot of changes have come through since this work was done, so I would only use it for reference and start experimenting on a new branch.

The branch only has a basic setup that successfully reported test data to Datadog from within sifnode.

marshsif commented 2 years ago

Jordan (@pandaring2you), Jon asked me to reassign this to you, as a ticket that Brent will most likely be taking on for this sprint, to address debugging/tracing improvements for our nodes. The purpose is to help us better identify the cause of nodes crashing on BetaNet or other environments.

Brian made a little headway a while back, but he had to prioritize .43 and now he's on PMTP.

pandaring2you commented 2 years ago

Moving this back to Sifnode; devops has done what we can (IBC query issues have been resolved). The remaining work, adding APM tracing, probably belongs to Sifnode.

marshsif commented 2 years ago

ok, noted. Whoever works on this may need to sync with Austin to understand what remains to be done.

begmaroman commented 2 years ago

Infra setup

For nodes that are running inside a k8s cluster, the following setup should be applied: https://docs.datadoghq.com/agent/kubernetes/apm/?tab=helm

Here is an article on how to configure the Datadog Agent for APM: https://docs.datadoghq.com/tracing/setup_overview/setup/go/?tab=containers#configure-the-datadog-agent-for-apm

Code setup

Based on Datadog's examples and explanations, the more advanced way to integrate the Datadog tracing mechanism is to wrap all needed interfaces/types with the Datadog tracing logic and add them to the contrib directory here: https://github.com/DataDog/dd-trace-go/tree/master/contrib. After that, we can import the needed package and use it in the sifnode app. Each wrapper should put all needed information into its trace spans so that it can be sent to the agent.
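To make the wrapper idea concrete, here is a rough sketch using only the Go standard library. Everything in it is an assumption for illustration: `Keeper`, `Span`, `Recorder`, `WrapKeeper` and `fakeKeeper` are hypothetical names, not sifnode or dd-trace-go types. In a real integration the in-memory `Recorder` would presumably be replaced by spans from dd-trace-go's tracer package, which then reports them to the agent.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Keeper is a hypothetical interface standing in for any sifnode component
// we might want to trace. It is not a real sifnode type.
type Keeper interface {
	Swap(from, to string, amount uint64) (uint64, error)
}

// Span is a hypothetical record of one traced call.
type Span struct {
	Operation string
	Tags      map[string]string
	Duration  time.Duration
	Err       error
}

// Recorder collects finished spans in memory. A real integration would hand
// them to the Datadog agent via dd-trace-go instead.
type Recorder struct {
	Spans []Span
}

// tracedKeeper wraps another Keeper and records a span around every call.
type tracedKeeper struct {
	next Keeper
	rec  *Recorder
}

// WrapKeeper returns a Keeper that behaves like next but also records spans.
func WrapKeeper(next Keeper, rec *Recorder) Keeper {
	return &tracedKeeper{next: next, rec: rec}
}

func (t *tracedKeeper) Swap(from, to string, amount uint64) (uint64, error) {
	start := time.Now()
	out, err := t.next.Swap(from, to, amount) // delegate to the wrapped value
	t.rec.Spans = append(t.rec.Spans, Span{
		Operation: "keeper.Swap",
		Tags:      map[string]string{"from": from, "to": to},
		Duration:  time.Since(start),
		Err:       err,
	})
	return out, err
}

// fakeKeeper is a toy implementation used only to exercise the wrapper.
type fakeKeeper struct{}

func (fakeKeeper) Swap(from, to string, amount uint64) (uint64, error) {
	if amount == 0 {
		return 0, errors.New("amount must be positive")
	}
	return amount * 2, nil
}

func main() {
	rec := &Recorder{}
	k := WrapKeeper(fakeKeeper{}, rec)
	k.Swap("rowan", "atom", 10) // succeeds
	k.Swap("rowan", "atom", 0)  // fails; the error is still recorded on the span
	for _, s := range rec.Spans {
		fmt.Printf("%s tags=%v err=%v\n", s.Operation, s.Tags, s.Err)
	}
}
```

The point of the pattern is that callers keep using the same interface, so tracing can be switched on by wrapping at construction time without touching the business logic.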

If we want to see metrics such as memory usage, CPU usage, etc., we can just add the logic from the code here: https://docs.datadoghq.com/tracing/setup_overview/setup/go/?tab=containers#configure-the-datadog-agent-for-apm
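For a sense of what such process-level metrics look like on the Go side, here is a minimal standard-library sketch. `RuntimeSample` and `Sample` are hypothetical names for illustration, and this is not how Datadog collects the data. Note that the `runtime` package does not expose CPU utilisation, so the sample only covers memory, GC and goroutine counts; CPU usage would have to come from the agent or the OS.

```go
package main

import (
	"fmt"
	"runtime"
)

// RuntimeSample is a hypothetical snapshot of process-level metrics.
type RuntimeSample struct {
	HeapAllocBytes uint64 // bytes of allocated heap objects
	SysBytes       uint64 // total bytes obtained from the OS
	NumGC          uint32 // completed GC cycles
	Goroutines     int    // currently running goroutines
	NumCPU         int    // logical CPUs usable by the process
}

// Sample reads the current values from the Go runtime.
func Sample() RuntimeSample {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return RuntimeSample{
		HeapAllocBytes: m.HeapAlloc,
		SysBytes:       m.Sys,
		NumGC:          m.NumGC,
		Goroutines:     runtime.NumGoroutine(),
		NumCPU:         runtime.NumCPU(),
	}
}

func main() {
	s := Sample()
	fmt.Printf("heap=%dB sys=%dB gc=%d goroutines=%d cpus=%d\n",
		s.HeapAllocBytes, s.SysBytes, s.NumGC, s.Goroutines, s.NumCPU)
}
```

In practice a sampler like this would run on a ticker and forward each snapshot to whatever metrics backend is in use.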

intl-man commented 2 years ago

@begmaroman - thanks for the summary. We're aware of the infra requirements and as mentioned in slack, it might pay to engage with the Cosmos community to look at how others are doing this (just a suggestion, that's all).