Enabling tracing by usage of opentelemetry

rem1-dev commented 3 months ago

I've added tracing to the project by introducing mainly telemetry, opentelemetry and Tokyo crates. Having tracing allows analysis of bottleneck functions (functions that take majority of execution time).

RCasatta commented 3 months ago

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

philippem commented 3 months ago

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

prometheus gives us metrics ( example there is a histogram of times spent making RPC calls to bitcoind). Many alerts can be directly built on prometheus metrics.
opentelemtry (OTLP) gives us tracing within and between processes (for example we can look at a mempool update operation and see where the time is being spent)

They both enhance observability but are complementary.

shesek commented 2 months ago

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

... They both enhance observability but are complementary.

I'm still not quite sure I understand the differences. Prometheus can and is being used to track function execution times, for example in these functions (via _timer and its Drop) where opentelemetry was added too:

+    #[instrument(skip_all, name="Mempool::utxo")]
     pub fn utxo(&self, scripthash: &[u8]) -> Vec<Utxo> {
         let _timer = self.latency.with_label_values(&["utxo"]).start_timer();
         let entries = match self.history.get(scripthash) {
@@ -209,6 +216,7 @@ impl Mempool {
             .collect()
     }

+    #[instrument(skip_all, name="Mempool::stats")]
     // @XXX avoid code duplication with ChainQuery::stats()?
     pub fn stats(&self, scripthash: &[u8]) -> ScriptStats {
         let _timer = self.latency.with_label_values(&["stats"]).start_timer();
@@ -258,12 +266,14 @@ impl Mempool {
         stats
     }

+    #[instrument(skip_all, name="Mempool::txids")]
     // Get all txids in the mempool
     pub fn txids(&self) -> Vec<&Txid> {
         let _timer = self.latency.with_label_values(&["txids"]).start_timer();
         self.txstore.keys().collect()
     }

shesek commented 2 months ago

It's not critical, but it could aid review if the addition of #[instrument()] attributes was separated to its own commit

shesek commented 2 months ago

How do you feel about defining a no-op #[instrument] macro when the tracing feature is disabled, so that we don't have to wrap it in #[cfg_attr(feature = "tracing",... )) every time?

rem1-dev commented 1 month ago

How do you feel about defining a no-op #[instrument] macro when the tracing feature is disabled, so that we don't have to wrap it in #[cfg_attr(feature = "tracing",... )) every time?

I found out that instead of defining my own procedural macro I can just use tracing/max_level_off to turn off tracing by default and make #[instrument] macro no-op by default. I implemented it.

rem1-dev commented 1 month ago

Tracing is turned off by default and can be turned on by providing following params for both cargo build and cargo run commands:

--no-default-features --features otlp-tracing

shesek commented 1 month ago

It would be nice if it was possible to avoid the additional dependencies altogether, not just disable tracing

rem1-dev commented 1 month ago

It would be nice if it was possible to avoid the additional dependencies altogether, not just disable tracing

I guess that needs, as you suggested, a new macro that is no-op in case of otlp-tracing feature flag is off. You're right that we still have dependency on new crates. But by default the only new crates we will depend on is just one: tracing - so that we can turn off it's macro that we used. Only if we turn on future flag we pull other dependencies in addition to tracing: "tracing-subscriber", "opentelemetry", "tracing-opentelemetry", "opentelemetry-otlp", "opentelemetry-semantic-conventions"

so we add 1 new dependency only (or 6 if one decides to use tracing). Is it still not acceptable @shesek ?

shesek commented 1 month ago

I had a preference to avoid additional dependencies altogether, and thought this should be as simple as

#[not(cfg(feature = "otlp"))]
#[proc_macro_attribute]
pub fn instrument(_attr: TokenStream, item: TokenStream) -> TokenStream { item }

However I realized that this would require making this a proc-macro crate would would prevent us from exporting anything other than proc macros, or make a separate crate just for this, which overly complicates things... So yeah, let's stick with the one extra dependency.

shesek commented 1 month ago

It's kind of unfortunate that we have to use a negative feature ...

One thing we could do to make this less of a footgun is to add some explicit useful error messages when --no-default-features is used without re-enabling either otlp features, or when otlp-tracing is enabled without using --no-default-features:

#[cfg(not(any(feature = "otlp-tracing", feature = "no-otlp-tracing")))]
compile_error!("Must enable one of the 'otlp-tracing' or 'no-otlp-tracing' features");
#[cfg(all(feature = "otlp-tracing", feature = "no-otlp-tracing"))]
compile_error!("Cannot enable both the 'otlp-tracing' and 'no-otlp-tracing' features");

RCasatta commented 1 month ago

It's kind of unfortunate that we have to use a negative feature, which requires using --no-default-features to disable no-otlp-tracing when enabling otlp-tracing. It'll become more annoying if we ever add other default features, as they'll have to be re-enabled manually. But I don't see a way around it... is there? (/cc @RCasatta)

By looking here https://github.com/tokio-rs/tracing/blob/bdbaf8007364ed2a766cca851d63de31b7c47e72/tracing/src/level_filters.rs#L68 it seems that maximum tracing is the default so we don't need to opt-in for it? Thus, it seems to me we can avoid the positive feaure? It would require to use code like not(feature = "no_otlp_tracing") which doesn't look great but at least we would need only 1 feature and continue to guarantee that features are additive.

shesek commented 1 month ago

Thus, it seems to me we can avoid the positive feature?

The original goal for using the feature was to make (most) of the additional dependencies optional, which (I think?) we can't do with just a negative feature

RCasatta commented 1 month ago

I see... I understand the issue of having too many deps, but I think it's greater in "money-handling" libs such as wallets than in electrs, so I would bite the bullet and build some deps even though they are not used instead of adding complexity.

rem1-dev commented 1 month ago

Thus, it seems to me we can avoid the positive feature?

The original goal for using the feature was to make (most) of the additional dependencies optional, which (I think?) we can't do with just a negative feature

Yes, that was the motivation behind this, keep additional dependencies at minimum and also make #instrument[] annotations no-op by default. I couldn't find other solution than --no-default-features --features otlp-tracing and turning off tracing in default features.

shesek commented 4 weeks ago

the issue of having too many deps, but I think it's greater in "money-handling" libs such as wallets than in electrs

I generally agree with the sentiment, but its a lot of new dependencies - 6 direct ones, expanded into a total of 71 new crates in the dependency graph - for a feature that won't typically be used by other users of esplora/electrs.

Adding just the tracing crate as implemented by @rem1-dev reduces it to just 2 new dependencies (tracing+ tracing-core).

shesek commented 4 weeks ago

I miscalculated the crates in my previous comment - the full set of dependencies expands into 41 new crates in the graph, while no-otlp-tracingdoesn't add any new ones at all (tracing is already a sub-dependency via hyper).

rem1-dev commented 3 weeks ago

@shesek github summary for this PR shows there is still a "requested changes" but I can't find anything. Is there anything else I should change?

shesek commented 3 weeks ago

github summary for this PR shows there is still a "requested changes" but I can't find anything

I don't see anything either, seems that all the requests were resolved :+1:

Could you run cargo fmt and perhaps squash some of the commits? It's ready to go other than that.

rem1-dev commented 3 weeks ago

github summary for this PR shows there is still a "requested changes" but I can't find anything

I don't see anything either, seems that all the requests were resolved 👍

Could you run cargo fmt and perhaps squash some of the commits? It's ready to go other than that.

I run cargo fmt. Unfortunately squashing would be a bigger effort because when I try to rebase and squash commits there are lots of merge conflicts. I don't understand that since I already resolved conflicts when merging new-index into my branch. Anyway can we skip squashing?

shesek commented 2 weeks ago

@rem1-dev Yes it's fine, I can just squash to a single commit when merging.

However it seems that there are merge conflicts, can you look into that?

rem1-dev commented 2 weeks ago

@rem1-dev Yes it's fine, I can just squash to a single commit when merging.

However it seems that there are merge conflicts, can you look into that?

Are there? Github says "This branch has no conflicts with the base branch". When I tried merge new-index to my branch locally also git said all is up to date. I did small change: amended last commit message.

RCasatta commented 2 weeks ago

I confirm I cannot find conflicts, @shesek I'll let you merge if you want to squash

rem1-dev commented 2 weeks ago

Let's not merge it yet. I'll try again and solve merge conflict during commits squash.

rem1-dev commented 2 weeks ago

I squashed commits. PR must be approved again.

rem1-dev commented 2 weeks ago

Something wen't awfully wrong during that commits squash and new-index branch merge. Not mine commits are now in commits list. I think best to do now is if I create fresh branch from new-index and apply my changes there. Let's close this PR and I'll create new one. Sorry for that.

Blockstream / electrs

Enabling tracing by usage of opentelemetry #99