Closed ugur99 closed 7 months ago
cc @dashpole
Hey @ugur99 thanks for raising this. I'm wondering if what you are seeing is related to the following which was introduced between 3.5.4 and 3.5.5:
It looks like for main
/ 3.6
we also have a configurable sample rate option: https://github.com/etcd-io/etcd/pull/13248/files#diff-538c79cd00ec18cb43b5dddd5f36b979d9d050cf478a241304493284739d31bfR353
However that sample rate option doesn't appear to be documented yet? Refer: https://etcd.io/docs/v3.6/op-guide/configuration/#experimental-distributed-tracing
We might need to do a backport of the ExperimentalDistributedTracingSamplingRatePerMillion
configuration option to resolve this for 3.5 stable release branch.
@serathius can you please add an area/observability
label. We don't really have any way of grouping tracing/loging/metrics related issues currently.
thank you both @serathius @jmhbnz for the quick replies! It might be a good idea as well to document current status in v3.5 documentation as well.
Have started working on a fix https://github.com/etcd-io/etcd/pull/16951. Need to setup a tracing test environment to validate before marking ready for review.
Edit: Pull request is now ready for review.
Hey @ugur99 - A quick update, the code fix has been merged in #16951 and our documentation updated in https://github.com/etcd-io/website/pull/753. The fix will be available for users very shortly in our upcoming v3.5.11
stable release.
I'll close this issue now as the work is completed, many thanks for reporting this bug 🙏🏻
Bug report criteria
What happened?
Hi team, we're trying to enable distributed tracing for etcd, and we've configured it with the required flags according to the documentation. However, it does not seem to work properly with the v3.5.5+ (same config works fine with v3.5.4 without any problems); not sure if its a bug/regression or misdocumentation problem. Do you have any ideas or suggestions on this? Thanks in advance!
What did you expect to happen?
Distributed tracing should work for later versions too; it seems that it does not work with the required config flags that put in documentation.
How can we reproduce it (as minimally and precisely as possible)?
You can check both etcd versions with the same configuration. To providing such a playground you can provision a cluster by kind (or kubespray) and install required apps like tempo, opentelemetry operator. After than provision a collector that uses localNetwork on controlplane nodes and observe the different behaviours between v3.5.4 and v3.5.5+
collector config -->
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
No response
Relevant log output
No response