Documentation: runtime tracing.random_sampling inaccurate with Datadog provider

nulltrope commented 1 year ago

Title: Documentation: runtime tracing.random_sampling inaccurate

Description: The Runtime documentation for tracing.random_sampling states that the value should be between 0-10000, however in my testing it seems that in reality, any value >= 100 will result in all requests emitting traces. Note: I have only been able to test with the Datadog trace provider, as that is what I have setup to use, so I can't confirm whether this is isolated to the Datadog provider or not.

Repro steps:

Set runtime tracing.random_sampling to any value over 100, e.g. 1000
Send a couple requests and note that envoy emits a trace for all of them, when according to the documentation it should only emit 10% (10000/1000)

Config: (Only including the relevant config snippets)

    "tracing": {
     "http": {
      "name": "envoy.tracers.datadog",
      "typed_config": {
       "@type": "type.googleapis.com/envoy.config.trace.v3.DatadogConfig",
       "collector_cluster": "datadog_agent",
       "service_name": "envoy-sidecar"
      }
     }
    },
...
    "layered_runtime": {
     "layers": [
      {
       "name": "static_layer_0",
       "static_layer": {
        "tracing": {
         "random_sampling": 1000
        }
       }
      }
     ]
    }
...
"clusters": [
      {
       "name": "datadog_agent",
       "type": "STATIC",
       "connect_timeout": "0.250s",
       "load_assignment": {
        "cluster_name": "datadog_agent",
        "endpoints": [
         {
          "lb_endpoints": [
           {
            "endpoint": {
             "address": {
              "socket_address": {
               "address": "<AGENT IP>",
               "port_value": 8126
              }
             }
            }
           }
          ]
         }
        ]
       }
      }
     ]

And an example HTTP connection manager config:

{
    "name": "envoy.http_connection_manager",
    "typed_config": {
     "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager",
     "stat_prefix": "REDACTED",
     "rds": {
      "config_source": {
       "ads": {},
       "resource_api_version": "V3"
      },
      "route_config_name": "REDACTED"
     },
     "http_filters": [
      {
       "name": "envoy.filters.http.fault",
       "typed_config": {
        "@type": "type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault"
       }
      },
      {
       "name": "envoy.filters.http.cors",
       "typed_config": {
        "@type": "type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors"
       }
      },
      {
       "name": "envoy.filters.http.router",
       "typed_config": {
        "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router",
        "suppress_envoy_headers": true
       }
      }
     ],
     "tracing": {},
     "use_remote_address": true,
     "generate_request_id": true,
     "normalize_path": true,
     "merge_slashes": true
    }
   }

wbpcode commented 1 year ago

Seems that the imlementation is different with the docs' description. You can try to specify a fractional percent directly with a proto struct. Or the integer value will be treat as percent value with a denominator 100.

bool SnapshotImpl::featureEnabled(absl::string_view key,
                                  const envoy::type::v3::FractionalPercent& default_value,
                                  uint64_t random_value) const {
  const auto& entry = key.empty() ? values_.end() : values_.find(key);
  envoy::type::v3::FractionalPercent percent;
  if (entry != values_.end() && entry->second.fractional_percent_value_.has_value()) {
    percent = entry->second.fractional_percent_value_.value();
  } else if (entry != values_.end() && entry->second.uint_value_.has_value()) {
    // Check for > 100 because the runtime value is assumed to be specified as
    // an integer, and it also ensures that truncating the uint64_t runtime
    // value into a uint32_t percent numerator later is safe
    if (entry->second.uint_value_.value() > 100) {
      return true;
    }

    // The runtime value was specified as an integer rather than a fractional
    // percent proto. To preserve legacy semantics, we treat it as a percentage
    // (i.e. denominator of 100).
    percent.set_numerator(entry->second.uint_value_.value());
    percent.set_denominator(envoy::type::v3::FractionalPercent::HUNDRED);
  } else {
    percent = default_value;
  }

  // When numerator > denominator condition is always evaluates to TRUE
  // It becomes hard to debug why configuration does not work in case of wrong numerator.
  // Log debug message that numerator is invalid.
  uint64_t denominator_value =
      ProtobufPercentHelper::fractionalPercentDenominatorToInt(percent.denominator());
  if (percent.numerator() > denominator_value) {
    ENVOY_LOG(debug,
              "WARNING runtime key '{}': numerator ({}) > denominator ({}), condition always "
              "evaluates to true",
              key, percent.numerator(), denominator_value);
  }

  return ProtobufPercentHelper::evaluateFractionalPercent(percent, random_value);
}

Should we fix the implementation or just update the doc? @phlax (Seems update doc)

phlax commented 1 year ago

Should we fix the implementation or just update the doc? @phlax (Seems update doc)

apologies i missed this previously - i would say update the docs - as the current implementation is what anyone using it will expect - even if docs are wrong

happy to review prs - i only half follow the actual issue

wbpcode commented 1 year ago

I forgot this pr completely. 🤣 Okay I will creat a simple patch.

StarryVae commented 1 year ago

it seems a simple doc update, i can try this. cc @wbpcode 🤣

envoyproxy / envoy

Documentation: runtime tracing.random_sampling inaccurate with Datadog provider #24180