aws / aws-xray-sdk-go

AWS X-Ray SDK for the Go programming language.
Apache License 2.0
278 stars 117 forks source link

xray.BeginSegment does not honor sampling rule #242

Closed kevioke closed 4 years ago

kevioke commented 4 years ago

Hi everyone. I'm currently trying to debug an issue where it seems like xray is sampling more requests than it should. I'm currently suspecting that my use of xray.BeginSegment is not properly honoring the custom centralized sampling rule that I've set up.

Setup

package versions: go version go1.13.9 darwin/amd64 github.com/aws/aws-xray-sdk-go v1.0.1

Custom centralized sampling rule with priority 1 named test-sampling-rule (I set this up in the AWS console):

Service name equals some-other-service-name
Service type matches *
Host matches *
Resource ARN matches *
HTTP method matches *
URL path matches *

Default sampling rule

Service name matches *
Service type matches *
Host matches *
Resource ARN matches *
HTTP method matches *
URL path matches *

My reproduction code simply produces a Segment every 5 seconds:

package main

import (
    "context"
    "fmt"
    "os"
    "time"

    "github.com/aws/aws-xray-sdk-go/xray"
    "github.com/aws/aws-xray-sdk-go/xraylog"
)

func main() {
    xray.SetLogger(xraylog.NewDefaultLogger(os.Stderr, xraylog.LogLevelDebug))

    for i := 0; i < 100; i++ {
        ctx := context.Background()
        ctx, seg := xray.BeginSegment(ctx, "my-test-service")
        seg.Close(nil)
        time.Sleep(5 * time.Second)
        fmt.Printf("Sent %d segment(s)\n", i)
    }

    fmt.Println("done")
}

Partial Output

2020-07-20T08:50:21-07:00 [DEBUG] Beginning segment named my-test-service
2020-07-20T08:50:21-07:00 [DEBUG] Determining ShouldTrace decision for:
    host:
    path:
    method:
    servicename:
    servicetype:
2020-07-20T08:50:21-07:00 [DEBUG] Applicable rule: test-sampling-rule
2020-07-20T08:50:21-07:00 [DEBUG] Sampling target has expired for rule test-sampling-rule. Borrowing a request.
2020-07-20T08:50:21-07:00 [DEBUG] SamplingStrategy decided: true
2020-07-20T08:50:21-07:00 [DEBUG] Closing segment named my-test-service
2020-07-20T08:50:21-07:00 [DEBUG] {"trace_id":"1-5f15bd3d-9f0d64dda8cc4f0fbb8e6e6c","id":"4706a61e1629218c","name":"my-test-service","start_time":1595260221.568392,"end_time":1595260221.568487,"aws":{"xray":{"sdk_version":"1.0.0","sdk":"X-Ray for Go","sampling_rule_name":"test-sampling-rule"}},"service":{"compiler_version":"go1.13.9","compiler":"gc"},"Dummy":false}
Sent 1 segment(s)
2020-07-20T08:50:26-07:00 [DEBUG] Beginning segment named my-test-service
2020-07-20T08:50:26-07:00 [DEBUG] Determining ShouldTrace decision for:
    host:
    path:
    method:
    servicename:
    servicetype:
2020-07-20T08:50:26-07:00 [DEBUG] Applicable rule: test-sampling-rule
2020-07-20T08:50:26-07:00 [DEBUG] Sampling target has expired for rule test-sampling-rule. Borrowing a request.
2020-07-20T08:50:26-07:00 [DEBUG] SamplingStrategy decided: true
2020-07-20T08:50:26-07:00 [DEBUG] Closing segment named my-test-service
2020-07-20T08:50:26-07:00 [DEBUG] {"trace_id":"1-5f15bd42-5ceb5af77316f0228eb7875c","id":"bbc44069549f1f56","name":"my-test-service","start_time":1595260226.569379,"end_time":1595260226.5694761,"aws":{"xray":{"sdk_version":"1.0.0","sdk":"X-Ray for Go","sampling_rule_name":"test-sampling-rule"}},"service":{"compiler_version":"go1.13.9","compiler":"gc"},"Dummy":false}
2020-07-20T08:50:26-07:00 [INFO] Successfully refreshed sampling targets
Sent 2 segment(s)

Expected behavior

The Segment is named my-test-service but the custom sampling rule I've setup in the AWS console specifically indicates that this rule should only apply to requests that have the service name equal to some-other-service-name. The expected behavior should be that all these requests should fall through to the default rule and NOT be applicable with the custom rule I set up. While browsing the SDK source code, I stumbled upon the matching logic here https://github.com/aws/aws-xray-sdk-go/blob/aa6a33f684fd0b1760ac6e9b0bf0590f0dfb9bad/strategy/sampling/sampling_rule.go#L41. I suspect segments without any host, path, method, servicename, servicetype fields always match with any rule which is counterintuitive for me.

A use case for me is that I want to be able to setup rules that sample a particular service to sample 100%. However, with this behavior I cannot do that because it would mean other services which use BeginSegment would also sample at 100% leading to an explosion of costs.

bhautikpip commented 4 years ago

Hi @kevioke ,

Thanks for opening this issue. Current behavior of xray Go SDK samples requests with default sampling rate if you use xray.BeginSegment API directly and not do any instrumentation. I believe that's why you're experiencing this issue. However I am working on a Pull request which doesn't do default sampling on xray.BeginSegment API and probably solve your issue too. Thanks for the patience.

Regards, Bhautik

kevioke commented 4 years ago

@bhautikpip I would be OK with xray.BeginSegment using the Default sampling rule which I've set to sample with a low sampling rate. However, the issue for me is that xray.BeginSegment seems to be using any sampling rule which has a higher priority even if that rule has restrictive rules as shown in my example.

bhautikpip commented 4 years ago

I see. I think this method (https://github.com/aws/aws-xray-sdk-go/blob/aa6a33f684fd0b1760ac6e9b0bf0590f0dfb9bad/strategy/sampling/sampling_rule.go#L33) basically returns true in your case and that's why it doesn't apply default sampling rule and applied rules that you have set in AWS X-Ray console. If we take a look at this method closely it basically checks for host, request and method and if they are matching it returns true and apply rules you have set. In your case you have set rules based on service name which I believe this method doesn't check and therefore it leads SDK to sample based on custom rule and not default one. This is an interesting use case. I will reach out to you with more updates on this issue.

kevioke commented 4 years ago

@bhautikpip thank you for taking the time to review this issue. Just to provide some more clarifications, I don't think the method you linked is the one that's being called. There are two such methods in the SDK, one for centralized rules and one for local rules. Correct me if i'm wrong, but I think it's actually this one which is for the centralized rule https://github.com/aws/aws-xray-sdk-go/blob/aa6a33f684fd0b1760ac6e9b0bf0590f0dfb9bad/strategy/sampling/sampling_rule.go#L41

I suspect so because the logs I provided printed out:

2020-07-20T08:50:26-07:00 [DEBUG] Determining ShouldTrace decision for:
    host:
    path:
    method:
    servicename:
    servicetype:

This only gets printed in this code path https://github.com/aws/aws-xray-sdk-go/blob/aa6a33f684fd0b1760ac6e9b0bf0590f0dfb9bad/strategy/sampling/centralized.go#L141. The other code path here https://github.com/aws/aws-xray-sdk-go/blob/aa6a33f684fd0b1760ac6e9b0bf0590f0dfb9bad/strategy/sampling/localized.go#L63 only applies to localized rules which is not the case with my issue.

In which case, the centralized rule for the AppliesTo method has an issue because it short circuits any pattern matches if the fields are empty. Here is the code

    return (request.Host == "" || pattern.WildcardMatchCaseInsensitive(r.Host, request.Host)) &&
        (request.URL == "" || pattern.WildcardMatchCaseInsensitive(r.URLPath, request.URL)) &&
        (request.Method == "" || pattern.WildcardMatchCaseInsensitive(r.HTTPMethod, request.Method)) &&
        (request.ServiceName == "" || pattern.WildcardMatchCaseInsensitive(r.ServiceName, request.ServiceName)) &&
        (request.ServiceType == "" || pattern.WildcardMatchCaseInsensitive(r.serviceType, request.ServiceType))

You can see that if request.ServiceName is empty and if all the other request.* fields are empty, this statement will result in a true. I suspect a correct implementation can be:

    return pattern.WildcardMatchCaseInsensitive(r.Host, request.Host) &&
        pattern.WildcardMatchCaseInsensitive(r.URLPath, request.URL) &&
        pattern.WildcardMatchCaseInsensitive(r.HTTPMethod, request.Method) &&
        pattern.WildcardMatchCaseInsensitive(r.ServiceName, request.ServiceName) &&
        pattern.WildcardMatchCaseInsensitive(r.serviceType, request.ServiceType)
bhautikpip commented 4 years ago

Hi @kevioke ,

I see. looks like when you send a request using xray.BeginSegment API since Host, URL, Method, ServiceName and SeviceType fields might be empty and because of that AppliesTo method will return True and apply rule with higher priority and not default sampling. We would love to review your PR to resolve this.

kevioke commented 4 years ago

@bhautikpip here's my proposal for a fix https://github.com/aws/aws-xray-sdk-go/pull/244. Appreciate the help with reviewing it.