aws / aws-xray-sdk-go

AWS X-Ray SDK for the Go programming language.
Apache License 2.0
276 stars 117 forks source link

Mutex deadlock #392

Open nikolaianohyn opened 1 year ago

nikolaianohyn commented 1 year ago

In the function bellow next situation: we lock mutex seg.Lock() and realize it only after function is done. By the way seg.parent.RemoveSubsegment(seg) try to lock parent segment that already locked.

// CloseAndStream closes a subsegment and sends it.
func (seg *Segment) CloseAndStream(err error) {
    // If SDK is disabled then return
    if SdkDisabled() {
        return
    }

    seg.Lock()
    defer seg.Unlock()

    if seg.parent != nil {
        logger.Debugf("Ending subsegment named: %s", seg.Name)
        seg.EndTime = float64(time.Now().UnixNano()) / float64(time.Second)
        seg.InProgress = false
        seg.Emitted = true
        if seg.parent.RemoveSubsegment(seg) {
            logger.Debugf("Removing subsegment named: %s", seg.Name)
        }
    }

    if err != nil {
        seg.addError(err)
    }

    // If segment is dummy we return
    if seg.Dummy {
        return
    }

    seg.beforeEmitSubsegment(seg.parent)
    seg.emit()
}

Parent segment is locked because in parallel we call send() function for parent segment. send() locks current segment(which is parent) and then call packSegments which try to lock the subsegment which already locked upper. SO DEADLOCK.

Guys, I've been debugging it for a week. Pls write more tests for routine leaks and deadlocks and etc...

The working solution for this:

// CloseAndStream closes a subsegment and sends it.
func (seg *Segment) CloseAndStream(err error) {
    // If SDK is disabled then return
    if SdkDisabled() {
        return
    }

    if seg.parent != nil {
        seg.Lock()
        logger.Debugf("Ending subsegment named: %s", seg.Name)
        seg.EndTime = float64(time.Now().UnixNano()) / float64(time.Second)
        seg.InProgress = false
        seg.Emitted = true
        seg.Unlock()
        if seg.parent.RemoveSubsegment(seg) {
            logger.Debugf("Removing subsegment named: %s", seg.Name)
        }
    }

    seg.Lock()
    defer seg.Unlock()

    if err != nil {
        seg.addError(err)
    }

    // If segment is dummy we return
    if seg.Dummy {
        return
    }

    seg.beforeEmitSubsegment(seg.parent)
    seg.emit()
}
willarmiros commented 1 year ago

Thanks for reporting this, and contributing a fix in #393. It will be released in our next release.