bazel-contrib / target-determinator

Determines which Bazel targets were affected between two git commits.
Apache License 2.0
144 stars 22 forks source link

deadlock in hashTarget -> build.Target_RULE -> hashRule #57

Closed com6056 closed 1 year ago

com6056 commented 1 year ago

Haven't narrowed it down exactly yet but seems there is a deadlock somewhere in the following flow:

PrefillCache calls queryInfo.TargetHashCache.Hash which holds onto entry.hashLock.Lock() while it calls hashTarget.

hashTarget, in the case of build.Target_RULE, calls hashRule, which eventually calls thc.Hash again which blocks on entry.hashLock.Lock() indefinitely.

Trying to narrow down exactly what is going on but figured I'd surface this in case someone else can figure it out faster 😛

com6056 commented 1 year ago

In our case, I think I mostly see this happening with @io_bazel_rules_go//:go_context_data, but need to validate.

com6056 commented 1 year ago

Looks like it might be related to nogo usage? I added some messy logging to see what was holding onto the lock and got this:

{@io_bazel_rules_go//:go_context_data {0a4d9d48f377253da730a871cf79a6613f4c9221c23ad1ec757a57e84ac16090}} still has the lock
{//:foo_nogo {0a4d9d48f377253da730a871cf79a6613f4c9221c23ad1ec757a57e84ac16090}} still has the lock
{@io_bazel_rules_nogo//:nogo {0a4d9d48f377253da730a871cf79a6613f4c9221c23ad1ec757a57e84ac16090}} still has the lock
{//tools/bazel/analyzers/unusedresult {ef509650ff1c02d363060ca6742b0e13ff929a68634d1c51973a4d2bca6cb636}} still has the lock
{@org_golang_x_tools//go/analysis {ef509650ff1c02d363060ca6742b0e13ff929a68634d1c51973a4d2bca6cb636}} still has the lock
{//:mux_nogo_actual {ef509650ff1c02d363060ca6742b0e13ff929a68634d1c51973a4d2bca6cb636}} still has the lock

diff of the logging changes as well in case I didn't do it right 😅

diff --git a/pkg/hash_cache.go b/pkg/hash_cache.go
index ec7b3ca..acd85ef 100644
--- a/pkg/hash_cache.go
+++ b/pkg/hash_cache.go
@@ -2,6 +2,7 @@ package pkg

 import (
        "bytes"
+       "context"
        "crypto/sha256"
        "encoding/binary"
        "errors"
@@ -12,6 +13,7 @@ import (
        "sort"
        "strings"
        "sync"
+       "time"

        ss "github.com/bazel-contrib/target-determinator/common/sorted_set"
        "github.com/bazel-contrib/target-determinator/third_party/protobuf/bazel/analysis"
@@ -99,8 +101,23 @@ func (thc *TargetHashCache) Hash(labelAndConfiguration LabelAndConfiguration) ([
                entry = newEntry
        }
        thc.cacheLock.Unlock()
+       fmt.Println(labelAndConfiguration, "entry.hashLock.Lock()")
        entry.hashLock.Lock()
+       ctx, cancel := context.WithCancel(context.Background())
+       go func(context.Context) {
+               for {
+                       select {
+                       case <-ctx.Done():
+                               return
+                       default:
+                               fmt.Println(labelAndConfiguration, "still has the lock")
+                               time.Sleep(3 * time.Second)
+                       }
+               }
+       }(ctx)
+       fmt.Println(labelAndConfiguration, "entry.hashLock.Unlock()")
        defer entry.hashLock.Unlock()
+       defer cancel()
        if entry.hash == nil {
                if thc.frozen {
                        return nil, notComputedBeforeFrozen
illicitonion commented 1 year ago

We strongly assume that the dependency graph returned in cquery is acyclic, because Bazel is meant to guarantee that; a deadlock here would suggest a cycle in that graph.

We probably should make this code more resilient to cycles, but regardless if you can put together a small repro repo so we can investigate why Bazel is returning a cycle here (and whether it's a bug in Bazel), that'd be really handy!

com6056 commented 1 year ago

Yep! Looks like it is reproducible with stock usage of nogo in rules_go: https://github.com/com6056/td-cycle-repro

com6056 commented 1 year ago

Let me know if you are able to reproduce with https://github.com/com6056/td-cycle-repro or not or if I can provide any additional info to help out, thanks, really appreciate it! 🙌

illicitonion commented 1 year ago

I do reproduce - thanks so much for the reduced test case, I'll get a fix up soon!