Closed com6056 closed 1 year ago
In our case, I think I mostly see this happening with @io_bazel_rules_go//:go_context_data
, but need to validate.
Looks like it might be related to nogo
usage? I added some messy logging to see what was holding onto the lock and got this:
{@io_bazel_rules_go//:go_context_data {0a4d9d48f377253da730a871cf79a6613f4c9221c23ad1ec757a57e84ac16090}} still has the lock
{//:foo_nogo {0a4d9d48f377253da730a871cf79a6613f4c9221c23ad1ec757a57e84ac16090}} still has the lock
{@io_bazel_rules_nogo//:nogo {0a4d9d48f377253da730a871cf79a6613f4c9221c23ad1ec757a57e84ac16090}} still has the lock
{//tools/bazel/analyzers/unusedresult {ef509650ff1c02d363060ca6742b0e13ff929a68634d1c51973a4d2bca6cb636}} still has the lock
{@org_golang_x_tools//go/analysis {ef509650ff1c02d363060ca6742b0e13ff929a68634d1c51973a4d2bca6cb636}} still has the lock
{//:mux_nogo_actual {ef509650ff1c02d363060ca6742b0e13ff929a68634d1c51973a4d2bca6cb636}} still has the lock
diff of the logging changes as well in case I didn't do it right 😅
diff --git a/pkg/hash_cache.go b/pkg/hash_cache.go
index ec7b3ca..acd85ef 100644
--- a/pkg/hash_cache.go
+++ b/pkg/hash_cache.go
@@ -2,6 +2,7 @@ package pkg
import (
"bytes"
+ "context"
"crypto/sha256"
"encoding/binary"
"errors"
@@ -12,6 +13,7 @@ import (
"sort"
"strings"
"sync"
+ "time"
ss "github.com/bazel-contrib/target-determinator/common/sorted_set"
"github.com/bazel-contrib/target-determinator/third_party/protobuf/bazel/analysis"
@@ -99,8 +101,23 @@ func (thc *TargetHashCache) Hash(labelAndConfiguration LabelAndConfiguration) ([
entry = newEntry
}
thc.cacheLock.Unlock()
+ fmt.Println(labelAndConfiguration, "entry.hashLock.Lock()")
entry.hashLock.Lock()
+ ctx, cancel := context.WithCancel(context.Background())
+ go func(context.Context) {
+ for {
+ select {
+ case <-ctx.Done():
+ return
+ default:
+ fmt.Println(labelAndConfiguration, "still has the lock")
+ time.Sleep(3 * time.Second)
+ }
+ }
+ }(ctx)
+ fmt.Println(labelAndConfiguration, "entry.hashLock.Unlock()")
defer entry.hashLock.Unlock()
+ defer cancel()
if entry.hash == nil {
if thc.frozen {
return nil, notComputedBeforeFrozen
We strongly assume that the dependency graph returned in cquery is acyclic, because Bazel is meant to guarantee that; a deadlock here would suggest a cycle in that graph.
We probably should make this code more resilient to cycles, but regardless if you can put together a small repro repo so we can investigate why Bazel is returning a cycle here (and whether it's a bug in Bazel), that'd be really handy!
Yep! Looks like it is reproducible with stock usage of nogo
in rules_go
: https://github.com/com6056/td-cycle-repro
Let me know if you are able to reproduce with https://github.com/com6056/td-cycle-repro or not or if I can provide any additional info to help out, thanks, really appreciate it! 🙌
I do reproduce - thanks so much for the reduced test case, I'll get a fix up soon!
Haven't narrowed it down exactly yet but seems there is a deadlock somewhere in the following flow:
PrefillCache
callsqueryInfo.TargetHashCache.Hash
which holds ontoentry.hashLock.Lock()
while it callshashTarget
.hashTarget
, in the case ofbuild.Target_RULE
, callshashRule
, which eventually callsthc.Hash
again which blocks onentry.hashLock.Lock()
indefinitely.Trying to narrow down exactly what is going on but figured I'd surface this in case someone else can figure it out faster 😛