Closed whyrusleeping closed 3 years ago
It looks like the state machine is getting locked up on the client side.
In estuary is there another thread that might be blocking the state machine from progressing? For example if you subscribe to events, the event handler must return before the rest of the state machine can progress.
What version of go-data-transfer are you using?
It seems that there is a deadlock in the graphsync transport:
goroutine 36943332 [semacquire, 1394 minutes]:
sync.runtime_SemacquireMutex(0xc00051c4b4, 0x59ee00, 0x1)
/usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00051c4b0)
/usr/local/go/src/sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00051c4b0)
/usr/local/go/src/sync/rwmutex.go:111 +0x90
github.com/filecoin-project/go-data-transfer/transport/graphsync.(*Transport).CleanupChannel(0xc00051c480, 0xc00bc57f20, 0x26, 0xc00bc57f80, 0x26, 0x167f4983a2da1a85)
/home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/transport/graphsync/graphsync.go:309 +0x3d
github.com/filecoin-project/go-data-transfer/impl.(*channelEnvironment).CleanupChannel(0xc000010678, 0xc00bc57f20, 0x26, 0xc00bc57f80, 0x26, 0x167f4983a2da1a85)
/home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/impl/environment.go:26 +0x55
github.com/filecoin-project/go-data-transfer/channels.cleanupConnection(0x7f1ea01c9e90, 0xc0d210e120, 0x30c1678, 0xc000010678, 0xc00bc57ec0, 0x26, 0x167f4983a2da1a85, 0xc00bc57f20, 0x26, 0xc00bc57f80, ...)
/home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/channels/channels_fsm.go:213 +0x132
reflect.Value.call(0x294d060, 0x2ed4ca8, 0x13, 0x2c3d2cc, 0x4, 0xc00a729728, 0x3, 0x3, 0x53a1df, 0x2ad54e0, ...)
/usr/local/go/src/reflect/value.go:476 +0x8e7
reflect.Value.Call(0x294d060, 0x2ed4ca8, 0x13, 0xc0048c7728, 0x3, 0x3, 0x0, 0xe52319c0f89aedc, 0x307c7b80ccd26a8)
/usr/local/go/src/reflect/value.go:337 +0xb9
github.com/filecoin-project/go-statemachine/fsm.fsmHandler.handler.func1(0xc0d293b1d0, 0x2, 0x2, 0xc0d293b1d0, 0x6bd2cadc80c04709, 0xc00d0dc5bd)
/home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/fsm/fsm.go:170 +0x437
reflect.Value.call(0xc003424d20, 0xc0d293b140, 0x13, 0x2c3d2cc, 0x4, 0xc0048c7f70, 0x2, 0x2, 0x53a1df, 0x2a13b20, ...)
/usr/local/go/src/reflect/value.go:476 +0x8e7
reflect.Value.Call(0xc003424d20, 0xc0d293b140, 0x13, 0xc006a15f70, 0x2, 0x2, 0x5a3bf2baf0a0, 0x431b3e0, 0xc000186580)
/usr/local/go/src/reflect/value.go:337 +0xb9
github.com/filecoin-project/go-statemachine.(*StateMachine).run.func3(0xc004093370, 0x30c1410, 0xc000120008, 0xc004093420, 0xc004093380, 0xc0023749c0)
/home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:102 +0x2c9
created by github.com/filecoin-project/go-statemachine.(*StateMachine).run
/home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:100 +0x372
According to this log the version of go-data-transfer being run is 1.4.3.
I believe the deadlock was fixed by https://github.com/filecoin-project/go-data-transfer/pull/199 on this line: https://github.com/filecoin-project/go-data-transfer/pull/199/files#diff-ef4bcba1f2d11506278385f0dd22312705002145b9c930293ede611798ad4ed9L378
So running v1.5.0 (or master) should fix the issue.
Example stack trace:
Full trace: http://ipfs.io/ipfs/QmVk3KAf4QCtRDrq5SRRQn5sJVZK5vqvzkY3muBUSiWVDP