filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/
Other
2.85k stars 1.27k forks source link

[Deal Making Issue] GetTransferStatus hanging #6273

Closed whyrusleeping closed 3 years ago

whyrusleeping commented 3 years ago

Example stack trace:

goroutine 649 [select, 1394 minutes]:
github.com/filecoin-project/go-statemachine/fsm.(*stateGroup).SendSync(0xc00059bf10, 0x30c1480, 0xc00c0629c0, 0x2ad5400, 0xc00c062a50, 0x2970740, 0x434c100, 0x0, 0x0, 0x0, ...)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/fsm/fsm_group.go:77 +0x1eb
github.com/filecoin-project/go-statemachine/fsm.(*stateGroup).GetSync(0xc00059bf10, 0x30c1480, 0xc00c0629c0, 0x2ad5400, 0xc00c062a50, 0x308a5a0, 0xc005b54a80, 0xc00c062a50, 0xc00c062a50)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/fsm/fsm_group.go:55 +0x9d
github.com/filecoin-project/go-ds-versioning/pkg/fsm.(*migratedFsm).GetSync(0xc0030f63e0, 0x30c1480, 0xc00c0629c0, 0x2ad5400, 0xc00c062a50, 0x308a5a0, 0xc005b54a80, 0xc057741f40, 0x40)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-ds-versioning@v0.1.0/pkg/fsm/fsm.go:84 +0xbc
github.com/filecoin-project/go-data-transfer/channels.(*Channels).GetByID(0xc0030b6870, 0x30c1480, 0xc00c0629c0, 0xc0085d41b0, 0x26, 0xc0085d41e0, 0x26, 0x167f4983a2da1a85, 0x3, 0xffffffffffffffff, ...)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/channels/channels.go:209 +0x10e
github.com/filecoin-project/go-data-transfer/impl.(*manager).ChannelState(0xc0006ae750, 0x30c1480, 0xc00c0629c0, 0xc0085d41b0, 0x26, 0xc0085d41e0, 0x26, 0x167f4983a2da1a85, 0x26, 0xc0085d41e0, ...)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/impl/impl.go:403 +0x71
github.com/whyrusleeping/estuary/filclient.(*FilClient).TransferStatus(0xc00025e000, 0x30c1480, 0xc00c0629c0, 0xc0062470e8, 0x115d4060, 0xed83368ec, 0x431b3e0)
    /home/why/code/estuary/filclient/filclient.go:550 +0xa5
main.(*ContentManager).GetTransferStatus(0xc00051c600, 0x30c1480, 0xc00c0629c0, 0xc007eac840, 0xc0d10ed0b0, 0x22, 0x0, 0x0, 0x0)
    /home/why/code/estuary/replication.go:667 +0x82d
main.(*ContentManager).checkDeal(0xc00051c600, 0x30c1480, 0xc00f374390, 0xc007eac840, 0xc006247c00, 0x0, 0x0)
    /home/why/code/estuary/replication.go:610 +0x79f
main.(*ContentManager).ensureStorage(0xc00051c600, 0x30c1410, 0xc0c1e469f0, 0x42, 0x2df048d0, 0xed830a741, 0x431b3e0, 0x2df048d0, 0xed830a741, 0x431b3e0, ...)
    /home/why/code/estuary/replication.go:481 +0x5ab
main.(*ContentManager).ContentWatcher(0xc00051c600)
    /home/why/code/estuary/replication.go:91 +0x405
created by main.main.func1
    /home/why/code/estuary/main.go:382 +0xc8a

Full trace: http://ipfs.io/ipfs/QmVk3KAf4QCtRDrq5SRRQn5sJVZK5vqvzkY3muBUSiWVDP

dirkmc commented 3 years ago

It looks like the state machine is getting locked up on the client side.

In estuary is there another thread that might be blocking the state machine from progressing? For example if you subscribe to events, the event handler must return before the rest of the state machine can progress.

What version of go-data-transfer are you using?

dirkmc commented 3 years ago

It seems that there is a deadlock in the graphsync transport:

goroutine 36943332 [semacquire, 1394 minutes]:
sync.runtime_SemacquireMutex(0xc00051c4b4, 0x59ee00, 0x1)
    /usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00051c4b0)
    /usr/local/go/src/sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00051c4b0)
    /usr/local/go/src/sync/rwmutex.go:111 +0x90
github.com/filecoin-project/go-data-transfer/transport/graphsync.(*Transport).CleanupChannel(0xc00051c480, 0xc00bc57f20, 0x26, 0xc00bc57f80, 0x26, 0x167f4983a2da1a85)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/transport/graphsync/graphsync.go:309 +0x3d
github.com/filecoin-project/go-data-transfer/impl.(*channelEnvironment).CleanupChannel(0xc000010678, 0xc00bc57f20, 0x26, 0xc00bc57f80, 0x26, 0x167f4983a2da1a85)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/impl/environment.go:26 +0x55
github.com/filecoin-project/go-data-transfer/channels.cleanupConnection(0x7f1ea01c9e90, 0xc0d210e120, 0x30c1678, 0xc000010678, 0xc00bc57ec0, 0x26, 0x167f4983a2da1a85, 0xc00bc57f20, 0x26, 0xc00bc57f80, ...)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-data-transfer@v1.4.3/channels/channels_fsm.go:213 +0x132
reflect.Value.call(0x294d060, 0x2ed4ca8, 0x13, 0x2c3d2cc, 0x4, 0xc00a729728, 0x3, 0x3, 0x53a1df, 0x2ad54e0, ...)
    /usr/local/go/src/reflect/value.go:476 +0x8e7
reflect.Value.Call(0x294d060, 0x2ed4ca8, 0x13, 0xc0048c7728, 0x3, 0x3, 0x0, 0xe52319c0f89aedc, 0x307c7b80ccd26a8)
    /usr/local/go/src/reflect/value.go:337 +0xb9
github.com/filecoin-project/go-statemachine/fsm.fsmHandler.handler.func1(0xc0d293b1d0, 0x2, 0x2, 0xc0d293b1d0, 0x6bd2cadc80c04709, 0xc00d0dc5bd)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/fsm/fsm.go:170 +0x437
reflect.Value.call(0xc003424d20, 0xc0d293b140, 0x13, 0x2c3d2cc, 0x4, 0xc0048c7f70, 0x2, 0x2, 0x53a1df, 0x2a13b20, ...)
    /usr/local/go/src/reflect/value.go:476 +0x8e7
reflect.Value.Call(0xc003424d20, 0xc0d293b140, 0x13, 0xc006a15f70, 0x2, 0x2, 0x5a3bf2baf0a0, 0x431b3e0, 0xc000186580)
    /usr/local/go/src/reflect/value.go:337 +0xb9
github.com/filecoin-project/go-statemachine.(*StateMachine).run.func3(0xc004093370, 0x30c1410, 0xc000120008, 0xc004093420, 0xc004093380, 0xc0023749c0)
    /home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:102 +0x2c9
created by github.com/filecoin-project/go-statemachine.(*StateMachine).run
    /home/why/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:100 +0x372

According to this log the version of go-data-transfer being run is 1.4.3.

I believe the deadlock was fixed by https://github.com/filecoin-project/go-data-transfer/pull/199 on this line: https://github.com/filecoin-project/go-data-transfer/pull/199/files#diff-ef4bcba1f2d11506278385f0dd22312705002145b9c930293ede611798ad4ed9L378

So running v1.5.0 (or master) should fix the issue.