ethersphere / bee

Bee is a Swarm client implemented in Go. It’s the basic building block for the Swarm network: a private; decentralized; and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License
1.44k stars 338 forks source link

2.0.0 Pin Deadlock? #4624

Closed ldeffenb closed 3 months ago

ldeffenb commented 3 months ago

Context

bee 2.0.0 on mainnet - Actually running my forked branch at https://github.com/ldeffenb/bee/tree/2.0.0-prerelease-hacks

Summary

I'm working on re-pinning missing chunks from the swarm into my OSM dataset. I do parallel pins from a TypeScript script. After successfully pinning 3 references, my scripts are hanging with outstanding pin requests that are not being satisfied.

Expected behavior

Pin requests should either succeed or fail, they shouldn't just hang out.

Actual behavior

Several pin requests are hanging. Here's the pin-related sections of a /debug/pprof/goroutine, but your line numbers might not line up because I have some logging hack in my pin services (see github fork link above). I'll see if I can find the logs that go along with these pending pins as well.

3 @ 0x43eb0e 0x44ef85 0x101033e 0xa3d715 0x100eb11 0x1107a2e 0x11a3d85 0x83ca89 0x10626cc 0x1062565 0x114e945 0x1186e8a 0x83ca89 0x11d1d12 0x83ca89 0x11d1baf 0x83ca89 0x11d055f 0x83ca89 0x11cfdfb 0x83ca89 0x11b2cfa 0x83ca89 0x83f76e 0x83b654 0x471f81
#   0x101033d   github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x5dd                             github.com/ethersphere/bee/v2/pkg/storer/netstore.go:106
#   0xa3d714    github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34                                   github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#   0x100eb10   github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0                             github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#   0x1107a2d   github.com/ethersphere/bee/v2/pkg/traversal.(*service).Traverse+0x10d                               github.com/ethersphere/bee/v2/pkg/traversal/traversal.go:58
#   0x11a3d84   github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash+0x484                              github.com/ethersphere/bee/v2/pkg/api/pin.go:66
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x10626cb   github.com/ethersphere/bee/v2/pkg/jsonhttp.HandleMethods+0x12b                                  github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:25
#   0x1062564   github.com/ethersphere/bee/v2/pkg/jsonhttp.MethodHandler.ServeHTTP+0x44                             github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:17
#   0x114e944   github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4                                        github.com/gorilla/mux@v1.8.0/mux.go:210
#   0x1186e89   github.com/ethersphere/bee/v2/pkg/api.(*Service).corsHandler.func1+0x4c9                            github.com/ethersphere/bee/v2/pkg/api/api.go:646
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11d1d11   github.com/ethersphere/bee/v2/pkg/api.(*Service).pageviewMetricsHandler-fm.(*Service).pageviewMetricsHandler.func1+0x91     github.com/ethersphere/bee/v2/pkg/api/metrics.go:89
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11d1bae   github.com/ethersphere/bee/v2/pkg/api.(*Service).responseCodeMetricsHandler-fm.(*Service).responseCodeMetricsHandler.func1+0x6e github.com/ethersphere/bee/v2/pkg/api/metrics.go:97
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11d055e   github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1.CompressHandler.CompressHandlerLevel.func1+0x69e      github.com/gorilla/handlers@v1.4.2/compress.go:148
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11cfdfa   github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1+0xda                              github.com/ethersphere/bee/v2/pkg/api/router.go:93
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11b2cf9   github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.NewHTTPAccessLogHandler.func2.1+0x139                 github.com/ethersphere/bee/v2/pkg/log/httpaccess/http_access.go:41
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x83f76d    net/http.serverHandler.ServeHTTP+0x8d                                               net/http/server.go:2938
#   0x83b653    net/http.(*conn).serve+0x5f3    
3 @ 0x43eb0e 0x44ffb8 0x44ff8f 0x46de25 0x48fa48 0x11a3da5 0x83ca89 0x10626cc 0x1062565 0x114e945 0x1186e8a 0x83ca89 0x11d1d12 0x83ca89 0x11d1baf 0x83ca89 0x11d055f 0x83ca89 0x11cfdfb 0x83ca89 0x11b2cfa 0x83ca89 0x83f76e 0x83b654 0x471f81
#   0x46de24    sync.runtime_Semacquire+0x24                                                    runtime/sema.go:62
#   0x48fa47    sync.(*WaitGroup).Wait+0x47                                                 sync/waitgroup.go:116
#   0x11a3da4   github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash+0x4a4                              github.com/ethersphere/bee/v2/pkg/api/pin.go:113
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x10626cb   github.com/ethersphere/bee/v2/pkg/jsonhttp.HandleMethods+0x12b                                  github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:25
#   0x1062564   github.com/ethersphere/bee/v2/pkg/jsonhttp.MethodHandler.ServeHTTP+0x44                             github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:17
#   0x114e944   github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4                                        github.com/gorilla/mux@v1.8.0/mux.go:210
#   0x1186e89   github.com/ethersphere/bee/v2/pkg/api.(*Service).corsHandler.func1+0x4c9                            github.com/ethersphere/bee/v2/pkg/api/api.go:646
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11d1d11   github.com/ethersphere/bee/v2/pkg/api.(*Service).pageviewMetricsHandler-fm.(*Service).pageviewMetricsHandler.func1+0x91     github.com/ethersphere/bee/v2/pkg/api/metrics.go:89
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11d1bae   github.com/ethersphere/bee/v2/pkg/api.(*Service).responseCodeMetricsHandler-fm.(*Service).responseCodeMetricsHandler.func1+0x6e github.com/ethersphere/bee/v2/pkg/api/metrics.go:97
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11d055e   github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1.CompressHandler.CompressHandlerLevel.func1+0x69e      github.com/gorilla/handlers@v1.4.2/compress.go:148
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11cfdfa   github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1+0xda                              github.com/ethersphere/bee/v2/pkg/api/router.go:93
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x11b2cf9   github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.NewHTTPAccessLogHandler.func2.1+0x139                 github.com/ethersphere/bee/v2/pkg/log/httpaccess/http_access.go:41
#   0x83ca88    net/http.HandlerFunc.ServeHTTP+0x28                                             net/http/server.go:2136
#   0x83f76d    net/http.serverHandler.ServeHTTP+0x8d                                               net/http/server.go:2938
#   0x83b653    net/http.(*conn).serve+0x5f3    
2 @ 0x43eb0e 0x44ef85 0x101033e 0xa3d715 0x100eb11 0x11a5052 0x471f81
#   0x101033d   github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x5dd     github.com/ethersphere/bee/v2/pkg/storer/netstore.go:106
#   0xa3d714    github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34           github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#   0x100eb10   github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0     github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#   0x11a5051   github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash.func1.1+0x2d1  github.com/ethersphere/bee/v2/pkg/api/pin.go:94
2 @ 0x43eb0e 0x44ffb8 0x44ff8f 0x46df45 0xfe71d5 0xfe71d6 0xa3d715 0x1010ec2 0xa3d715 0x100eb11 0x101008c 0xa3d715 0x100eb11 0x11a5052 0x471f81
#   0x46df44    sync.runtime_SemacquireRWMutexR+0x24                            runtime/sema.go:82
#   0xfe71d4    sync.(*RWMutex).RLock+0x1b4                             sync/rwmutex.go:71
#   0xfe71d5    github.com/ethersphere/bee/v2/pkg/storer/internal/cache.(*Cache).Getter.func1+0x1b5 github.com/ethersphere/bee/v2/pkg/storer/internal/cache/cache.go:183
#   0xa3d714    github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34               github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#   0x1010ec1   github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1.(*DB).Lookup.func3+0xe1   github.com/ethersphere/bee/v2/pkg/storer/cachestore.go:76
#   0xa3d714    github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34               github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#   0x100eb10   github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0         github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#   0x101008b   github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x32b         github.com/ethersphere/bee/v2/pkg/storer/netstore.go:95
#   0xa3d714    github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34               github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#   0x100eb10   github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0         github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#   0x11a5051   github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash.func1.1+0x2d1      github.com/ethersphere/bee/v2/pkg/api/pin.go:94

Steps to reproduce

Attempt to pin references that are not necessarily local to the node. My script is rummaging the swarm to find chunks owned by the OSM batch and then attempts to pin them locally. Some of these references are multi-chunk BMTs with some intermediate chunks not present. I don't know if that is the case on these hung pin requests, but it is distinctly possible. The 3 pins that actually succeeded since upgraded to 2.0.0 were single chunk references, likely mantaray nodes.

Possible solution

Wish I had one, but I wanted to get this report started ASAP.

ldeffenb commented 3 months ago

The full goroutines page:

issue-4624-goroutines.txt

ldeffenb commented 3 months ago

And the full goroutines stackdump page:

issue-4624-full-goroutine-stackdump.txt

ldeffenb commented 3 months ago

Extract of my hacked-version verbosity 5 logs for pin and the 3 references that don't seem to have completed at the end.

issue-4624-pin.log

ldeffenb commented 3 months ago

Actually, it is quite possible this deadlock only exists in my fork. I just found this bad merge that duplicated the locking in the cache putter. Granted, the defers should unlock it, but I'm not sure you can lock either of these locks twice from a single goroutine.

https://github.com/ldeffenb/bee/blob/bbd504f82bc95a47f6c89018f93e1898aae22f94/pkg/storer/internal/cache/cache.go#L114

ldeffenb commented 3 months ago

Looks like this was my own stupid merge misteak (sic). Closing unless it happens again now that I've removed my duplicated locking block.