The full goroutines page:
And the full goroutines stackdump page:
An extract of the verbosity-5 logs from my hacked version, covering pin and the 3 references that don't appear to have completed at the end:
Actually, it is quite possible this deadlock only exists in my fork. I just found this bad merge that duplicated the locking in the cache putter. Granted, the defers should unlock it, but I'm not sure you can lock either of these locks twice from a single goroutine.
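For what it's worth, here is a minimal sketch of why that duplicated block would hang, assuming the bad merge simply re-acquires the same lock (this is not the actual cache putter code). Go's sync.Mutex and sync.RWMutex are not reentrant, so the second Lock from the same goroutine blocks forever, and the deferred Unlocks never get a chance to run:

```go
// Minimal illustration (not bee code): a duplicated Lock before the deferred
// Unlocks can run deadlocks the goroutine, because Go mutexes are not reentrant.
package main

import (
	"fmt"
	"sync"
)

func putWithDuplicatedLock(mu *sync.Mutex) {
	mu.Lock()
	defer mu.Unlock() // never runs: the goroutine blocks on the second Lock below

	// A bad merge that re-introduces the same locking block looks like this:
	mu.Lock() // blocks forever: sync.Mutex cannot be re-acquired by its holder
	defer mu.Unlock()

	fmt.Println("never reached")
}

func main() {
	var mu sync.Mutex
	putWithDuplicatedLock(&mu) // runtime reports: all goroutines are asleep - deadlock!
}
```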
Looks like this was my own stupid merge mistake. Closing unless it happens again now that I've removed the duplicated locking block.
Context
bee 2.0.0 on mainnet - Actually running my forked branch at https://github.com/ldeffenb/bee/tree/2.0.0-prerelease-hacks
Summary
I'm working on re-pinning missing chunks from the swarm into my OSM dataset. I issue parallel pins from a TypeScript script. After successfully pinning 3 references, the script hangs with outstanding pin requests that are never satisfied.
Expected behavior
Pin requests should either succeed or fail; they shouldn't just hang indefinitely.
Actual behavior
Several pin requests are hanging. Here are the pin-related sections of a /debug/pprof/goroutine dump, but your line numbers might not line up with mine because I have some logging hacks in my pin services (see the GitHub fork link above). I'll see if I can find the logs that go along with these pending pins as well.
Steps to reproduce
Attempt to pin references that are not necessarily local to the node. My script rummages through the swarm to find chunks owned by the OSM batch and then attempts to pin them locally. Some of these references are multi-chunk BMTs with some intermediate chunks not present. I don't know if that is the case for these hung pin requests, but it is distinctly possible. The 3 pins that actually succeeded since upgrading to 2.0.0 were single-chunk references, likely mantaray nodes. A sketch of the request pattern follows.
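My actual script is TypeScript, so this is just a minimal Go sketch of the parallel pin pattern it uses. It assumes the node's pin endpoint (POST /pins/{reference}) on the default local API port 1633; the references are placeholders:

```go
// Sketch only: fire several pin requests in parallel against a local bee node
// and report which ones return and which time out.
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	refs := []string{
		// hypothetical references discovered while rummaging the swarm
		"aaaa...",
		"bbbb...",
	}
	client := &http.Client{Timeout: 10 * time.Minute}

	var wg sync.WaitGroup
	for _, ref := range refs {
		wg.Add(1)
		go func(ref string) {
			defer wg.Done()
			// POST /pins/{reference} asks the node to fetch and pin the reference locally
			req, err := http.NewRequest(http.MethodPost, "http://localhost:1633/pins/"+ref, nil)
			if err != nil {
				fmt.Println(ref, "request error:", err)
				return
			}
			resp, err := client.Do(req)
			if err != nil {
				fmt.Println(ref, "pin failed:", err) // includes the client-side timeout
				return
			}
			resp.Body.Close()
			fmt.Println(ref, "->", resp.Status)
		}(ref)
	}
	wg.Wait()
}
```

In the failing case, the hung requests never return at all; only the client-side timeout (if any) ends them.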
Possible solution
Wish I had one, but I wanted to get this report started ASAP.