Todos are self-reminders. For now, I have just elaborately explained the issue and the high-level solution. More implementation details are yet to come.
While it was likely implied, just to be clear, unless @jclyons52 wants to work on this we should probably put off doing this until after MVP. That being said, it does seem like there could be some significant benefits to be had.
In the worst case, with the max 128x128 block, that is 16384 goroutines. This causes the race detector to stop working in tests, as it has a limit of 8k routines.
This is true! It is also true that when running the test in celestiaorg/celestia-core#424 with a limit of 512 goroutines, we still cause the race detector to complain. That's what led me to believe that calling dag.Get spins up quite a few goroutines. It could also be that the semaphore code isn't working the way I think it is, and some goroutines are kept alive.
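For context, this is roughly the semaphore pattern being discussed, as a minimal sketch only: the `fetcher` interface, `getShares`, and the `limit` parameter are hypothetical stand-ins for the real code around `dag.Get`, not the implementation in #424.

```go
package retrieval

import (
	"context"
	"sync"
)

// fetcher stands in for whatever backs dag.Get in the real code.
type fetcher interface {
	Get(ctx context.Context, hash []byte) ([]byte, error)
}

// getShares fetches one blob per hash while never running more than limit
// concurrent Get calls. Even with this cap, each Get may spin up more
// goroutines inside the IPLD/Bitswap machinery, which would explain the
// race detector still complaining.
func getShares(ctx context.Context, dag fetcher, hashes [][]byte, limit int) [][]byte {
	sem := make(chan struct{}, limit) // counting semaphore
	out := make([][]byte, len(hashes))

	var wg sync.WaitGroup
	for i, h := range hashes {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before spawning
		go func(i int, h []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			data, err := dag.Get(ctx, h)
			if err != nil {
				return // error handling elided in this sketch
			}
			out[i] = data
		}(i, h)
	}
	wg.Wait()
	return out
}
```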
Having redundant walks sounds bad, and we should avoid those where possible, but are idle goroutines that bad? The standard library opens a routine per HTTP request, so us doing something similar doesn't strike me as terrible. It would be nice if we had a more efficient system, but I'm not sure it's worth the complexity and effort, especially considering our near-future plans, where very few nodes will be downloading the entire block using IPFS.
Instead of spawning a goroutine per share, we should spawn a routine per tree we need to walk through, specifically one per DAHeader root. This way, we don't have any competing routines and only one routine initiates all the roundtrips.
I might be misunderstanding this. So, we would still retrieve each share concurrently, but leave the concurrency to IPFS? Is competing == blocking?
Having redundant walks sounds bad, and we should avoid those where possible, but are idle goroutines that bad?
Just to note that I am not rushing to implement this issue.
Competing, in this case, is useless and we need to avoid it.
Yeah, I guess this makes sense here. I think we previously just accepted that there would be a lot of overlap for the inner nodes per "leaf-request", and we can certainly make better use of the tree structure. Looking forward to reading the filled-out Implementation Details.
but leave the concurrency to IPFS
Let's increase the correctness of our statements by naming the specific IPFS subsystem instead of treating IPFS as a whole as a black box. In this case, we mean Bitswap, and even more precisely, IPLDv0 over Bitswap.
So, we would still retrieve each share concurrently, but leave the concurrency to IPFS?
Yep.
Is competing == blocking?
I would say unblocking instead; specifically, routines compete to unblock themselves. The code analogy here would be a native Go chan from which many routines wait/compete to read a result. The one that gets a chance to read the result unpacks it and sends it to the request chan, while the others do nothing. Afterwards, the waiting starts again until a new result value appears.
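As a toy illustration of that analogy (not the real retrieval code), the hypothetical `results`/`requests` channels below show several routines blocked on the same chan, with only one of them doing useful work per value:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	results := make(chan string)  // round-trip results arrive here
	requests := make(chan string) // follow-up round trips are initiated here

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All four routines wait here; only one gets each value.
			res, ok := <-results
			if !ok {
				return
			}
			// The "winner" unpacks the result and issues the next request;
			// the others did nothing but occupy a blocked goroutine.
			requests <- "next round trip for " + res
		}()
	}

	go func() {
		results <- "root hash"
		close(results)
	}()

	fmt.Println(<-requests)
	wg.Wait()
}
```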
@Wondertan can we close or move this issue?
@evan-forbes, can you move it?
Or let me do it
@vgonkivs just tagging you here for reference:
When you join, we think it would be cool for you to work on this issue, as you already have experience with the ipld pkg.
Summary
The current RetrieveBlockData implementation spawns a goroutine for each share in the ExtendedDataSquare. In the worst case, with the max 128x128 block, that is 16384 goroutines. This causes the race detector to stop working in tests, as it has a limit of 8k routines. Furthermore, this is a lot of routines for a single operation, and in fact most of the time they are idle.
Why the routines are idle: shares in a block are addressed and committed to with multiple Merkle Trees, so when we request a whole block, we need to walk through all the trees. The walking step is: request a blob of data by its hash (1), unpack the blob (2), check whether we unpacked more hashes or a share (3), and proceed with the unpacked hashes onto the next step recursively (4). Those steps are executed until no more hashes are left and we have all the shares. From this, we can see that every walking step is a full roundtrip: network request, unpack, request again, and so on. Now, imagine those 16384 goroutines walking down the same trees. In practice, those routines mostly wait for every single roundtrip to finish and then compete to initiate the next roundtrip. Competing, in this case, is useless and we need to avoid it.
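To make the walking step concrete, here is a rough sketch in Go; the `node`, `getter`, and `walk` names are hypothetical and only mirror the numbered steps above, not the actual ipld pkg code.

```go
package retrieval

import "context"

type node struct {
	share       []byte   // set when this is a leaf
	childHashes [][]byte // set when this is an inner node
}

// getter stands in for the dag.Get round trip: one network request per hash.
type getter interface {
	Get(ctx context.Context, hash []byte) (node, error)
}

// walk requests the blob behind hash (1), unpacks it (2), checks whether it
// holds a share or more hashes (3), and recurses into the unpacked hashes (4).
// Every level of recursion is a full roundtrip, which is why a routine spends
// most of its life waiting.
func walk(ctx context.Context, dag getter, hash []byte, shares chan<- []byte) error {
	nd, err := dag.Get(ctx, hash) // (1) request the blob by its hash
	if err != nil {
		return err
	}
	if nd.share != nil { // (2)+(3) unpacked a share: we are at a leaf
		shares <- nd.share
		return nil
	}
	for _, child := range nd.childHashes { // (4) recurse into the unpacked hashes
		if err := walk(ctx, dag, child, shares); err != nil {
			return err
		}
	}
	return nil
}
```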
Instead of spawning a goroutine per share, we should spawn a routine per tree we need to walk through, specifically one per DAHeader root. This way, we don't have any competing routines and only one routine initiates all the roundtrips. In numbers, for the largest block that would be 128 routines, one per Row DAHeader root, and 256 routines if we want to fetch and store all the inner nodes of the trees. NOTE: All the shares are addressed and committed to twice, with both Row and Column Merkle Trees, thus it is not required to traverse both trees to fetch the shares; but again, if we need to store all the inner nodes of both trees (cc @adlerjohn), then both trees should be traversed.
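A minimal sketch of the proposed layout, reusing the hypothetical `walk`/`getter` from the previous sketch and `errgroup` for the fan-out; the function name, channel size, and error handling are illustrative, not the final design.

```go
package retrieval

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// retrieveShares spawns one routine per DAHeader row root instead of one per
// share, so no two routines ever walk (or compete for) the same subtree.
func retrieveShares(ctx context.Context, dag getter, rowRoots [][]byte) ([][]byte, error) {
	shares := make(chan []byte, 16384) // enough for the largest 128x128 square

	g, ctx := errgroup.WithContext(ctx)
	for _, root := range rowRoots { // at most 128 routines for the largest block
		root := root
		g.Go(func() error {
			return walk(ctx, dag, root, shares)
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	close(shares)

	var out [][]byte
	for s := range shares {
		out = append(out, s)
	}
	return out, nil
}
```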
Implementation Details
// TODO
Action Items
// TODO
References
As already mentioned, the original spark for this issue was the race detector complaining about exceeding the limit of 8k routines. The solution for that was to detect whether the race detector is running and simply skip the test :grimacing:. Later, after some discussions, https://github.com/celestiaorg/lazyledger-core/issues/357 was created, and the solution for it is being implemented in https://github.com/celestiaorg/lazyledger-core/pull/424.
This also supersedes https://github.com/celestiaorg/lazyledger-core/issues/278.
Appendix
There is a plan to remove the DAHeader and use only one root hash to address and commit to all the shares. In such a case, we would not need to spawn any routines at all: the routine calling RetrieveBlockData needs to block anyway, and it can then take care of initiating all the roundtrips.