Closed Wondertan closed 10 months ago
Theissue is related to all sublibs we have in LL's org, but AFAIK we don't have an established process on how to create such root issues, so I decided to keep it in the core.
@liamsi, pls attach it to the proper Project. Don't know what to choose
As all this sounds good but it's mostly about optimizations, I've attached this to Testnet. We could even do some of this after mainnet though.
AFAIK we don't have an established process on how to create such root issues, so I decided to keep it in the core.
For similar cases, that span multiple repositories, I've used cards in the projects (e.g. see the first entry in https://github.com/orgs/lazyledger/projects/4 or in https://github.com/orgs/lazyledger/projects/6).
In some hot paths, like whole block processing flow, this surely causes additional pressure on both allocator and GC.
IMO, the first step should be to actually measure this via a profiler and benchmarks to see if that actually has any impact. Especially compared to overall systems latency. I'd expect the main bottleneck will be IO / network comms (it always is). And the gains through low-level optimizations like the mentioned ones are often rather minuscule in the grand scheme of things. We also need to measure network latencies via our ipld experiments (or via that testframework protocol labs developed). I think measuring latencies for retrieving block data as well as sampling da proofs has much higher priority (e.g. see: https://github.com/lazyledger/ipld-plugin-experiments/issues).
rsmt2d is a bit different with this regard, as we are currently refactoring it to streamline the APIs and make it production-ready. So if you have good ideas there, you can sync with @adlerjohn who has recently added benchmarks there.
For the new rsmt2d API we can add something like Close/Destruct/Free and etc semantics for a user to free(get back to the pool) resources once processing ended.
As all this sounds good but it's mostly about optimizations
Totally agree, but It would hurt to file an issue for the future.
IMO, the first step should be to actually measure this via a profiler and benchmarks to see if that actually has any impact.
But measuring differences would require us to implement that first, so it hard to understand the exact numbers of gains without doing actual work here.
I'd expect the main bottleneck will be IO / network comms (it always is)actually to implement that first
This is surely not about bottlenecks but better resource management within the application with predictable gains. Imagine how many times we allocate fixed-sized slices for samples every block when we can reuse them by allocating them once, hiding from GC in sync.Pool
after some processing, and then accessing again when a new block arrives. And there are other cases like this. Unfortunately, we cannot fully manage memory in Go ourselves(btw, I would love to see toggleable GC in Go). Still, we are smart enough to overcome some of the disadvantages of GC.
But measuring differences would require us to implement that first, so it hard to understand the exact numbers of gains without doing actual work here.
Yeah, you are right about measuring the difference of course. I'm just saying that we should first profile how much time and how many allocs happen because of GC (of slices) first.
It's certainly good that you opened an issue about it!
"Premature optimization is the root of all evil". Definitely good place to find some CPU cycles and memory, but I totally support @liamsi on this - we probably can do it even after mainnet.
Is this still relevant? I think @Wondertan went and did this in node anyways?
Moved over to node.
Grooming 12/07/2022:
This could be a part of a later epic describing the RAM optimization roadmap
Context
We allocate plain byte slices extensively throughout the project repos and after some work with them discard allocated slices causing GC to clean them up. In some hot paths, like whole block processing flow, this surely causes additional pressure on both allocator and GC. Fortunately, there is a relatively simple trick to avoid that and reduce stress in general for all data allocations by reusing fixed-sized allocated buffers through
sync.Pool
as basic primitive, though we can rely on already existing libs with simple APIs not to reinvent the wheel.Expected Gains
Options:
https://github.com/libp2p/go-buffer-pool
https://github.com/valyala/bytebufferpool
Places to be altered
pls add more