Closed: jwinkler2083233 closed this issue 2 years ago
Interesting. Does it make any difference if you call reset()
before returning it to the pool in EncodeToBytes?
In memory profiles, the 'cumulative' and 'flat' measurements are not related to GC. Rather, what they mean is:

- 'flat' is the memory allocated directly within the function itself.
- 'cum' (cumulative) is the memory allocated by the function plus everything allocated by the functions it calls.
The same is also true for CPU profiles, but instead of memory allocated, it is 'time spent'.
Since allocation happens directly in 'list' and 'toBytes', and they do not call other functions that allocate, the flat and cumulative measurements are the same. But this does not mean that memory allocated by these functions can't be released by GC. In fact, the reporting of held memory through memory profiles is not super precise because it only reports where allocation happened and not how much of the memory is still held.
I think that, if your load test is about RLP encoding, then it's pretty certain most allocation will happen in package rlp.
You can identify whether there is a memory leak by observing the memory held by your process. If the memory is steadily increasing while your load test runs, there is a leak. If possible, please provide a link to your test code, so that we are able to reproduce your issue.
btw, I think adding a sync.Pool for listhead will not help because list headers are not allocated on their own. The slice of list headers in rlp.encbuf contains them as values and not pointers, so the allocation you are seeing in the profile is allocation of the slice's backing array when it is extended by append.
If you allocate the listhead value in a sync.Pool and then pass the pooled value to append, you are moving the allocation to another function (i.e. 'flat' and 'cumulative' will not match anymore), but it ultimately just allocates more.
One thing that should be noted is that rlp.encbuf does not have any limits on the size of objects that will be returned to the pool. What I mean is: when you encode a large object where the output contains many RLP lists, the encbuf list header slice will be very long at the end of the encoding process. encbuf will be returned to the pool with the full slice length. When subsequent encoding operations reuse the buffer, they truncate the len of the list header slice to zero, but leave the cap as-is, so the backing array will never shrink again.
I decided on this behavior because it is not trivial to determine when and by how much these internal buffers should shrink. For go-ethereum's RLP encoding needs, this has never been a problem because our encoding operations only create at most single-digit megabytes of output. Also, GC will occasionally empty the pool and thus remove the large buffers. If your use of package rlp involves heavy concurrent encoding with very large outputs, then I would not be surprised to see high memory use. It still doesn't mean there is a leak, but if you are constantly encoding, GC will not empty the pool and those large buffers can stick around for a while.
Seems to be a non-issue. Please reopen if it's indeed an issue here somewhere.
Thank you for the comments. I'll have to create a minimum example to prove my point.
Yeah, closing the ticket was a bit premature. I'm happy to check out your example!
This is just a draft, but it's closer to how I'd do this in native code (C++). I noticed a recent change that added an encbuffer.go file and an 'internal' directory. I haven't integrated those changes yet:
My tests are showing a 1.5% speed improvement, along with a drop in heap usage. Your mileage may vary.
I'm still wondering how the 'append' method would increase the backing store by so much over time, for the original version. When I modified it to store pointers instead, it dropped to zero.
Let me know if you like it, and I'll put together a formal PR. thx
Hmm, so now you are talking about optimizations though.
This issue was originally about a 'leak'. Can you provide a code example (it can be your entire application) that exhibits a memory leak related to package rlp? And by leak, I mean unbounded heap growth.
I'm still wondering how the 'append' method would increase the backing store by so much over time, for the original version. When I modified it to store pointers instead, it dropped to zero.
I would like to know how you arrive at this conclusion ("dropped to zero"). append will always allocate if the existing slice capacity is not sufficient to hold another item. It doesn't matter whether the slice holds pointers or not. Specifically, append in Go grows slices by (roughly) doubling their capacity, i.e. append(slice, x) is equivalent to:
func append(slice []X, x X) []X {
    if len(slice) == cap(slice) {
        // grow: double the capacity (use 1 when the slice is empty,
        // since doubling a zero capacity would stay zero)
        newcap := cap(slice) * 2
        if newcap == 0 {
            newcap = 1
        }
        newslice := make([]X, newcap)
        copy(newslice, slice)
        slice = newslice[:len(slice)]
    }
    end := len(slice)
    slice = slice[:end+1]
    slice[end] = x
    return slice
}
Please provide additional data about your use of package rlp. For example, you could sample and collect the list header slice len and cap at the end of encoding and then provide the data here.
Exactly. I'm going by what the profiler is reporting. When I changed 'list()' to use pointers, the apparent memory leak went away, according to pprof. By using pointers in the 'lheads' slice, I was able to store/reuse pointers to 'listhead' objects that were retrieved from sync.Pool. What's odd is that the pointers should have taken up the same amount of space as the list head object (8 bytes for a 64-bit pointer), and the pointers were allocated via append(), but the memory they used didn't even register with pprof.
It's possible that the garbage collector has something to do with this. Maybe it's tuned to skip over the 'listhead'-style allocations because they're values.
The impetus for this is that our servers are overloaded with billions of small allocations, and then the garbage collection eventually takes up > 30% of CPU.
Here's the latest before and after comparison.
Before:
(pprof) Showing nodes accounting for 96635006, 82.81% of 116700167 total
Dropped 989 nodes (cum <= 583500)
Showing top 100 nodes out of 226
flat flat% sum% cum cum%
  28836259 24.71% 24.71%  28836259 24.71%  github.com/ethereum/go-ethereum/rlp.(*encbuf).list (inline)
   4239344  3.63% 63.58%   4239344  3.63%  github.com/ethereum/go-ethereum/rlp.(*encbuf).toBytes
After:
(pprof) Showing nodes accounting for 1713208044, 82.77% of 2069818576 total
Dropped 1757 nodes (cum <= 10349092)
Showing top 100 nodes out of 183
flat flat% sum% cum cum%
128612311 6.21% 58.03% 132281690 6.39% github.com/onflow/flow-go/model/flow/rlp.EncodeToBytes
As you can see, the percentage dropped from 28.3% (24.7% + 3.6%) to 6.4%.
What's odd is that the pointers should have taken up the same amount of space as the list head object (8 bytes for a 64-bit pointer),
A listhead is a struct with two ints, so 16 bytes. The pointer is 8 bytes. So a pointer is only half the size. And if you pool the listheads, avoiding those allocs, then I'm not surprised you see a drop in allocs. Am I missing something?
I still don't understand this issue.
As mentioned earlier, I think our existing pooling strategy should cover this: when encoding, an encBuffer instance is taken from the pool, then used, growing encBuffer.lheads. After encoding, the buffer instance is returned back into the pool. When encoding is heavily used, all pooled encBuffer instances will have reasonably-sized lheads and no allocation should happen at all.
My own profiling of geth itself confirmed this in the past: we don't usually allocate much in encBuffer.list. However, in the profile shown here, there is significant allocation in this method.
@jwinkler2083233 What I would really like to see is a report of encBuffer sizes before and after encoding.
To get this, you could apply this change for example:
https://github.com/fjl/go-ethereum/commit/a51db444cf38d1347c448956579f81dd0db17b84
And then observe the metrics somehow.
One more thing:
On the go-ethereum master branch, the recent changes in package rlp permit doing manual encoder buffer management, like this:
var (
    buf      = rlp.NewEncoderBuffer(nil)
    encbytes []byte
)
for {
    buf.Reset(nil)
    err := rlp.Encode(buf, value)
    if err != nil {
        // handle it
    }
    encbytes = buf.AppendToBytes(encbytes[:0])
    // here, encbytes contains the RLP encoding of value
}
Using it like this will for sure allocate less.
System information
OS & Version: Linux
Commit hash: latest
Expected behaviour
I would expect RLP allocations to be garbage collected after a few hours of heavy use.
Actual behaviour
'pprof' is reporting that the 'encbuf.list()' function is allocating memory, and that memory is not getting garbage collected.
I verified that the allocation done by the only 'append' call in the 'encbuf.list()' method in 'rlp/encode.go' could be eliminated via the use of an additional 'sync.Pool' for 'listhead' objects. That worked great, I can submit the change if desired.
Steps to reproduce the behaviour
I enabled memory profiling in my application. My application makes heavy use of the public 'EncodeToBytes()' method. I ran a three-hour load test to see what the GC did, and then ran a memory profile.
Here is the relevant data from pprof:
The 'toBytes' function doesn't appear to be GC'd either. I don't have a workaround for that one. The reason I don't think the GC accessed these objects is because the 'cumulative' value from the 'pprof' output above matches the 'flat' output. That is not the case for other, smaller, allocations in my test. The GC worked normally for other objects. I've run this test several times as well as multiple variations, and I'm seeing similar results, where the 'cumulative' value matches the 'flat' column.
My theory is that the 'encbuf.reset()' method is somehow creating a shortcut and the sync.Pool loses track of the allocations. The other alternative I can think of is that there is a bug in pprof, but I doubt that.