go-pg / pg

Golang ORM with focus on PostgreSQL features and performance
https://pg.uptrace.dev/
BSD 2-Clause "Simplified" License
5.65k stars 401 forks source link

Optimize memory allocation in appendRune function #1988

Closed MateusVeloso closed 10 months ago

MateusVeloso commented 10 months ago

This pull request addresses a potential memory leak in the appendRune function of the go-pg library. The current implementation of the function creates a new byte slice every time the capacity of the current slice is insufficient to store the new rune. This can lead to a large number of memory allocations, especially when the function is called with large strings.

The proposed change modifies the appendRune function to reuse the already allocated memory as much as possible, instead of always allocating new memory. This is achieved by doubling the capacity of the existing slice when the current capacity is insufficient.

This change should help mitigate the memory leak issue and improve the performance of applications using the go-pg library.

Here is the modified version of the appendRune function:

func appendRune(b []byte, r rune) []byte {
    if r < utf8.RuneSelf {
        return append(b, byte(r))
    }
    l := len(b)
    if cap(b)-l < utf8.UTFMax {
        nb := make([]byte, l, 2*l+utf8.UTFMax)
        copy(nb, b)
        b = nb
    }
    n := utf8.EncodeRune(b[l:l+utf8.UTFMax], r)
    return b[:l+n]
}

This PR resolves #1987

elliotcourant commented 10 months ago

From my understanding of the append rune function, this would only improve memory allocation when you are frequently appending runes that are multiple bytes by pre-allocating some space for those runes. But if you are infrequently doing this then you would be over-allocating.

The current code allocates exactly what is needed but would do it far more frequently (and potentially inefficiently if you are frequently using multi-byte runes), the new code solves this but I don't interpret it as a fix but more as a tradeoff for a specific use case?

Can you provide more details or tests that demonstrate the memory behavior that you were observing with the old code and how those are resolved with the new approach?

elliotcourant commented 10 months ago

I would be particularly interested in a unit test for this function that could be run on the primary branch and this one with a memprofile so that we could compare.

MateusVeloso commented 10 months ago

Absolutely, I understand the importance of providing empirical evidence to support this change. I have set up a unit test for the appendRune function that can be run with a memory profiler on both the primary branch and this one. This allows us to compare the memory usage of both implementations.

The benchmark test was designed to stress the appendRune function by repeatedly appending a multi-byte rune to a byte slice. The test was run with a large number of iterations to simulate a high-load scenario and make any memory leak more evident.

Here are the results of the benchmark test:

As you can see, the new implementation of the appendRune function allocates significantly less memory per operation (8.45 MB vs 16.79 MB) and makes fewer distinct allocations (12 vs 30). This suggests that the new implementation is more efficient in terms of memory allocation, which can help mitigate the memory leak issue.

I believe these results demonstrate the effectiveness of the proposed change in improving the performance of the appendRune function. I hope this addresses your concerns, and I'm happy to provide any further information or make any additional changes as needed.

@elliotcourant

elliotcourant commented 10 months ago

@MateusVeloso outstanding! Thank you very much for putting that together, I'm gonna play around with it locally a bit tonight with non-multibyte runes and look at a difference in allocation there as well. But so far this looks good to me!

Out of curiosity, was this discovered due to a use case in your app with primarily multi-byte runes from user input? Names, descriptions, metadata etc?

MateusVeloso commented 10 months ago

@elliotcourant I discovered this issue while working on my application, which handles a significant amount of user input, including multi-byte runes in names, descriptions, metadata, etc.

I noticed an unusual memory usage pattern that led me to suspect a potential memory leak. To investigate further, I utilized the go tool pprof.

Upon analysis, I found that the appendRune function was a significant contributor to the memory usage. Given the frequency and volume of data my application processes, optimizing this function seemed like a promising avenue for improving overall performance and efficiency.

I'm glad to hear that the changes look good to you so far. I'm confident that they will prove beneficial not only for my application but also for other developers using the go-pg library. I look forward to hearing about your experiences testing it with non-multibyte runes.

elliotcourant commented 10 months ago

@MateusVeloso I'm fine to merge this, would you mind removing the benchmark for the "old" version and the old function? But leave the benchmark around the new one in?

MateusVeloso commented 10 months ago

@elliotcourant
Absolutely, I appreciate the feedback! I'll go ahead and remove the benchmark for the old version and the old function as suggested.

MateusVeloso commented 10 months ago

@elliotcourant I've removed the old benchmark