djimenez / iconv-go

iconv support for Go
BSD 2-Clause "Simplified" License
418 stars 66 forks source link

Memory Leak on NewReader method #33

Open ghost opened 7 years ago

ghost commented 7 years ago

I am making a bunch of web requests and am converting the bodies to utf-8 (the input encoding will vary for each page). There seems to be a memory leak and the following code will get to 1GB of memory after about 4-5 minutes. Using the ConvertString method, however, results in no such leak. My best guess is that the converter is not being closed when using the NewReader method? Thanks for this package, btw!

package main

import (
    "bytes"
    "context"
    "io/ioutil"
    "log"
    "net/http"
    "sync"
    "time"
    iconv "github.com/djimenez/iconv-go"
)

var h = `<html><head></head><body></body></html>`

func job() {
    time.Sleep(1 * time.Second) // simulate web request
    bdy := ioutil.NopCloser(bytes.NewReader([]byte(h)))
    defer bdy.Close()

    resp := &http.Response{
        Body: bdy,
    }
    defer resp.Body.Close()

    _, err := iconv.NewReader(resp.Body, "windows-1252", "utf-8") // memory leak
    //_, err := iconv.ConvertString(h, "windows-1252", "utf-8") // no memory leak
    if err != nil {
        log.Fatalln(err)
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
    defer cancel()

    n := 1000
    stop := make(chan bool)

    wg := sync.WaitGroup{}
    for worker := 0; worker < n; worker++ {
        wg.Add(1)
        go func(stop chan bool) {
            defer wg.Done()
            for {
                select {
                case <-stop:
                    return
                default:
                    job()
                }
            }
        }(stop)
    }

    <-ctx.Done()

    for i := 0; i < n; i++ {
        stop <- true
    }
    wg.Wait()

}
djimenez commented 7 years ago

I think I had expectations when I wrote the code that the Converter's destroy method was automatically called before it was garbaged collected. Looking at some old golang groups posts, there was apparently some wrong documentation in cgo examples. I could correct this to use a finalizer, but it'd probably be better to just add the Close method to our reader and close it there.

ghost commented 7 years ago

I'm not too familiar with cgo so I'm afraid I'm of no use on that end. I did a quick test that seems to work:

package main

import (
    "bytes"
    "context"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "sync"
    "time"

    iconv "github.com/djimenez/iconv-go"
)

var h = `<html><head></head><body></body></html>`

// NewReader is modified to close the converter
func NewReader(source io.Reader, fromEncoding string, toEncoding string) (*iconv.Reader, error) {
    // create a converter
    converter, err := iconv.NewConverter(fromEncoding, toEncoding)

    if err == nil {
        reader := iconv.NewReaderFromConverter(source, converter)
        converter.Close()
        return reader, err
    }

    // return the error
    return nil, err
}

func job() {
    time.Sleep(1 * time.Second) // simulate web request
    bdy := ioutil.NopCloser(bytes.NewReader([]byte(h)))
    defer bdy.Close()

    resp := &http.Response{
        Body: bdy,
    }
    defer resp.Body.Close()

    //_, err := iconv.NewReader(resp.Body, "windows-1252", "utf-8") // memory leak
    //_, err := iconv.ConvertString(h, "windows-1252", "utf-8") // no memory leak
    _, err := NewReader(resp.Body, "windows-1252", "utf-8")
    if err != nil {
        log.Fatalln(err)
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
    defer cancel()

    n := 1000
    stop := make(chan bool)

    wg := sync.WaitGroup{}
    for worker := 0; worker < n; worker++ {
        wg.Add(1)
        go func(stop chan bool) {
            defer wg.Done()
            for {
                select {
                case <-stop:
                    return
                default:
                    job()
                }
            }
        }(stop)
    }

    <-ctx.Done()

    for i := 0; i < n; i++ {
        stop <- true
    }
    wg.Wait()

}

I imagine that NewWriter would also need the same but I haven't used that. Also, thanks for quick response!

moris351 commented 7 years ago

I have same issue, you can modify the iconv-go source code, I add

 func (this *Reader) Close(){
      this.converter.Close()
 }

in reader.go , then in your code, call it like this:

utfBody, err := iconv.NewReader(resp.Body, "windows-1252", "utf-8")
  if err != nil {
           // handler error
  }
utfbody.Close()

the memory leak stopped

Anyway, many thanks to the author, this lib is great

ghost commented 7 years ago

They both achieve the same thing. I think my solution more closely matches what has been done in the rest of the package. Just need to replace NewReader with mine.

ghost commented 7 years ago

35 will fix this issue.

moris351 commented 7 years ago

you should not close the converter in NewReader, the converter will not be available to Reader.Read, you just can close the converter after the "reader" using.

the #35 is a wrong solution

the correct solution is add

 func (this *Reader) Close(){
      this.converter.Close()
 }
ghost commented 7 years ago

pls make pull request ;)

djimenez commented 7 years ago

Can you take a look at #36 - its addresses more than this issue, but should run a finalizer on converters created by readers and writers. I may split this change up.