dop251 / goja_nodejs

Nodejs compatibility library for Goja
MIT License
326 stars 78 forks source link

Event loop leaks intervals #81

Closed gbl08ma closed 1 month ago

gbl08ma commented 1 month ago

Consider the following case:

package main

import (
    "fmt"
    "runtime"
    "time"

    "github.com/dop251/goja"
    "github.com/dop251/goja_nodejs/eventloop"
)

func main() {
    for {
        loop := eventloop.NewEventLoop()
        loop.Start()
        interval := loop.SetInterval(func(vm *goja.Runtime) {}, 10*time.Millisecond)
        time.Sleep(500 * time.Millisecond)
        loop.ClearInterval(interval)
        loop.Stop()
        fmt.Println(runtime.NumGoroutine())
    }

}

Expectation: the number of running goroutines doesn't really increase. Reality: the printed number keeps increasing...

I suspect, but I am not entirely sure, that this happens because in eventloop/eventloop.go, there might be a race that leaves interval goroutines waiting forever:

func (i *Interval) run(loop *EventLoop) {
L:
    for {
        select {
        case <-i.stopChan:
            i.ticker.Stop()
            break L
        case <-i.ticker.C:
            loop.jobChan <- func() {
                loop.doInterval(i)
            }
        }
    }
}

Consider the case where both stopChan and ticker.C have a message, the select will choose "randomly" between them and it might choose the second case. I think, but am not entirely sure because I haven't fully analyzed the code, that the send into jobChan might block when the loop is stopped. There's also the possibility that the loop is stopped precisely after entering the ticker.C case and before jobChan is fed, in any case, in my production service I have identified lingering goroutines waiting to send into jobChan past the moment all event loops are stopped. Because the interval callback has access to the goja runtime, this effectively means that together with the leaked goroutine, the runtime and all of its globals will live in memory forever...

dop251 commented 1 month ago

The root cause here is that there is no mechanism to "dispose" the loop. When you stop it, it is expected that it would be restarted at some point and will be running until it finishes all its tasks. Implementing it would require some considerable refactoring, like the loop would need to keep references to all intervals and times so that it could clear them all and clearing intervals would need to be done in a different way to avoid the situation you've described.

Note, even if it's all done it wouldn't help the issue from #80, there you need to keep track of the channel and close it after stopping the loop.

gbl08ma commented 1 month ago

I am hopeful that the refactoring is not that complicated. I very quickly put together an experiment in this branch where the loop can now be instantiated with an optional context and it will automatically stop and dispose of all intervals/timeouts when the context is cancelled. I still need to write more tests and also make it so that a loop can't be started again once its context was cancelled (because that would leave the intervals/timeouts previously added in the loop in a broken state. My idea is that a loop should never live longer than its context; if the user wants to be able to pause and resume the loop, then they should never cancel its context until they're done using the loop and its runtime), but so far this is promising as the following example no longer leaks goroutines nor goja runtimes:

for {
    ctx, cancel := context.WithCancel(context.Background())
    loop := eventloop.NewEventLoop(eventloop.WithContext(ctx))
    loop.Start()
    interval := loop.SetInterval(func(vm *goja.Runtime) {}, 10*time.Millisecond)
    time.Sleep(500 * time.Millisecond)
    loop.ClearInterval(interval) // not strictly necessary
    loop.Stop() // not strictly necessary
    cancel()
    fmt.Println(runtime.NumGoroutine())
}
dop251 commented 1 month ago

How about this?

gbl08ma commented 1 month ago

That looks better, having the advantage of waiting for job completion on termination - while I don't really need this (in my case, if the loop takes over 10 seconds to Stop(), I interrupt the VM anyway, typically causing any jobs left to run to just complete as soon as they're started, assuming they use the runtime), it's definitely cleaner than my solution where the goroutines could linger for a bit longer after the context was cancelled.