brettwooldridge / NuProcess

Low-overhead, non-blocking I/O, external Process implementation for Java
Apache License 2.0
710 stars 84 forks source link

Native allocation free steady state on Linux #150

Closed avrecko closed 5 months ago

avrecko commented 5 months ago

While ~200KB mallocs per NuProcess on Linux might not sound as much. But a long running server process can create GBs of malloc calls that are first unnecessary and second might not be freeed for a long time causing OOM issues with Linux.

In my view it is bad practice to wait for Reference Handler to free native memory. A JVM might take hours before calling reference processor on heaps that don't have much GC activity.

JNA will only free native memory via the reference processor unless in JNA 5.12 they added API to explicitly free native memory via Memory#close call.

I've moved Out/Err buffer from being per NuProcess to be per PosixProcessor. This way there is only ~200 KB overhead per processor and not per process. Now this can be called a low-overhead library.

Stdin handling is left unchanged with the exception of allocation on-demand of the 65K buffer per NuProcess. It gets explicitly freed on exit.

I need to test this some more and will possibly update the code some more. Feel free to review, add suggestions.

Sorry about Windows. Don't have Windows so not sure about it. Maybe we can just add the clear on buffer since the following is now true:

The only downside is an API breaking change. For handler#onStdout and handler#onStderr the buffer is cleared before each call. Which isn't a big deal or a problem that can't be fixed by an update to application code.

I think this sure beats getting OOM issues in production from native memory not being released by a server that is spawning 100s of processes per second. You are in GB/s per minute range soon if you do a lot of spawing.

bturner commented 5 months ago

Thanks for the idea and your interest in improving NuProcess, @avrecko, but I won’t be able to accept this approach.

Using per-handler buffers for stdout and stderr requires that the handler fully consume both buffers on any callback. As you’ve already noted, that’s a semantics-breaking change. While code could be updated to account for it, it’s not viable to be included in any release of the library other than a major version bump to help draw developers’ attention to the change (and even then, many will likely not notice).

More importantly, though, I’m curious what you think is going to happen in cases where a handler isn’t really equipped to fully consume the buffer. They’ll allocate a secondary buffer, naturally, and dump the data there. (I’ve written several such handlers that would be required to do their own buffering with this change. Any handler that does UTF8 decoding, for example, would likely need such handling to deal with split multi byte characters.) That means for many consumers of the library there’s little to no net benefit on memory usage, and the library is more complex to use.

The allocations don’t seem like the real crux of your issue, to me. Rather, the delays in that memory being released, after a process terminates, seem like a more reasonable thing to improve—and something that can be improved without requiring any breaking changes to the API.

I’m sure this won’t be the response you’d like to see, but I don’t believe this is the correct direction for the library.

avrecko commented 5 months ago

@bturner Thanks for the fast reply and offering the rationale behind your decision. I have no issues accepting your rationale behind your decision. I can see your point.

I'll probably use my own fork that is malloc free in steady state. All of my handlers fully consume the buffer already.