Closed rdp closed 5 years ago
I could see maybe some method to "never memoize" that people could call if they want to keep RAM down and are OK with the cpu tradeoff...or would it still spike to 60MB then return down even with something like that?
On Wed, Mar 22, 2017 at 10:35 AM, Cris Ward notifications@github.com wrote:
What about adding the patched method above. Perhaps call it CallStack.drop_exception_cache. Then document what it does so app developers can decide if they want to use it?
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/crystal-lang/crystal/issues/3997#issuecomment-288458895
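The patched method under discussion isn't shown in this thread. A minimal sketch of what such a method might look like, assuming the parsed line-number table is memoized in a class variable such as `@@dwarf_line_numbers` (both the method name and the class variable are assumptions based on this discussion, not documented API):

```crystal
# Hypothetical sketch -- CallStack.drop_exception_cache and
# @@dwarf_line_numbers are assumptions, not stdlib API.
struct CallStack
  # Drop the memoized DWARF line-number table so the GC can reclaim it.
  # The next exception backtrace would re-parse the executable's debug
  # info, trading CPU time for the RAM saved in between.
  def self.drop_exception_cache
    @@dwarf_line_numbers = nil
  end
end
```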
@ysbaddaden - attacking a swarm of micro apps on one server, each expected to stay at a few MB, and spiking them all via requests until the OOM killer steps in, could also be considered an attack vector though.
@crisward, as @ysbaddaden says: the --no-debug flag! If that's hard to flip in your deployment, then I think you're using the wrong distro/packager/framework/whatever setup!
GC and heap memory are only one side of the memory usage. What about the process memory? What about the stacks (each Fiber reserves 8MB for its stack)?
That being said, I don't understand how so much memory can be used; a hello world program allocates less than 6KB of memory. Maybe Debug::DWARF::LineNumbers retains some references that can't be freed (e.g. @io).
@ozra I use https://github.com/crystal-lang/heroku-buildpack-crystal (I have a fork with the updated Libgc and I've added --no-debug for now, but actually find stack traces useful)
@ysbaddaden I agree, I'll have a read through the code, if I uncover anything I'll report back.
I've created the same app and built it for release to run on a Mac, and the memory issue doesn't happen with stack traces. So whatever is causing this seems platform specific. I know DWARF has a variety of formats, and there are a few platform checks within the CallStack code, so either of these may play a part. If the code was holding references to the io, it's either in one of the platform branches or isn't the culprit.
On a side note: @ysbaddaden - the stacks for fibers are mmap'ed, and the kernel maps those pages (which are 4KB for small ones on most systems) to physical memory on page faults, so most fibers should probably only use about 8KB of actually reserved physical memory for their stack (if I remember correctly, there's a page set aside as a "protection zone" in the fiber stack alloc). Now, I might have missed something completely, or forgotten, about the alloc code...
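The lazy-commit behaviour described above can be sketched roughly as follows. This is illustrative only, assuming the POSIX LibC bindings (mmap/mprotect and the MAP_/PROT_ constants) are available on your platform; it is not Crystal's actual Fiber allocation code:

```crystal
# Sketch of reserving a fiber-style stack with a guard page.
STACK_SIZE = 8 * 1024 * 1024 # 8MB of *virtual* address space
PAGE_SIZE  = 4096

# Reserve the region. The kernel commits physical 4KB pages lazily,
# on first touch, so an idle stack costs far less resident memory
# than the 8MB reservation suggests.
stack = LibC.mmap(nil, STACK_SIZE,
  LibC::PROT_READ | LibC::PROT_WRITE,
  LibC::MAP_PRIVATE | LibC::MAP_ANON, -1, 0)

# Revoke access to the lowest page ("protection zone"), so a stack
# overflow faults immediately instead of silently corrupting
# adjacent memory.
LibC.mprotect(stack, PAGE_SIZE, LibC::PROT_NONE)
```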
```crystal
lib LibGC
  struct ProfStats
    heap_size : Word
    free_bytes : Word
    unmapped_bytes : Word
    bytes_since_gc : Word
    bytes_before_gc : Word
    non_gc_bytes : Word
    gc_no : Word
    markers_m1 : Word
    bytes_reclaimed_since_gc : Word
    reclaimed_bytes_before_gc : Word
  end

  fun get_prof_stats = GC_get_prof_stats(stats : ProfStats*, size : SizeT)
end

def gc_stats
  stats = Pointer(LibGC::ProfStats).malloc
  LibGC.get_prof_stats(stats, sizeof(LibGC::ProfStats))
  stats.value
end
```
I've recently started using the code above to get a bit more info about the GC, for example the amount of times it runs.
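For example, a quick way to print some of those stats using the `gc_stats` helper defined above (output values will of course vary by program):

```crystal
# Inspect the collector's counters; gc_no increments once per cycle.
stats = gc_stats
puts "heap size: #{stats.heap_size} bytes"
puts "free:      #{stats.free_bytes} bytes"
puts "GC cycles: #{stats.gc_no}"
```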
My initial example seems quite stable around 2MB RAM used now with 0.30.1, thanks everyone! For followers: to adjust how often it collects, apparently you use the GC_free_space_divisor setting and possibly some others mentioned near there, in case anybody ever wants to go down that route... Also, apparently for programs that use a "lot of heap" there's an incremental/generational mode, GC_enable_incremental, that might be nice to add a method for sometime; it's said to be more responsive anyway. (Descriptions: https://github.com/ivmai/bdwgc/) Cheers!
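For anyone wanting to try those knobs from Crystal, a hedged sketch of bindings: GC_set_free_space_divisor and GC_enable_incremental are real bdwgc functions, but the Crystal-side fun names here are my own, and this assumes reopening the stdlib's LibGC lib is acceptable in your setup:

```crystal
# Bindings for the two bdwgc tuning knobs mentioned above.
lib LibGC
  fun set_free_space_divisor = GC_set_free_space_divisor(value : Word)
  fun enable_incremental = GC_enable_incremental
end

# A higher divisor makes the collector run more often and keep the
# heap smaller, at some CPU cost (bdwgc's default is 3).
LibGC.set_free_space_divisor(10)

# Incremental mode spreads collection work out, which can reduce
# pause times for heap-heavy programs.
LibGC.enable_incremental
```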
Interestingly in my app on production, it tends to use 150MB of RAM, but if I have a GC loop:
```crystal
spawn do
  loop do
    sleep 1.0
    GC.collect
  end
end
```
it stays around 40MB. This isn't as bad as it once was, but might still be worth investigating at some point, at least to see whether the APIs mentioned earlier can keep the RAM down, and whether they affect performance. Leave a comment if anybody would like this reopened; if not I might get to it "some decade or so" :)
This probably has been mentioned before in this thread, I didn't reread, but generally this is working as intended for most GCs.
Allocating memory from the OS is costly, so many GCs employ heuristics on whether to release memory back to the system or not. And the simplest metric for a heuristic like that is the number of GC cycles that the space stayed free, hence your loop doing what it does. So generally most long-running workloads have spiky memory usage: think of a server application that processes and answers requests, then throws the state for that request away, or a background processing daemon, which is basically the same thing. Here there are clear performance advantages to avoiding the round trip to the OS memory allocator each time a request comes in. For short-lived programs, on the other hand, it really doesn't matter to be good at releasing memory ASAP, as the program is going to terminate quickly anyways. So it's better to optimize the heuristic for the first case.
Yeah, those are good points. I need to think more about this. A couple more random ideas: perform a GC "on idle" (when the scheduler detects idle). Maybe add a flag "never cache the DWARF line numbers" (if exceptions are rare enough and you can handle the RAM usage when you do need an exception), or maybe even an option "never add line information to exceptions" to avoid the RAM hit entirely (if you can't afford the RAM hit at all), or "re-lookup PC locations one by one, so it never builds the large DWARF structure". Just some thoughts... :)
For followers, you can also control some GC functions using environment variables: https://github.com/ivmai/bdwgc/blob/master/doc/README.environment
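For instance (a sketch: GC_FREE_SPACE_DIVISOR and GC_PRINT_STATS are variable names documented in bdwgc's README.environment, while ./my_app is a placeholder for your compiled binary):

```shell
# bdwgc reads these at process startup: collect more aggressively
# and log a line for each GC cycle.
GC_FREE_SPACE_DIVISOR=10 GC_PRINT_STATS=1 ./my_app
```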
You can avoid the DWARF lines at runtime using CRYSTAL_LOAD_DWARF too FWIW. I think it would be nice if crystal someday came out with a "try to save RAM" compile time parameter :)
Setting GC_FREE_SPACE_DIVISOR=200 didn't seem to help, FWIW... memory still seemed to just increase monotonically. Still not sure what is going on exactly.
I did notice there are instructions here for trying to figure out why the heap grows out of control. Is it because of finalizers? Is BDW just not collecting often enough when it maxes out?
I'm not 100% sure if this is related, but it seems to be a similar issue (at least on Windows):
https://forum.crystal-lang.org/t/memory-leak-on-windows/6197
EDIT: Wording
See if this one seems to use up an inordinate amount of RAM for you (it seems to for me on all OS's I tried, anyway):
I wonder if it's related to the Boehm GC finalization cycle or something. Let me know if it's reproducible. Thank you.