crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

GC doesn't seem aggressive enough at times #3997

Closed rdp closed 5 years ago

rdp commented 7 years ago

See if this one seems to use up an inordinate amount of RAM for you (it seems to for me on all OSes I tried, anyway):

class Foo
  def finalize
    # Invoked when Foo is garbage-collected
    a = 3
    "bye #{3 + 4}"
  end
end

# Allocates a Foo instance on every iteration, forever
loop do
  Foo.new
end

I wonder if it's related to the Boehm GC finalization cycle or something. Let me know if it's reproducible. Thank you.
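A minimal sketch for narrowing that down (assuming only GC.collect from the stdlib): run the same loop but force a collection periodically; if RAM stays flat with the explicit collect and grows without it, the problem is collection frequency rather than a leak.

class Foo
  def finalize
    "bye #{3 + 4}"
  end
end

i = 0
loop do
  Foo.new
  i += 1
  # Force a full collection every so often to see whether RAM stays flat
  GC.collect if i % 1_000_000 == 0
end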

rdp commented 7 years ago

I could see maybe adding some method to "never memoize" that people could call if they want to keep RAM down and are OK with the CPU tradeoff... or would it still spike to 60MB and then come back down even with something like that?

On Wed, Mar 22, 2017 at 10:35 AM, Cris Ward notifications@github.com wrote:

What about adding the patched method above? Perhaps call it CallStack.drop_exception_cache. Then document what it does so app developers can decide if they want to use it.

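Just to illustrate the shape of that idea, here is a purely hypothetical sketch (made-up names, not the actual CallStack internals): a class-level memoized cache plus an explicit method that drops it so the GC can reclaim it.

class LineNumberCache
  @@entries : Array(String)?

  # Lazily build and memoize the (potentially large) cache
  def self.entries
    @@entries ||= Array.new(100_000) { |i| "entry #{i}" }
  end

  # The "drop_exception_cache"-style escape hatch: forget the cache so it can be collected
  def self.drop_cache
    @@entries = nil
  end
end

The tradeoff is the one discussed above: dropping the cache saves RAM, but the next lookup pays the CPU cost of rebuilding it.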

ozra commented 7 years ago

@ysbaddaden - attacking a swarm of micro apps that are expected to stay at a few MB each, running on a server, and spiking them all via requests, causing OOM kills and whatnot, could also be considered an attack vector, though.

@crisward, as @ysbaddaden says: the --no-debug flag! If that's hard to flip in your deployment, then I think you're using the wrong distro/packager/framework/whatever setup!

ysbaddaden commented 7 years ago

GC and heap memory are only one side of the memory usage. What about the process memory? What about the stacks (each Fiber consumes 8MB for its stack)?

That being said, I don't understand how so much memory can be used; a hello world program allocates less than 6KB of memory. Maybe Debug::DWARF::LineNumbers retains some references that can't be freed (e.g. @io).
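One cheap way to sanity-check that from inside a program is to dump the GC's own numbers (assuming the stdlib GC.stats API; field names may vary between Crystal versions):

stats = GC.stats
puts "heap: #{stats.heap_size} bytes"
puts "free: #{stats.free_bytes} bytes"
puts "allocated since last GC: #{stats.bytes_since_gc} bytes"

That at least separates "the GC heap really is huge" from "the GC heap is small and the extra RSS comes from elsewhere (fiber stacks, malloc'd buffers, the DWARF data, ...)".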

crisward commented 7 years ago

@ozra I use https://github.com/crystal-lang/heroku-buildpack-crystal (I have a fork with the updated libgc, and I've added --no-debug for now, but I actually find stack traces useful).

@ysbaddaden I agree; I'll have a read through the code, and if I uncover anything I'll report back.

crisward commented 7 years ago

I've created the same app and built it for release to run on a Mac, and the memory issue doesn't happen there with stack traces. So whatever is causing this seems platform specific. I know DWARF has a variety of formats, and there are a few platform checks within the CallStack code, so either of these may play a part. If the code is holding references to @io, it's either in one of the platform branches or isn't the culprit.

ozra commented 7 years ago

On a side note: @ysbaddaden - the stacks for fibers are mmap'ed, and the kernel maps those pages (which are 4KB for small pages on most systems) to physical memory on page faults - so most fibers should only use about 8KB of actually reserved physical memory for their stack (if I remember correctly, there's a page set aside as a "protection zone" in the fiber stack allocation). Now, I might have missed something completely, or forgotten, about the alloc code...
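For reference, a rough sketch of that allocation pattern (assumption: the POSIX LibC bindings expose mmap/mprotect and these constants; names and the guard-page placement may differ from the real fiber stack code):

# Reserve an 8MB virtual range; physical pages are only committed on first touch
stack_size = LibC::SizeT.new(8 * 1024 * 1024)
ptr = LibC.mmap(nil, stack_size,
  LibC::PROT_READ | LibC::PROT_WRITE,
  LibC::MAP_PRIVATE | LibC::MAP_ANON,
  -1, 0)
raise "mmap failed" if ptr == LibC::MAP_FAILED
# Make the lowest page inaccessible so a stack overflow faults (the "protection zone")
LibC.mprotect(ptr, LibC::SizeT.new(4096), LibC::PROT_NONE)

Until a fiber actually touches more pages, only a few KB of the 8MB reservation shows up as resident memory.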

benoist commented 7 years ago

# Binding for bdwgc's GC_get_prof_stats; field order follows the GC_prof_stats_s struct
lib LibGC
  struct ProfStats
    heap_size : Word
    free_bytes : Word
    unmapped_bytes : Word
    bytes_since_gc : Word
    bytes_before_gc : Word
    non_gc_bytes : Word
    gc_no : Word
    markers_m1 : Word
    bytes_reclaimed_since_gc : Word
    reclaimed_bytes_before_gc : Word
  end

  fun get_prof_stats = GC_get_prof_stats(stats : ProfStats*, size : SizeT)
end

def gc_stats
  # Allocate space for one ProfStats struct and let the GC fill it in
  stats = Pointer(LibGC::ProfStats).malloc(1)
  LibGC.get_prof_stats(stats, sizeof(LibGC::ProfStats))
  stats.value
end

I've recently started using the code above to get a bit more info about the GC, for example the number of times it has run.
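For example, a quick way to use the binding above (just an illustration; gc_stats is the helper defined in the previous snippet):

GC.collect
stats = gc_stats
puts "GC cycles so far: #{stats.gc_no}"
puts "heap: #{stats.heap_size} bytes, free: #{stats.free_bytes} bytes"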


rdp commented 5 years ago

My initial example seems quite stable at around 2MB of RAM used now with 0.30.1, thanks everyone! For followers, to adjust how often it collects, apparently you use the GC_free_space_divisor setting and possibly some others mentioned near there, in case anybody ever wants to go down that route... Also, apparently for programs that use a "lot of heap" there's an incremental/generational mode, GC_enable_incremental, that might be nice to add a method for sometime; it's said to be more responsive anyway (descriptions: https://github.com/ivmai/bdwgc/ ). Cheers!
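For anyone who wants to experiment with those knobs from Crystal, a minimal sketch (an assumption on my part, not an official API: it reopens the stdlib's LibGC lib and relies on the linked libgc exporting GC_set_free_space_divisor and GC_enable_incremental, which current bdwgc does):

lib LibGC
  fun set_free_space_divisor = GC_set_free_space_divisor(value : Word)
  fun enable_incremental = GC_enable_incremental
end

# A higher divisor makes the collector run more often before growing the heap (bdwgc's default is 3)
LibGC.set_free_space_divisor(10)
# Opt in to incremental/generational collection
LibGC.enable_incremental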

rdp commented 4 years ago

Interestingly, in my app in production it tends to use 150MB of RAM, but if I add a GC loop:

spawn do
  loop do
    sleep 1.0
    GC.collect # force a full collection every second
  end
end

it stays around 40MB. This isn't as bad as it once was, but it might still be worth investigating at some point, at least to see whether the APIs mentioned earlier can keep the RAM down or not [and whether it affects performance or not]. Leave a comment if anybody would like this reopened; if not, I might get to it "some decade or so" :)

jhass commented 4 years ago

This has probably been mentioned before in this thread (I didn't reread it), but generally this is working as intended for most GCs.

Allocating memory from the OS is costly, so many GCs employ heuristics for whether to release memory back to the system or not. The simplest metric for a heuristic like that is the number of GC cycles a space has stayed free, hence your loop doing what it does.

So generally, most long-running workloads have spiky memory usage; think of a server application that processes and answers requests, then throws the state for that away, or a background processing daemon, which is basically the same thing. Here there are clear performance advantages to avoiding the roundtrip to the OS memory allocator each time a request comes in. For short-lived programs it really doesn't matter whether the runtime is good at releasing memory ASAP, as the program is going to terminate quickly anyways. So it's better to optimize the heuristic for the first case.
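As a toy illustration of that heuristic (not bdwgc's actual code, just the shape of the idea): only hand a free block back to the OS once it has stayed unused for several consecutive GC cycles.

UNMAP_THRESHOLD = 6 # arbitrary; real collectors tune this

class FreeBlock
  property unused_cycles = 0
end

def after_gc_cycle(free_blocks : Array(FreeBlock))
  free_blocks.each do |block|
    block.unused_cycles += 1
    # Only now is the munmap/mmap roundtrip considered worth it
    release_to_os(block) if block.unused_cycles >= UNMAP_THRESHOLD
  end
end

def release_to_os(block : FreeBlock)
  # Placeholder: a real collector would munmap or madvise the pages here
end

A 1-second GC.collect loop drives that cycle counter up much faster than the natural allocation rate would, which is why the explicit loop ends up returning memory sooner.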

rdp commented 4 years ago

Yeah, those are good points. I need to think more about this. A couple more random ideas: perform a GC "on idle" (when the scheduler detects idle). Maybe add a flag "never cache the DWARF line numbers" [if exceptions are rare enough and you can handle the RAM usage when you do need an exception], or maybe even an option "never add line information to exceptions" to avoid the RAM hit entirely [if you can't afford the RAM hit at all], or "re-look up PC locations one by one, so it never builds the large DWARF structure". Just some thoughts... :)

rdp commented 4 years ago

For followers, you can also control some GC behavior using environment variables: https://github.com/ivmai/bdwgc/blob/master/doc/README.environment

rdp commented 2 years ago

You can also avoid loading the DWARF line information at runtime using CRYSTAL_LOAD_DWARF, FWIW. I think it would be nice if Crystal someday came out with a "try to save RAM" compile-time parameter :)

rdp commented 2 years ago

Setting GC_FREE_SPACE_DIVISOR=200 didn't seem to help, FWIW... RAM still seemed to just increase monotonically... still not sure what is going on exactly.

rdp commented 1 year ago

I did notice there are instructions here for trying to figure out why the heap grows out of control when it does. Is it because of finalizers? Is BDW just not collecting often enough when it maxes out?

Hadeweka commented 10 months ago

I'm not 100% sure if this is related, but it seems to be a similar issue (at least on Windows):

https://forum.crystal-lang.org/t/memory-leak-on-windows/6197

EDIT: Wording