crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.41k stars 1.62k forks source link

HTTP Server – huge memory usage #8602

Closed DavidB137 closed 4 years ago

DavidB137 commented 4 years ago

Hi.

Recently I started coding my first big app in Crystal. I have an (REST) API and for every request I need to generate a lot of data. But after some testing I realized the GC probably isn't working in this case.

Suppose this program:

require "http/server"

server = HTTP::Server.new do |context|
    context.response.content_type = "text/plain"

    count = context.request.resource[1...]
    context.response.print "random string" * count.to_i
end

address = server.bind_tcp("0.0.0.0", 3000)
puts "Listening on http://#{address}"
server.listen

It generates random string 1024 times for requests to http://localhost:3000/1024, 5000 times for requests to http://localhost:3000/5000.

The problem is, when it generates a lot of data, like 10 000 000 times random string, it allocates the a lot RAM, but never brings it back, even though the response has been sent. When you make one request for 10 million (random strings) and then 10 requests for 1 thousand, memory usage never drops (it's steady 100s of MB). But when you make a 2+ subsequent requests for 10 million, RAM usage is in the same level as it would be if you only did 2 requests for 10 million. Try to play with requests to the server above and look at the memory usage.

I don't need to do anything with data from previous requests, so target is to have at most a couple of MB of RAM consumed between requests (when no data generation happens) and only that much RAM consumed by request, as is required (like 100M of RAM for 1 million, 1M of RAM for 10 thousand,...).

(Hope I described the problem clearly enough.)

My system: Crystal 0.32.1, Debian 10, 2 CPU cores, 4GB RAM

jkthorne commented 4 years ago

Is this a compaction issue? Also how are you measuring the ram usage? When optimizing memory usage I believe the core team uses to confirm optimizations. https://github.com/crystal-lang/crystal/blob/c68418eac7f31187b9bcc2436982fb3bf67d2f0b/src/benchmark.cr#L147

DavidB137 commented 4 years ago

I use HTOP, not internal testing tools.

straight-shoota commented 4 years ago

It's entirely possible garbage collection works correctly. This can be tested by looking at GC.stats. Heap size should shrink after memory heavy requests. But the GC doesn't immediately return the free memory to the operating system. AFAIK it should generally do that, but it can take some time. So I'm not sure what your time frame reference "never" refers to. Did you let it run for an hour or so and did not see any reduction?

Also see #3997 for some in-depth discussion about the GC behaviour.

kostya commented 4 years ago

I manually add to my servers every(5.seconds) { GC.collect }, and have no memory problems in production usage. I start do it some years ago when has some problems, not sure it needed now.

jkthorne commented 4 years ago

Should this be a configurable GC option?

straight-shoota commented 4 years ago

Please don't jump to any conclusions without being fully informed about the specific circumstances. There is no indication that garbage collection is running not frequently enough. GC.collect most importantly frees unused heap memory but this memory stills stays in possession of the process. Returning it back to the operating system is conceptually different behaviour. I'm not sure about the specifics (that means details of bdwgc). But there is some cool down period involved.

rdp commented 4 years ago

Is the amount of RAM used staying steady? It's possible the GC isn't running frequently enough...maybe? What if you add GC.collect for each request?

ysbaddaden commented 4 years ago

Also:

RX14 commented 4 years ago

This honestly just sounds like bdwgc not returning memory to the OS, which is no problem in app server usage.

@DavidB137 the memory shown in htop will be the maximum memory the program has ever used, this is expected and shouldn't be a problem, since you need to allocate enough memory for the maximum your server will use anyway. This is not the real "current memory usage" of the program, just the amount of memory the program has reserved from the OS to work in. It may be using only a little bit of that memory in practice.

straight-shoota commented 4 years ago

I've looked into the example and it seems one of the main reasons for not returning the memory to OS is that garbage collection runs very infrequently. IIUC, bdwgc runs GC.collect internally every now and then. When the server is idling, obviously there is no interaction with the GC, so it won't run. You can keep it active by issuing lots of requests. Alternatively, call GC.collect explicitly in a fixed interval as suggested in https://github.com/crystal-lang/crystal/issues/8602#issuecomment-568116197

When the GC runs, it collects the resources from the finished request and eventually (after a few minutes) also returns memory to the OS.

When an application idles most of the time, memory spikes stay around longer because GC runs infrequently and thus collecting unused memory is delayed and returning memory to the OS is delayed even further. In an application that regularly allocates memory and thus GC collect runs more frequently, this is not an issue.

Maybe we could consider having an maximum interval for GC.collect to run somewhat regularly. But I'm not sure that's a good idea in general because for busy applications it might add more stress to the GC. But this topic also touches internals of bdwgc that I'm not familiar with. A simple workaround is just to call GC.collect explicitly in a regular interval when your application is expected to not invoke GC.collect frequently enough.

DavidB137 commented 4 years ago

Yeah, okay, thank you all.

I've added GC.collect to my code inside of an interval as @kostya suggested (using Schedule.cr) and now everything looks good. Huge spikes aren't that much of a problem, because memory is available back to the OS within ~5 minutes.

But I have another question. GC.stats report this:

GC::Stats(@heap_size=192512, @free_bytes=12288, @unmapped_bytes=2765848576, @bytes_since_gc=48, @total_bytes=3988510656)

but there's 123 MB consumed by the program (as reported by HTOP). What is that extra 122 MB used for?

RX14 commented 4 years ago

@DavidB137 it's impossible to say without seeing the program. Fiber stacks, or reading virtual instead of resident memory are the most likely culprits.

DavidB137 commented 4 years ago

@RX14 It was my fault. I've edited the code and everything goes well. Sorry for the distraction.

rdp commented 2 years ago

I did find GC_UNMAP_THRESHOLD at https://github.com/ivmai/bdwgc/blob/8faa9fc4e9c635457c84d68b9e5ecf8210572082/doc/README.environment for explaining the cooldown, but not sure what is really going on yet...