jkim-ru opened this issue 6 years ago
same issue as: https://github.com/dibbs-vdc/ccql/issues/8
Notes from Chuck:
The oom killer triggered from the daily cron job. It killed ruby. I’ve created /usr/swap, a 4 GB swap file, and added it to swap. That will give you enough extra that you’ll probably survive. My practice is always to have swap, typically about the same size as memory, because of things like this.
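The note doesn't give the exact commands used to create /usr/swap. A typical sequence looks like the sketch below; the `/usr/swap` path and 4 GB size come from the note, but here a small file under /tmp is used so the formatting steps can be exercised without root (enabling the swap itself does require root).

```shell
# Sketch of the swap-file setup described above. On the real host the
# file was /usr/swap at 4 GB; the /tmp path and 4 MB size here are
# illustrative so this can run unprivileged.
SWAPFILE=/tmp/demo-swapfile               # real host: /usr/swap
dd if=/dev/zero of="$SWAPFILE" bs=1M count=4 status=none   # real count: 4096 (4 GB)
chmod 600 "$SWAPFILE"                     # swap files must not be world-readable
if command -v mkswap >/dev/null 2>&1; then
  mkswap "$SWAPFILE"                      # write the swap signature
fi
# Root-only steps on the real host:
#   swapon /usr/swap                              # enable immediately
#   echo '/usr/swap none swap sw 0 0' >> /etc/fstab   # persist across reboots
echo "prepared $SWAPFILE"
```

Sizing swap at roughly the same size as RAM, as Chuck suggests, gives the OOM killer headroom but won't fix a genuine leak; it only delays the kill.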
I think we may have a memory leak.
Here’s what free says:
                  total        used        free      shared  buff/cache   available
    Mem:        8045240     3775368      142124       99436     4127748     3887840
    Swap:       4194300           0     4194300
Plenty of space, you’d think. When the oom killer runs, it writes a “ps” listing into /var/log/messages. The process it killed was a ruby process with size 1430950. There’s a factor-of-4 difference between that figure and what “ps aux” shows. Currently there are no ruby processes with anywhere near that memory usage; the largest is puma (which I believe shows as ruby in /var/log/messages). But remember the factor-of-4 difference: the listing in messages is much larger.
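The factor of 4 is likely a units mismatch: the oom-killer’s process dump reports memory in pages (4 KB on typical x86-64 Linux), while `ps aux` reports kilobytes. Assuming 4 KB pages, the killed process’s 1430950 translates to:

```shell
# Convert the oom-killer's page count to KB and MB, assuming 4 KB pages.
PAGES=1430950
echo $((PAGES * 4))          # -> 5723800 KB
echo $((PAGES * 4 / 1024))   # -> 5589 MB, i.e. roughly 5.5 GB
```

Roughly 5.5 GB of virtual size for one ruby process on an 8 GB machine would be consistent with a leak rather than normal puma usage.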
I’m going to start a background job that does ps aux to a file, to see if we can see something growing. If we have a multi-GB memory leak, no expansion of memory or swap will prevent problems. As a workaround, we could restart whatever it is nightly.
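One way to run the “ps aux to a file” background job is a loop that periodically snapshots the top memory consumers, so steady growth shows up when you diff the log over time. The log path, the 5-minute interval, and the 3-iteration demo loop below are illustrative choices, not what was actually deployed; `--sort=-rss` assumes procps `ps` (standard on Linux).

```shell
# Periodically log the top memory consumers so a leak shows up as growth.
# For a quick demo this runs 3 iterations with a 1-second sleep; in
# practice you'd use an infinite loop with something like 'sleep 300'.
LOGFILE=/tmp/ps-watch.log
: > "$LOGFILE"                              # start with a fresh log
for i in 1 2 3; do                          # in practice: while true
  { date; ps aux --sort=-rss | head -n 6; echo; } >> "$LOGFILE"
  sleep 1                                   # in practice: sleep 300
done
```

If the leak is confirmed in puma, the nightly-restart workaround mentioned above could be a cron job that restarts the service; the exact command depends on how puma is supervised on this host.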
Potential tools to try:
Dev machine crashed due to a potential memory leak (AM, 10/3/2017). Need to investigate ways to profile the code for leaks.