RADAR-base / RADAR-Docker

Integrated Docker Stack for the RADAR mHealth Streaming Platform Components
https://hub.docker.com/u/radarbase/dashboard/
Apache License 2.0

RADAR-Docker allows VM to run out of memory and hangs VM #215

Open rocketsciencenerd opened 4 years ago

rocketsciencenerd commented 4 years ago

I have a server running the latest radar-docker that hangs periodically since updating to managementportal 0.5.8 and radar-output:0.6.0. The VM runs out of memory and hangs; after I restart the VM everything runs again, but then it runs out of memory and hangs once more, and so on. This is confirmed by the entries in /var/log/kern.log (kernel log screenshots attached).

My docker container setup is below:

(Screenshot of the docker container setup, 2020-02-17)

My VM specs match the recommended specs from https://radar-base.org/index.php/documentation/introduction/:

- 4-core CPU
- 16 GB memory
- An SSD for the operating system and docker (at least 50 GB)
- 1 x 1 TB spinning disk for redundancy

Per @nivemaham's recommendation I am going to try changing this line: https://github.com/RADAR-base/RADAR-Docker/blob/master/dcompose-stack/radar-cp-hadoop-stack/docker-compose.yml#L827

to `RADAR_HDFS_RESTRUCTURE_OPTS: -Xms250m -Xmx2g`
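For context, a minimal sketch of what that docker-compose change looks like; the service name and surrounding structure here are illustrative, the real file is the linked docker-compose.yml:

```yaml
# Sketch only -- service name is illustrative, see the linked
# docker-compose.yml (line 827) for the real definition.
services:
  radar-hdfs-restructure:
    environment:
      # Cap the restructure job's JVM heap at 2 GB (was -Xmx4g),
      # leaving more of the 16 GB VM for the rest of the stack.
      RADAR_HDFS_RESTRUCTURE_OPTS: -Xms250m -Xmx2g
```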

Hope this helps others - may also be worth looking into on the master branch.

nivemaham commented 4 years ago

Thanks for reporting this issue @rocketsciencenerd. I think it could be related to running radar-output as part of the stack: the recommended specifications were written for an older version of RADAR-Docker, where we ran radar-output as a systemctl service on an interval. The current configuration of radar-output,

RADAR_HDFS_RESTRUCTURE_OPTS: -Xms250m -Xmx4g

seems to consume (not necessarily continuously) up to 4 GB for this container alone, which may have caused the OOM on the VM, since the rest of the platform also requires some memory.

That is why lowering `-Xmx` to 2g seems like a good solution to me. Would it be possible for your system admin to check which container caused the OOM?

yatharthranjan commented 4 years ago

Hi, nivethika's suggestion sounds good. Not sure if you already know this, but you can use `docker stats` to see how much memory and CPU each container is consuming.

rocketsciencenerd commented 4 years ago

@yatharthranjan Is there a way to get a history of docker container usage? It looks like the `stats` command only displays current usage.
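If a full monitoring stack feels heavy, one lightweight option is a small script that samples `docker stats --no-stream` periodically and appends to a CSV. This is a hypothetical helper, not part of RADAR-Docker; the file name and interval are illustrative:

```python
# Hypothetical helper: build a memory-usage history from `docker stats`,
# which by itself only shows a live snapshot.
import csv
import subprocess
import time
from datetime import datetime, timezone

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

def parse_mem(mem: str) -> int:
    """Convert a docker-stats memory string like '1.5GiB' to bytes."""
    # Check longer suffixes first so 'GiB' is not matched as plain 'B'.
    for unit, factor in sorted(UNITS.items(), key=lambda u: -len(u[0])):
        if mem.endswith(unit):
            return int(float(mem[: -len(unit)]) * factor)
    raise ValueError(f"unrecognised memory value: {mem!r}")

def sample_containers() -> list[tuple[str, int]]:
    """One snapshot: (container name, memory used in bytes) per container."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream",
         "--format", "{{.Name}},{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        name, usage = line.split(",", 1)
        used = usage.split("/")[0].strip()  # '1.5GiB / 15.6GiB' -> '1.5GiB'
        rows.append((name, parse_mem(used)))
    return rows

def log_forever(path: str = "docker-mem.csv", interval_s: int = 60) -> None:
    """Append a timestamped row per container every interval_s seconds."""
    while True:
        stamp = datetime.now(timezone.utc).isoformat()
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            for name, mem in sample_containers():
                writer.writerow([stamp, name, mem])
        time.sleep(interval_s)
```

Running `log_forever()` under a systemd unit or `nohup` would give a rough per-container memory history to correlate with the kernel OOM timestamps.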

yatharthranjan commented 4 years ago

Hi, I am not aware of a straightforward way to do that (I use netdata to monitor the cgroups and the VM as a whole; it can itself be deployed in Docker). But there are other tools you can use for this; a quick Google search should reveal some.

afolarin commented 4 years ago

cAdvisor is also relatively easy to set up (you can also pair it with Prometheus): https://github.com/google/cadvisor, https://prometheus.io/docs/guides/cadvisor/. But there are a bunch of other options too.

blootsvoets commented 4 years ago

Recently we've had to increase our memory requirement to 24 GB on the base system, and I think it would be wise to make that the base requirement. To avoid OOM while keeping the system running with degraded performance, you can consider enabling swap (see e.g. https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-18-04).
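For reference, a sketch of the swap-file steps from the linked tutorial (the 4 GB size is an arbitrary example; run as root and adjust to taste):

```shell
# Create and enable a 4 GB swap file so the kernel can page out
# instead of OOM-killing containers (degraded performance, no crash).
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Persist across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```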

iDmple commented 3 years ago

Just to let you know that I have a similar issue on the same machine configuration (16 GB RAM): some containers hang (Kafka and HDFS). I resized to 32 GB to test whether that solves the issue.