ecohealthalliance / infrastructure

Automation code related to Jenkins and Docker-based infrastructure
Apache License 2.0

mitigate OOM errors on reservoir #53

Open RobJY opened 3 years ago

RobJY commented 3 years ago

Each of these mitigations should first be tried as a manual change on the servers before being added to the Ansible deploy scripts.

RobJY commented 3 years ago

Starting with increasing the swap space.

The process is a little different on the two servers because aegypti is running Ubuntu 20.04 and prospero is running Ubuntu 18.04.

On prospero the existing swap space is a partition rather than a file, so we'll have to disable the swap partition before adding the file. To stop using the existing swap partition:

  1. Turn off the existing swap device: sudo swapoff /dev/nvme0n1p3
  2. Comment out the existing swap entry in /etc/fstab
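The two steps above could be scripted roughly as follows. This is a sketch, not the script actually used on prospero: the function name comment_out_swap_entries and the .bak backup are illustrative assumptions. It matches on the filesystem-type field rather than on the device name, because the fstab entry may refer to the partition by UUID=... instead of /dev/nvme0n1p3.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 2: comment out every active swap entry in an
# fstab-format file. It matches the filesystem-type field ($3 == "swap")
# rather than a device name, since the entry may use UUID=... instead of
# /dev/nvme0n1p3.
set -euo pipefail

comment_out_swap_entries() {
  local fstab="$1" tmp
  tmp="$(mktemp)"
  cp -p "$fstab" "$fstab.bak"          # keep a backup of the original
  # Prefix "#" to uncommented lines whose type field is "swap"; leave the rest.
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab" > "$tmp"
  cat "$tmp" > "$fstab"                # rewrite in place, keeping owner/mode
  rm -f "$tmp"
}

# Intended use on prospero (as root, not run here):
#   swapoff /dev/nvme0n1p3                # step 1
#   comment_out_swap_entries /etc/fstab   # step 2
```

Because it comments out any active swap line, including a swap-file entry, it should run before /swapfile is added. The function only edits the file, so it can be tried on a copy of /etc/fstab first.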

Adding or resizing the swap file is then the same on both servers:

  1. Turn off all swap: sudo swapoff -a
  2. Create or resize the swap file (50 GiB here): sudo dd if=/dev/zero of=/swapfile bs=1G count=50
  3. Restrict the swap file's permissions: sudo chmod 600 /swapfile
  4. Format the file as swap: sudo mkswap /swapfile
  5. Activate the swap file: sudo swapon /swapfile
  6. Add this line to /etc/fstab if it is not already there: /swapfile none swap sw 0 0
  7. Check the result: grep SwapTotal /proc/meminfo and swapon --show
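A possible script version of steps 1-7, for example as a starting point for the Ansible task. The function names, the size argument and the "skip if already listed" check in step 6 are assumptions for illustration. One deliberate difference from step 2: dd writes 1 MiB blocks instead of bs=1G, because bs=1G makes dd allocate a 1 GiB buffer in RAM, which is awkward on a host that is already short of memory; the resulting file size is the same.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-7 above; names and the idempotent fstab check are
# illustrative assumptions, not taken from the actual deploy scripts.
set -euo pipefail

# Step 6, made idempotent: append the fstab entry only if no active
# (uncommented) line already names the swap file as its device.
add_swap_fstab_entry() {
  local fstab="$1" swapfile="$2"
  if ! awk -v f="$swapfile" \
      '!/^[[:space:]]*#/ && $1 == f { found = 1 } END { exit !found }' "$fstab"; then
    printf '%s none swap sw 0 0\n' "$swapfile" >> "$fstab"
  fi
}

# Steps 1-7; must run as root and briefly disables all swap.
setup_swapfile() {
  local size_gb="$1" swapfile="${2:-/swapfile}"
  swapoff -a                                    # 1. turn off all swap
  # 2. create/resize: 1 MiB blocks instead of bs=1G keep dd's buffer small
  dd if=/dev/zero of="$swapfile" bs=1M count=$(( size_gb * 1024 )) status=progress
  chmod 600 "$swapfile"                         # 3. root-only permissions
  mkswap "$swapfile"                            # 4. format as swap
  swapon "$swapfile"                            # 5. activate
  add_swap_fstab_entry /etc/fstab "$swapfile"   # 6. persist across reboots
  grep SwapTotal /proc/meminfo                  # 7. check
  swapon --show
}

# Example (not run here), with this file saved as swap.sh:
#   sudo bash -c '. ./swap.sh && setup_swapfile 50'
```

The fstab check matters on aegypti: Ubuntu 20.04 installs commonly already have a /swapfile line, and appending a second one would duplicate it.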

RobJY commented 3 years ago

The manual swap space update is done on both aegypti and prospero. Because the change to /etc/fstab makes the swap file persist across reboots, nothing further needs to be done for swap space.

RobJY commented 3 years ago

I've manually run a script on aegypti and prospero that creates systemd files to limit user memory usage to 95% of RAM.
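The script itself isn't included in this issue, so the following is only a guess at what such a script might write: a systemd drop-in for user.slice, which contains all user sessions, capping their combined memory at 95% of RAM. The drop-in path, file name and function name are hypothetical. A percentage in MemoryMax=/MemoryLimit= is taken relative to installed physical memory; MemoryMax= applies on the unified (cgroup v2) hierarchy and MemoryLimit= on the legacy v1 hierarchy, so both are written.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the script run on aegypti/prospero: write a
# systemd drop-in that caps the combined memory of all user sessions.
set -euo pipefail

write_user_slice_limit() {
  local unit_dir="${1:-/etc/systemd/system}" percent="${2:-95}"
  local dropin_dir="$unit_dir/user.slice.d"
  mkdir -p "$dropin_dir"
  # MemoryMax= is honoured on cgroup v2, MemoryLimit= on legacy cgroup v1.
  cat > "$dropin_dir/50-memory-limit.conf" <<EOF
[Slice]
MemoryMax=${percent}%
MemoryLimit=${percent}%
EOF
}

# Intended use (as root, not run here):
#   write_user_slice_limit /etc/systemd/system 95
#   systemctl daemon-reload
#   systemctl show user.slice -p MemoryMax -p MemoryLimit
```

Applying the limit to user.slice rather than to each user-UID.slice means one user can still use most of the 95% on their own; the point is to keep headroom for system services. systemctl set-property user.slice MemoryMax=95% is an alternative that writes a similar drop-in itself.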