aws / elastic-beanstalk-roadmap

AWS Elastic Beanstalk roadmap
https://aws.amazon.com/elasticbeanstalk/
Creative Commons Attribution Share Alike 4.0 International

If swap or zram is configured, include it in memory warning #293

Open Xarno opened 1 year ago

Xarno commented 1 year ago


Tell us about your request

Currently, if I set up swap space, it is not taken into account when high memory usage is reported. The status of the instance goes to:

But in reality the instance still has 50% of its swap space free, like so:

[Screenshot 2023-07-14 at 09:53:55]

If the system has swap configured, the memory warning should take it into account too. Also, the status level should then be Info at most, not Degraded. Only if both memory and swap are full should the status level be Degraded.

Supporting argument: if system or swap activity depletes the disk I/O pool, there is already another warning in place for that, so you would still see when the system is actually Degraded.
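
The rule proposed above could be sketched roughly as follows. This is only an illustration of the request, not how Elastic Beanstalk evaluates health: the 95% "full" threshold, the method names, and the hash layout (values in kB, keyed like the healthd `meminfo` data) are all assumptions.

```ruby
# Hypothetical threshold: a pool counts as "full" at 95% usage (not from AWS).
FULL_THRESHOLD = 0.95

# Fraction of a pool that is in use; 0.0 if the pool does not exist.
def used_fraction(total, free)
  total = total.to_i
  return 0.0 if total <= 0
  (total - free.to_i).to_f / total
end

# Proposed status level for a meminfo-style hash.
def proposed_status(meminfo)
  mem_full = used_fraction(meminfo['mem_total'], meminfo['mem_available']) >= FULL_THRESHOLD
  swap_total = meminfo['swap_total'].to_i

  # Without swap, keep today's behaviour: full memory means Degraded.
  return (mem_full ? 'Degraded' : 'Ok') if swap_total.zero?

  swap_full = used_fraction(swap_total, meminfo['swap_free']) >= FULL_THRESHOLD
  return 'Degraded' if mem_full && swap_full # memory and swap both exhausted
  return 'Info' if mem_full                  # memory full, but swap still has room
  'Ok'
end
```

With memory at 97% and swap half empty this sketch yields `Info`; it only yields `Degraded` once swap is nearly exhausted as well.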

Is this request specific to an Elastic Beanstalk platform? No

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

I want to use the instance to its fullest, so I give my app a huge portion of the RAM (3.3 GB out of 4 GB in this case). That leaves the host OS with ~700 MB. To avoid getting bitten by the kernel OOM killer, I set up a swap partition so the kernel has room to do its memory allocation work.

Are you currently working around this issue? No workaround known.

/opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/healthd-sysstat-1.0.3-universal-linux/lib/healthd-sysstat/plugin.rb already reports swap space to the Elastic Beanstalk service, but I could not get a warning about full swap, even when I tried with https://unix.stackexchange.com/a/254976/525725

Xarno commented 1 year ago

I found a workaround: add the swap size to the memory size before sending it to the Elastic Beanstalk service.

/opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.6.0/gems/healthd-sysstat-1.0.3-universal-linux/lib/healthd-sysstat/plugin.rb -> see "Change Start"


```ruby
require 'healthd/daemon/plugins/fixed_interval_base'
require 'healthd/daemon/logger'
require 'executor'

module Healthd
    module Plugins
        module Sysstat
            class Plugin < Daemon::Plugins::FixedIntervalBase
                namespace 'system'

                include Executor

                @@loadavg_path = '/proc/loadavg'
                @@stat_path = '/proc/stat'
                @@meminfo_path = '/proc/meminfo'
                @@cpuinfo_path = '/proc/cpuinfo'
                @@diskspace_refresh_after = 300
                @@cpuinfo_regexp = /^processor\s*:/
                @@meminfo_regexp = /([\w\(\)]+):\s+([0-9]+)/
                @@meminfo_keys = {
                    'MemTotal'     => 'mem_total',
                    'MemAvailable' => 'mem_available',
                    'MemFree'      => 'mem_free',
                    'Buffers'      => 'buffers',
                    'Cached'       => 'cached',
                    'SwapCached'   => 'swap_cached',
                    'SwapTotal'    => 'swap_total',
                    'SwapFree'     => 'swap_free'
                }
                @@pid_name_regexp = /.*\/(.*)\.pid$/

                def setup
                    # initialize cpu_usage
                    cpu_usage
                end

                def snapshot
                    data = {}
                    data = loadavg data
                    data = cpu_usage data
                    data = disk_space data
                    data = meminfo data
                    data = processor_count data
                    data = pids data
                    data
                end

                private
                def loadavg(data={})
                    h = {}
                    h['1'],
                    h['5'],
                    h['15'] = File.read(@@loadavg_path).split.first(3).collect { |i| i.to_f.round 2 }

                    data['loadavg'] = h
                    data
                end

                private
                def cpu_usage(data={})
                    h = {}
                    h['user'],
                    h['nice'],
                    h['system'],
                    h['idle'],
                    h['iowait'],
                    h['irq'],
                    h['softirq'] = File.read(@@stat_path).each_line.first.split.drop(1).collect(&:to_i)

                    delta = h.merge @cpu_usage do |key, current, previous|
                        current - previous
                    end if @cpu_usage
                    @cpu_usage = h

                    data['cpu_usage'] = delta
                    data
                end

                private
                def disk_space(data={})
                    @diskspace_at ||= Time.at 0
                    @diskspace ||= nil

                    if Time.now - @diskspace_at > @@diskspace_refresh_after
                        if stats = fs_stats
                            @diskspace_at = Time.now
                            @diskspace = stats
                        end
                    end

                    raise "diskspace statistics not available" unless @diskspace

                    data['disk_space'] = { '/' => @diskspace }
                    data
                end

                private
                def fs_stats
                    output = sh %[stat --file-system --format "%s %b %a" /]
                    h = {}
                    h['block_size'],
                    h['block_count'],
                    h['free_blocks'] = output.split.collect(&:to_i)

                    if h.values.count(&:itself) != 3
                        Daemon::Logger.warn "invalid filesystem statistics. output: #{output}"
                        nil
                    else
                        h
                    end
                rescue Executor::NonZeroExitStatus => e
                    Daemon::Logger.warn "could not fetch filesystem statistics. exit status: #{e.exit_code}, message: #{e.message}"
                    nil
                end

                private
                def meminfo(data={})
                    raw = File.read(@@meminfo_path)
                    h = raw.each_line.first(20).inject({}) do |h, line|
                        _, key, value = line.match(@@meminfo_regexp).to_a
                        value = value.to_i

                        h[@@meminfo_keys[key]] = value if @@meminfo_keys.include? key

                        h
                    end
                    # << Change Start >>
                    # Trick Elastic Beanstalk into counting swap as memory
                    h['mem_total'] = h['mem_total'] + h['swap_total']
                    h['mem_available'] = h['mem_available'] + h['swap_free']
                    h['mem_free'] = h['mem_free'] + h['swap_free']
                    # << Change End >>
                    data['meminfo'] = h
                    data
                end

                private
                def processor_count(data={})
                    @processor_count ||= begin
                        cpuinfo = File.read(@@cpuinfo_path) rescue nil
                        count = cpuinfo.scan(@@cpuinfo_regexp).count
                        count if count > 0
                    end

                    data['processor_count'] = @processor_count if @processor_count
                    data
                end

                private
                def pids(data={})
                    @pid_name_cache ||= {}

                    h = Dir.glob("#{options.beanstalk_base_path}/*.pid").inject({}) do |h, path|
                        name = @pid_name_cache[path]

                        unless name
                            name = path[@@pid_name_regexp, 1]
                            @pid_name_cache[path] = name
                        end

                        h[name] = running? path
                        h
                    end

                    data['service_status'] = h
                    data
                end

                private
                def running?(path)
                    pid = if File.exists? path
                        contents = File.read(path)
                        return false if contents.empty?
                        contents.to_i
                    end

                    case
                    when pid && ( Process.getpgid pid rescue nil )
                        true
                    when pid
                        false
                    else
                        nil
                    end
                end
            end
        end
    end
end
```

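
To try the "Change Start" adjustment outside of healthd, something like the following standalone sketch may help. It reuses the regexp and key mapping from the plugin above. The method names and the sample `/proc/meminfo` values are made up for illustration, and the `.to_i` guards are an addition so that a missing key does not raise.

```ruby
MEMINFO_REGEXP = /([\w\(\)]+):\s+([0-9]+)/
MEMINFO_KEYS = {
  'MemTotal'     => 'mem_total',
  'MemAvailable' => 'mem_available',
  'MemFree'      => 'mem_free',
  'SwapTotal'    => 'swap_total',
  'SwapFree'     => 'swap_free'
}.freeze

# Parse /proc/meminfo-style text into a hash of kB values.
def parse_meminfo(raw)
  raw.each_line.each_with_object({}) do |line, h|
    _, key, value = line.match(MEMINFO_REGEXP).to_a
    h[MEMINFO_KEYS[key]] = value.to_i if MEMINFO_KEYS.include?(key)
  end
end

# Same idea as the "Change Start" block: report swap as if it were memory.
def count_swap_as_memory(meminfo)
  swap_total = meminfo['swap_total'].to_i
  swap_free  = meminfo['swap_free'].to_i
  meminfo.merge(
    'mem_total'     => meminfo['mem_total'].to_i + swap_total,
    'mem_available' => meminfo['mem_available'].to_i + swap_free,
    'mem_free'      => meminfo['mem_free'].to_i + swap_free
  )
end

# Made-up sample values, in the format the kernel uses.
SAMPLE_MEMINFO = <<~MEMINFO
  MemTotal:        4000000 kB
  MemFree:          100000 kB
  MemAvailable:     150000 kB
  SwapTotal:       2000000 kB
  SwapFree:        1000000 kB
MEMINFO

adjusted = count_swap_as_memory(parse_meminfo(SAMPLE_MEMINFO))
puts adjusted
```

On a real instance, `File.read('/proc/meminfo')` can be passed to `parse_meminfo` instead of the sample text.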
Xarno commented 7 months ago

It's the same problem with the new zram feature: https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.2.20230920.html

I get a "97% memory in use" warning when I actually have nearly 50% of the zram swap free.

```
top - 15:55:08 up 15:46,  0 users,  load average: 0.16, 0.29, 0.33
Tasks: 146 total,   2 running, 144 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  0.2 sy,  0.0 ni, 99.0 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1847.3 total,     57.3 free,   1672.8 used,    117.3 buff/cache
MiB Swap:   1847.0 total,    869.2 free,    977.8 used.     33.7 avail Mem

sh-5.2$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
zram0         252:0    0  1.8G  0 disk [SWAP]
nvme0n1       259:0    0    8G  0 disk
├─nvme0n1p1   259:1    0    8G  0 part /
└─nvme0n1p128 259:2    0   10M  0 part /boot/efi
sh-5.2$ swapon
NAME       TYPE      SIZE   USED PRIO
/dev/zram0 partition 1.8G 987.2M  100
sh-5.2$ zramctl
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram0 lzo-rle       1.8G 949.9M 439.6M 458.3M       2 [SWAP]
```
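
To put numbers on this, here is a small calculation using the `top` values above. It assumes that "memory in use" is computed as (total - available) / total. The exact formula Elastic Beanstalk uses is not stated in this thread, so treat the result as approximate. The method name is made up for the example.

```ruby
# Values in MiB, copied from the `top` output above.
MEM_TOTAL  = 1847.3
MEM_AVAIL  = 33.7
SWAP_TOTAL = 1847.0
SWAP_FREE  = 869.2

# Assumed formula: share of the pool that is not available, in percent.
def used_percent(total, available)
  ((total - available) / total * 100).round(1)
end

ram_only  = used_percent(MEM_TOTAL, MEM_AVAIL)
with_swap = used_percent(MEM_TOTAL + SWAP_TOTAL, MEM_AVAIL + SWAP_FREE)

puts "RAM only:   #{ram_only}% in use"
puts "RAM + swap: #{with_swap}% in use"
```

Under this assumption RAM alone comes out at roughly 98% in use, close to the reported warning, while RAM and zram swap combined come out at roughly 76%.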