canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

LXD resources does not report full system memory #7362

Closed ltrager closed 4 years ago

ltrager commented 4 years ago

Issue description

When reading the amount of system memory from LXD resources, the full amount is not reported. This causes MAAS to under-report the amount of memory the system has. lshw reports the correct amount.

Steps to reproduce

  1. Create an LXD VM with 2048M of memory
  2. Install LXD in the newly created VM
  3. Memory amount reported by LXD will be 1991M.

Another example can be found on hardware. My laptop has 64G of RAM; however, LXD reports the total memory as 62.47G.

stgraber commented 4 years ago

The API reports sizes in bytes, not in M, so could it be the conversion from bytes to MB or MiB that's wrong?
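
To check whether a unit mix-up could be involved, it helps to see how far apart the decimal and binary conversions are. A small sketch, using the 2048M VM from the reproduction steps as input; the helper names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers: convert a byte count to MB (10^6) and MiB (2^20).
bytes_to_mb()  { echo $(( $1 / 1000 / 1000 )); }
bytes_to_mib() { echo $(( $1 / 1024 / 1024 )); }

# 2048 MiB expressed in bytes, the size requested for the VM above.
vm_bytes=$(( 2048 * 1024 * 1024 ))

echo "bytes: $vm_bytes"
echo "MB:    $(bytes_to_mb "$vm_bytes")"
echo "MiB:   $(bytes_to_mib "$vm_bytes")"
```

For a 2048 MiB value this prints 2147 MB and 2048 MiB, neither of which is the 1991M reported.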

ltrager commented 4 years ago

It's not a conversion problem:

# curl -G --unix-socket "/var/snap/lxd/common/lxd/unix.socket" "lxd/1.0/resources" 2>/dev/null | jq '.metadata.memory.total / 1024 / 1024 / 1024'
62.47153091430664

stgraber commented 4 years ago

Doesn't the byte value match /proc/meminfo's MemTotal?

stgraber commented 4 years ago

Well, that one is actually in kibibytes, but still, doesn't MemTotal match what LXD shows you?
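
/proc/meminfo prints MemTotal with a "kB" suffix, but the value is in units of 1024 bytes, so it has to be multiplied by 1024 before it can be compared with the byte count from the LXD API. A minimal sketch; the function name and the optional file argument are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: print MemTotal in bytes.
# $1 is an optional path to a meminfo-style file (defaults to /proc/meminfo).
memtotal_bytes() {
    meminfo=${1:-/proc/meminfo}
    # Field 2 of the "MemTotal:" line is the size in KiB.
    kib=$(awk '/^MemTotal:/ { print $2; exit }' "$meminfo") || return 1
    [ -n "$kib" ] || return 1
    echo $(( kib * 1024 ))
}
```

The result can be compared directly with .metadata.memory.total from the curl command above.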

ltrager commented 4 years ago

It does; however, MemTotal reports the amount of available system memory minus the amount reserved for the kernel. Previously MAAS gathered this information from lshw, which reports the full amount.

# sudo lshw -json -c memory | jq '.[0].size / 1024 / 1024 / 1024'
64

stgraber commented 4 years ago

Ok, so it looks like we should make the reported total be:

`/sys/devices/system/memory/block_size_bytes` * number of `/sys/devices/system/memory/memory*` where `online` is set to `1`. If that fails for some reason, fall back to MemTotal.

@monstermunchkin can you implement this please?

ltrager commented 4 years ago

I'm not sure that's right. This is from my laptop with 64GB of RAM.

count=0
for i in /sys/devices/system/memory/memory*; do
    [ $(cat $i/online) = "1" ] && count=$((count+1))
done
memory_bytes=$(($(cat /sys/devices/system/memory/block_size_bytes) * $count))
echo "Memory bytes: $memory_bytes"
Memory bytes: 4088000000
echo "Memory GB: $(($memory_bytes / 1024 / 1024 / 1024))"
Memory GB: 3

stgraber commented 4 years ago

It works once you realize that block_size_bytes is in hex :)
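
The sysfs file holds the block size as hexadecimal digits without a 0x prefix, so reading it as decimal gives a far smaller number. A short sketch of the difference, assuming a block size of 8000000 (hex), which is the value implied by the figures in this thread rather than one quoted in it:

```shell
#!/bin/sh
# Assumed contents of /sys/devices/system/memory/block_size_bytes
# (hex digits, no 0x prefix). Inferred from the thread, not quoted in it.
block_size_hex=8000000

# Read as decimal: the mistake in the first script.
echo "as decimal: $(( block_size_hex ))"

# Read as hexadecimal by adding the 0x prefix.
block_size_bytes=$(( 0x$block_size_hex ))
echo "as hex:     $block_size_bytes"
echo "in MiB:     $(( block_size_bytes / 1024 / 1024 ))"
```

Under that assumption each block is 128 MiB, so every block missed or added changes the total by 0.125 GiB.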

ltrager commented 4 years ago

Ah, I missed that; however, the number is still off by a little bit.

count=0
for i in /sys/devices/system/memory/memory*; do
    [ $(cat $i/online) = "1" ] && count=$((count+1))
done
memory_bytes=$((0x$(cat /sys/devices/system/memory/block_size_bytes) * $count))
memory_gb=$(echo "$memory_bytes / 1024^3" | bc -l)
echo "Memory bytes: $memory_bytes"
Memory bytes: 68585259008
echo "Memory GB: $memory_gb"
Memory GB: 63.87500000000000000000

ltrager commented 4 years ago

It's actually off by 1; starting the counter at 1 gets the correct amount.

count=1
for i in /sys/devices/system/memory/memory*; do
    [ $(cat $i/online) = "1" ] && count=$((count+1))
done
memory_bytes=$((0x$(cat /sys/devices/system/memory/block_size_bytes) * $count))
memory_gb=$(echo "$memory_bytes / 1024^3" | bc -l)
echo "Memory bytes: $memory_bytes"
Memory bytes: 68719476736
echo "Memory GB: $memory_gb"
Memory GB: 64.00000000000000000000

monstermunchkin commented 4 years ago

I have 16GB of RAM, and if I start with count=1, it reports Memory GB: 16.12500000000000000000. Starting with count=0, it reports Memory GB: 16.00000000000000000000. So, that's a bit weird.

stgraber commented 4 years ago

Works properly on servers:

root@athos:~# /home/stgraber/lshw -json -c memory | jq '.[-1].size / 1024 / 1024'
458752
root@athos:~# echo $(($(printf '%c' /sys/devices/system/memory/memory*/online | wc -c | dc -e '16i' -f /sys/devices/system/memory/block_size_bytes -e 'Ai ? *p')/1024/1024))
458752

So I think the logic is correct; the issue on some systems is that some amount of physical memory may be diverted to other uses by firmware (such as onboard graphics), which then causes one or more chunks to be missing from usable memory.

I think it's correct to report what's exposed to the OS, as that's what one would use to size their deployment. Similarly, we don't list memory sticks or CPUs that are offline.
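
Putting the thread's conclusion together: count the memory blocks whose online file contains 1, multiply by the hex-decoded block size, and fall back to MemTotal if sysfs is unavailable. The following is only a sketch of that idea in shell, not LXD's implementation; the function name and its two optional path arguments are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print total online memory in bytes.
# $1: sysfs memory directory (default /sys/devices/system/memory)
# $2: meminfo file used as fallback (default /proc/meminfo)
total_memory_bytes() {
    sysdir=${1:-/sys/devices/system/memory}
    meminfo=${2:-/proc/meminfo}

    if [ -r "$sysdir/block_size_bytes" ]; then
        # block_size_bytes holds hex digits without a 0x prefix.
        block_size=$(( 0x$(cat "$sysdir/block_size_bytes") ))
        count=0
        for state in "$sysdir"/memory*/online; do
            # Skip the unexpanded pattern when no block matches.
            [ -r "$state" ] || continue
            if [ "$(cat "$state")" = "1" ]; then
                count=$(( count + 1 ))
            fi
        done
        if [ "$count" -gt 0 ]; then
            echo $(( block_size * count ))
            return 0
        fi
    fi

    # Fallback: MemTotal from meminfo, converted from KiB to bytes.
    kib=$(awk '/^MemTotal:/ { print $2; exit }' "$meminfo" 2>/dev/null)
    [ -n "$kib" ] || return 1
    echo $(( kib * 1024 ))
}
```

Called with no arguments it reads the real sysfs and /proc paths; passing a directory makes it easy to try against a copy of another machine's sysfs tree.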