gluster / project-infrastructure

Issues related to GlusterFS infrastructure components.

Not enough memory on nodes? #171

Open mykaul opened 2 years ago

mykaul commented 2 years ago

From https://build.gluster.org/job/gh_centos7-regression/2751/consoleFull (which may be a legit failure?):

ok  24 [     15/     16] <  45> 'umount /mnt/glusterfs/0'
ok  25 [     13/     78] <  46> 'ta_start_mount_process /mnt/glusterfs/0'
ok  26 [     14/    978] <  47> '1 ta_up_status patchy /mnt/glusterfs/0 0'
cat: /mnt/glusterfs/0/a.txt: Cannot allocate memory
not ok  27 [     13/     10] <  48> 'Hello cat /mnt/glusterfs/0/a.txt' -> 'Got "" instead of "Hello"'
losetup: /d/dev/loop*: detach failed: No such file or directory
Failed 1/27 subtests 

In the 2nd attempt, the test passed.

Do we have enough memory on our nodes?

mscherer commented 2 years ago

We have the same amount as before (2G); increasing it would mean reinstalling. I see that the munin graphs are not working, so I would need to fix that first to see if anything is wrong.

But here, isn't the "cannot allocate memory" a generic error message? I see no out of memory error on that builder.

mykaul commented 2 years ago

I did not think it came from Gluster, and 2G seems quite low to me for tests. I wonder if we can add some metrics to make sure we are not hitting a limit from time to time. Is swap enabled on the hosts?
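As a rough idea of what such a metric could look like, here is a minimal Python sketch that reads `/proc/meminfo` and reports whether swap is configured and whether available memory is below a threshold. The function names and the 256 MiB threshold are hypothetical, chosen only for illustration, and the sketch assumes the kernel exposes the `MemAvailable` field. This is not something the builders currently run.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts. Only the numeric part is kept.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])
    return fields


def memory_report(fields, min_available_kb=256 * 1024):
    """Summarize swap and available memory.

    min_available_kb is a hypothetical threshold (256 MiB here),
    not a value taken from the builders' configuration.
    """
    available = fields.get("MemAvailable", 0)
    return {
        "swap_enabled": fields.get("SwapTotal", 0) > 0,
        "available_kb": available,
        "low_memory": available < min_available_kb,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(memory_report(parse_meminfo(f.read())))
```

Running something like this before and after each regression job and logging the result would show whether a failing run coincided with low available memory.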

mscherer commented 2 years ago

I think that since cat is being run on a file backed by a FUSE filesystem, the "cannot allocate memory" message could be a symptom of an underlying error (like FUSE returning an error code where cat does not expect one, or something like that).
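To illustrate why the message alone says little: "Cannot allocate memory" is simply the standard text for the `ENOMEM` errno, so whichever layer hands that code back to cat produces the same output. A minimal Python sketch (the helper name `describe_errno` is made up for this example):

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME (code): message' for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


# The message text comes from the C library, so it is the same no matter
# which component (kernel, FUSE daemon, ...) returned ENOMEM.
print(describe_errno(errno.ENOMEM))
```

So the message tells us which error code reached cat, not which component ran out of memory, if any did.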

2G is low, but that was more than enough when we sized the builders back in the day. Nothing in the tests required more than 1G to run, and we increased to 2G for dnf/yum/etc.

We have no swap on the builders for the moment.

I think we should first look at the graphs (so I should fix them), see what is needed, and then reinstall, add swap, or apply a fix once we isolate the problem.

mykaul commented 2 years ago

Is the graph fixed now?

mscherer commented 2 years ago

So they were fixed, and then they broke again:

https://munin.gluster.org/munin/aws.gluster.org/builder-c7-3.aws.gluster.org/index.html

I need to fix them again :/

mscherer commented 2 years ago

However, we have the graph for the month: https://munin.gluster.org/munin/aws.gluster.org/builder-c7-3.aws.gluster.org/memory.html

And while the colors are a bit messy, we can see that the majority of the memory is used by the fs cache.
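The same number can be checked directly on a builder without the graph. Here is a small sketch, assuming the usual `/proc/meminfo` field names (`MemTotal`, `Cached`, `Buffers`); the function name and the sample values are invented for illustration and are not measurements from builder-c7-3.

```python
def cache_share(meminfo_text):
    """Fraction of MemTotal held by the page cache and buffers."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    total = fields["MemTotal"]
    return (fields.get("Cached", 0) + fields.get("Buffers", 0)) / total


# Illustrative values only (kB), not real data from a builder.
sample = (
    "MemTotal:        2000000 kB\n"
    "Buffers:          100000 kB\n"
    "Cached:          1200000 kB\n"
)
print(f"cache share: {cache_share(sample):.0%}")
```

Since the kernel can generally reclaim most of the page cache under pressure, a high share here does not by itself mean the node is short on memory.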