Closed ikaruswill closed 3 years ago
Is there any way to avoid locking up the file system?
When it becomes full, no. This is standard Linux behavior: a filesystem is made read-only when it has no more space available. My suggestions to you would be to reduce the logging to the /var/log/container directory that you mentioned in your post, or to increase the memory allocation for that directory.
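If it helps, here is a minimal sketch of a guard you could run from cron to catch the log mount before it fills and goes read-only. The function name and threshold are my own invention, not part of zram-config:

```shell
#!/bin/sh
# Hypothetical guard: warn before a zram-backed log mount fills up
# and Linux flips it read-only.
# Usage: logfill_check <mountpoint> <percent-threshold>
logfill_check() {
    mnt=$1
    limit=$2
    # df prints e.g. " 42%"; strip everything but the digits
    used=$(df --output=pcent "$mnt" | tail -1 | tr -dc '0-9')
    if [ "$used" -ge "$limit" ]; then
        echo "WARN: $mnt is ${used}% full (threshold ${limit}%)"
        return 1
    fi
    echo "OK: $mnt is ${used}% full"
    return 0
}
```

A non-zero exit status makes it easy to chain an early logrotate or an alert after the call.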
> which leaves a zram device (e.g. zram0)
This is an issue that I cannot solve; I wish I could, but because of the nature of it I cannot. The root of the issue is that zramctl will NOT remove any zram device that is in use in any way, shape, or form, so when you restart zram-config while your Kubernetes cluster is running it will not reset the device because it thinks it is still in use. It has been very frustrating, and I wish it behaved differently, but I can't do anything about it.
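For anyone hitting the same wall: the device can only be reset once nothing holds it, so the teardown order matters. A rough sketch follows; the device and mount point are assumptions, and by default it only echoes the commands (pass "run" as the third argument to execute them):

```shell
#!/bin/sh
# Sketch of the teardown order needed before a zram device can be
# reset. zramctl --reset fails with EBUSY while anything (a mount,
# a swap area, an open file) still holds the device.
zram_teardown() {
    dev=${1:-/dev/zram0}
    mnt=${2:-/var/log}
    # Dry run unless explicitly told to execute
    if [ "$3" = "run" ]; then do=""; else do="echo"; fi
    $do umount "$mnt"            # release the filesystem first
    $do zramctl --reset "$dev"   # now the kernel can free the device
}
```

If the unmount itself fails because processes are still writing logs, the reset will keep failing too, which matches the behavior described above.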
Outside of increasing your memory allocation there is not much you can do, as zstd is about as good as it gets for zram compression right now.
I hope this makes sense, please let me know if you have any more questions.
I see. Thanks for your detailed answer, Ethan. I guess there's no other way for now than to just increase the memory allocation.
Actually, I wonder at times how the guys at Armbian managed it. I have a bunch of nodes running Armbian and they've survived on the default 50MB zstd for as long as I can remember. I tried digging around but couldn't see why; they have standard log rotation schedules and all. I'd appreciate some help with studying their implementation.
Also, this may be standard Linux behavior, but is it expected that the space used, as displayed in zramctl, is not freed up upon deletion of some files? My guess is that this has something to do with OverlayFS, right?
Without going to look and make sure, that sounds right. It is likely because of the way OverlayFS handles file deletes.
Alright thanks for your inputs. I'll study the code by Armbian and perhaps find out what they're doing right, and maybe open a PR if I find anything.
Not familiar with what's going on with the feature here (I don't use it), but I did come across this regarding using a file system mount on zram instead of using zram for swap:
> When files are removed, Zram doesn't remove the compressed pages on memory because it's not notified that the space is not used for data anymore. The discard option performs discard when a file is removed. If you use the discard mount option Zram will be notified about the unused pages and will resize accordingly.
Though I'm not sure if that's of any relevance with OverlayFS involved?
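For reference, enabling discard on a zram-backed filesystem is just a mount option; the device and path below are examples only, and whether the notification makes it through an OverlayFS layer is exactly the open question here:

```shell
# Example only: mount the zram-backed log filesystem with discard so
# deleted blocks are reported back to zram (device/path assumed).
mount -o rw,discard /dev/zram1 /var/log

# Alternatively, trim periodically instead of on every delete:
fstrim /var/log
```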
I have seen mem_limit mentioned in this project's README and issues. If that's being applied with some default other than 0, then that could contribute to a lock-up in my experience (only with zram and swap), whereas with no mem_limit set, OOM triggered to keep the system responsive.
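On the mem_limit point: zram exposes it per device through sysfs, so it's easy to check what's actually in effect. The device name below is an assumption; substitute your own:

```shell
# mem_limit itself is write-only in the kernel's zram sysfs
# interface; the value in effect is the 4th field of mm_stat
# (0 means no limit).
awk '{print $4}' /sys/block/zram0/mm_stat

# Remove the cap entirely (or write e.g. 256M to set one):
echo 0 > /sys/block/zram0/mem_limit
```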
Use zram only for swap; the size of a zram swap device is dynamic.
Hi there, I've been following zram-config since the days when it was called zram-swap-config under the previous owner, StuartIanNaylor, and the work you guys put in here has really extended the life of my old Raspberry Pi 3Bs. I'm really thankful for the existence of zram-config.

I've since ramped up my cluster into 6x Rock Pi 4As, a Rock Pi X, and an NUC. Memory is less of an issue now, but logging I/O is still a concern, as I run on eMMC storage which is not as replaceable as SD cards.
On my devices that run zram-config, I run into this issue whereby the zram log volume becomes read-only when it's full. This is usually not an issue, but I run Kubernetes on the cluster, and container logs write to /var/log/container, so when it becomes full, no new containers can be created on that node. This forces me to either restart the zram-config service, which leaves a zram device (e.g. zram0) in the zramctl list while mounting the logs on a new zram device (e.g. zram2), or restart the node itself, which has proven to be quite disruptive to workloads.

My question would be:
I understand this may not be a problem or bug with the code itself, and that it may be expected behaviour, but I'd like to learn of any mitigation measures you'd recommend I take, apart from simply increasing the memory allocation for the log volume or switching to a higher-compression-ratio algorithm (I'm already on zstd). I'd be happy to contribute as well if some changes can help with this behavior.