coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Revisit enabling swap on zram by default #859

Open · dustymabe opened this issue 3 years ago

dustymabe commented 3 years ago

We've recently been discussing swap-on-zram again (see systemd-oomd discussion).

In the past when we decided to wait we also said we would revisit.

This would probably consist of including the zram-generator-defaults package and then documenting how to opt-out.
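For illustration, the opt-out could be documented as a Butane snippet along these lines, which writes an empty admin config to override the packaged `/usr/lib/systemd/zram-generator.conf` (zram-generator treats an admin config that defines no devices as "create no zram devices"; treat the exact semantics as something to verify against the zram-generator docs):

```yaml
# Butane sketch: opt out of swap-on-zram by overriding the packaged
# defaults with an empty admin config.
variant: fcos
version: 1.4.0
storage:
  files:
    - path: /etc/systemd/zram-generator.conf
      contents:
        inline: |
          # Intentionally empty: defines no zram devices,
          # which disables swap-on-zram.
```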

dustymabe commented 3 years ago

Also worth noting that Kubernetes has accepted the proposal to support swap in 1.22: https://github.com/kubernetes/kubernetes/issues/53533

lucab commented 3 years ago

The zram-generator-defaults RPM is effectively just a /usr/lib/systemd/zram-generator.conf file, and at least we could make sure the content of https://docs.fedoraproject.org/en-US/fedora-coreos/sysconfig-configure-swaponzram/ is aligned with that.
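For reference, that file is a plain INI config for zram-generator; at the time of writing the Fedora defaults look roughly like this (exact values may differ between releases):

```ini
# /usr/lib/systemd/zram-generator.conf as shipped by zram-generator-defaults
[zram0]
# Size the zram device at half of RAM, capped at 4 GiB.
zram-size = min(ram / 2, 4096)
```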

jlebon commented 3 years ago

We discussed this in the meeting today. No concrete outcome. This seems very similar to https://github.com/coreos/fedora-coreos-tracker/issues/840.

ISTM that either we optimize for k8s and leave things disabled, or we optimize for the single-node case and require k8s distros to disable things. Either way, someone will have to do some work.

My personal opinion (without digging into the specific cases here) is: we don't ship with k8s; we ship standalone and should be ready to use as is. And in that capacity, there's an expectation that we feel and act like traditional Fedora (obviously apart from what makes FCOS FCOS). Setting up k8s already requires extra steps to set up the host, so this isn't really a heavy burden to carry (but I agree we should otherwise try to keep that burden to a minimum when it doesn't conflict with the single-node case).

Edit: or maybe a simpler way to say this is just: we default to the single node case, but we're easily configurable for k8s.

dustymabe commented 3 years ago

We discussed the cross section of this with the oomd change in the community meeting today.

@jdoss is working on doing some testing and we'll hear back from him next week.

We did make a small decision:

  * AGREED: since oomd works better with swap, let's tie the swaponzram
    proposal and the oomd proposals together. If we do one, we do the
    other.  (dustymabe, 16:50:58)

but we also decided to take a step back and discuss single node versus kubernetes defaults briefly first: https://github.com/coreos/fedora-coreos-tracker/issues/880

jdoss commented 3 years ago

@jdoss is working on doing some testing and we'll hear back from him next week.

My coworker and I did the testing and posted it in the other issue: https://github.com/coreos/fedora-coreos-tracker/issues/840#issuecomment-867813496

travier commented 3 years ago

Alpha support in K8s 1.22: https://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/

travier commented 10 months ago

Beta support in 1.28: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/

jlebon commented 5 months ago

AFAICT, it's still in Beta in 1.29.

It'd be great to close this gap. We should dig into what the failure mode is nowadays when bringing up a cluster on top of nodes with swap (and the feature disabled). (E.g. is it a hard error? a warning?)

prestist commented 5 months ago

We talked about this in the meeting today (see @jlebon's comment above). Additionally: ACTION: fifofonix to bring up a 1.28 cluster with zswap (@spresti:fedora.im, 17:35:26)

kannon92 commented 5 months ago

Hello, I've been working on swap in kube for 1.30 with itamar. Please bring up any findings you have.

I did a test yesterday where I brought up a kubernetes dev cluster (local-up-cluster) with crio and LimitedSwap.

In 1.30, we are turning the feature on by default but disabling swap usage (NoSwap). We do not recommend zram at the moment, mostly due to the lack of support for memory-based swap on most OSes.

One thing we do wonder about: if we use zswap, do we have to worry about setting any cgroups differently from memory.swap.max?

kannon92 commented 5 months ago

AFAICT, it's still in Beta in 1.29.

It'd be great to close this gap. We should dig into what the failure mode is nowadays when bringing up a cluster on top of nodes with swap (and the feature disabled). (E.g. is it a hard error? a warning?)

In kubernetes (kubelet) there is a config called --fail-swap-on. If swap is detected and this field is set to true, kubelet will fail to start.

We aren't changing this behavior in Kubernetes. So for a swap-enabled node, we would recommend setting --fail-swap-on=false.

Feature is enabled in 1.30 by default but we set a configuration to disallow pods to utilize swap.
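Putting the pieces together, a node with swap enabled would need a kubelet configuration along these lines (a sketch for 1.28+; the NodeSwap feature-gate line is only needed while the feature is not yet enabled by default):

```yaml
# KubeletConfiguration sketch for running on a node with swap (e.g. zram).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Don't refuse to start when swap is detected.
failSwapOn: false
featureGates:
  NodeSwap: true
memorySwap:
  # LimitedSwap lets pods use some swap; the 1.30 default is NoSwap.
  swapBehavior: LimitedSwap
```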

fifofonix commented 4 months ago

Confirmed, as @kannon92 explained above, that on k8s 1.28.3 the kubelet exits when it detects zram. Error message below. I think the main ask was to understand this behaviour. I haven't tried setting --fail-swap-on=false at this point.

```
E0325 22:53:14.934951   10211 run.go:74] "command failed" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority /dev/zram0                              partition\t4194300\t\t0\t\t100]"
```
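As an aside, a quick way to see the same information the kubelet is checking here (assuming util-linux is present, as on FCOS):

```shell
# Show active swap devices; a zram-backed setup lists /dev/zram0 here.
swapon --show
# The kubelet itself reads /proc/swaps; this prints any zram entries.
grep zram /proc/swaps || echo "no zram swap configured"
```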
iholder101 commented 4 months ago

Confirmed, as @kannon92 explained above, that on k8s 1.28.3 the kubelet exits when it detects zram. Error message below. I think the main ask was to understand this behaviour. I haven't tried setting --fail-swap-on=false at this point.

```
E0325 22:53:14.934951   10211 run.go:74] "command failed" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority /dev/zram0                              partition\t4194300\t\t0\t\t100]"
```

Can you please try again with --fail-swap-on=false?

travier commented 4 months ago

Did you set the config as specified in https://kubernetes.io/blog/2023/08/24/swap-linux-beta/ ?

Additionally, you must disable the failSwapOn configuration setting, or the deprecated --fail-swap-on command line flag must be deactivated.

fifofonix commented 4 months ago

Initial tests with 1.28.3 applying the kubelet config changes you linked, @travier, allow worker nodes to operate successfully with zram. Full disclosure: I did not run the entire cluster with these changes, meaning controller nodes did not have zram enabled, but I think this test is still valid?

jlebon commented 4 months ago

In 1.30, we are turning the feature on by default but disabling swap usage (NoSwap). We do not recommend zram at the moment, mostly due to the lack of support for memory-based swap on most OSes.

@kannon92 To be really clear, in 1.30, the kubelet will successfully start on a node with swap, with the default config (i.e. without failSwapOn: false/--fail-swap-on=false), but pods will not make any use of swap?

One thing we do wonder about: if we use zswap, do we have to worry about setting any cgroups differently from memory.swap.max?

Can you clarify? Are you asking whether pods should have different memory.swap.max settings from the root cgroup?

kannon92 commented 4 months ago

@kannon92 To be really clear, in 1.30, the kubelet will successfully start on a node with swap, with the default config (i.e. without failSwapOn: false/--fail-swap-on=false), but pods will not make any use of swap?

Yes.

One thing we do wonder about: if we use zswap, do we have to worry about setting any cgroups differently from memory.swap.max?

Can you clarify? Are you asking whether pods should have different memory.swap.max settings from the root cgroup?

Forgive me, I misunderstood zswap and zram. I actually tested this on fedora 39 and everything looks to be good.