SUSE / cpuset

Originally exported from code.google.com/p/cpuset, then maintained at github.com/lpechacek/cpuset, then migrated here to SUSE/cpuset.
GNU General Public License v2.0

cpuset not working with present arch linux (maybe cgroupv2?) #40


ebennett1980 commented 3 years ago

The problem seems to afflict Arch Linux on two different kernels. I think it has to do with them using control group v2 while cpuset expects v1, perhaps? Arch switched in a recent update.

The symptom looks like this:

root@monolith:~# cset shield --cpu=0-7
mount: /cpusets: none already mounted on /sys/fs/bpf.
cset: **> mount of cpuset filesystem failed, do you have permission?

Two kernels confirmed affected:

Linux monolith 5.4.85-1-vfio-lts #1 SMP Wed, 23 Dec 2020 06:46:51 +0000 x86_64 GNU/Linux
Linux magister 5.9.11-xanmod1-1 #1 SMP PREEMPT Tue, 01 Dec 2020 12:38:55 +0000 x86_64 GNU/Linux

Werkov commented 3 years ago

Hello. What does the following say on your system:

grep -E "cpuset|cgroup2" /proc/mounts
cat /sys/fs/cgroup/cgroup.controllers  # edit: or wherever your cgroup2 tree is mounted

?

Also, what systemd version is this?

(I'm just checking, but I'd generally not expect cpuset to work with the v2 cpuset controller.)

ebennett1980 commented 3 years ago

Results on the metal:

cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cpuset cpu io memory hugetlb pids rdma

Results in a VM:

cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cpuset cpu io memory hugetlb pids rdma

systemd is 248-5 on both

Werkov commented 3 years ago

Thanks. So the cpuset controller is bound to the v2 hierarchy and it also shows that systemd runs in the unified mode (that was perhaps the critical change between updates). I'm afraid the cset utility can't serve you in such a setup.

(As a workaround, you may switch back to the hybrid setup via the kernel cmdline (systemd.unified_cgroup_hierarchy=0) and use cset as before, or migrate your configuration to the systemd cpuset implementation (I haven't tested it).)
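For reference, a minimal sketch of how that cmdline switch could be applied on a GRUB-based system (the bootloader, the file paths and the existing options are assumptions; adjust to your setup):

# /etc/default/grub: append the parameter to the existing options
GRUB_CMDLINE_LINUX_DEFAULT="... systemd.unified_cgroup_hierarchy=0"

# regenerate the config and reboot
grub-mkconfig -o /boot/grub/grub.cfg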

Hubro commented 3 years ago

@Werkov Do you have any advice on how to shield CPU cores now, with the new unified cgroup hierarchy? Does systemd have any functionality for this? I am unable to find any information about this.

Werkov commented 3 years ago

@Hubro With a recent systemd version you should be able to specify the AllowedCPUs= directive. You can't set it directly on the root (-.slice), but on all 1st-level children instead (init.scope, system.slice and user.slice in default setups). That way you can move the userspace tasks out of the way. (Note that kernel threads can still run on "shielded" CPUs, but that is no different from cset shield --kthread=off (the default).)
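For illustration, a persistent variant of that could be a drop-in for user.slice (the CPU range 0-3 and the file name are just examples; repeat for system.slice, and for init.scope use a [Scope] section instead of [Slice]):

mkdir -p /etc/systemd/system/user.slice.d
cat > /etc/systemd/system/user.slice.d/10-allowedcpus.conf <<'EOF'
[Slice]
AllowedCPUs=0-3
EOF
systemctl daemon-reload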

Hubro commented 3 years ago

@Werkov Is it possible to set AllowedCPUs for already running slices without restarting them? Also, is there any way to inform the kernel which cores I want it to keep its threads on? In this case a kernel command line argument would be fine.

EDIT: I figured out how to set AllowedCPUs at runtime:

sudo systemctl set-property --runtime user.slice AllowedCPUs=0-3
sudo systemctl set-property --runtime system.slice AllowedCPUs=0-3

This didn't do anything for me, but I assume that's because I disabled the unified cgroup hierarchy. I'll test this out later with the unified hierarchy enabled.

I still have no idea how to keep kernel threads off my virtualization cores though.

Werkov commented 3 years ago

Setting the cgroup attributes at runtime should work exactly as you did (alternatively, you can edit the slice unit or drop-in files and call systemctl daemon-reload); restarting the slices is not necessary. (One possible catch with the runtime update is that NUMA memory won't be migrated with the change.) And you need the unified hierarchy for this to work with systemd (otherwise you'd have used cset, right?).

If you need "silence" on the CPU, then see the isolcpus kernel cmdline. I'm just curious why you need to keep kernel threads off your selected cores (is that due to RT constraints)?
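For reference, the corresponding kernel cmdline entry would look roughly like this (the CPU list is illustrative):

isolcpus=4-7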

Hubro commented 3 years ago

@Werkov My use case is a high performance virtual machine doing realtime tasks, so I'm doing everything I can to reduce latency and stutters on those cores.

I'm not entirely sure how isolcpus works. Will this keep kernel threads off those cores, or will it keep everything off them? I want to be able to do compilation and encoding tasks using all my host cores when my VMs are not running, so any kernel cmdline parameters that prevent that are not an option for me.


I just noticed the docs you linked say that isolcpus is deprecated:

isolcpus=       [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
                        [Deprecated - use cpusets instead]
                        Format: [flag-list,]<cpu-list>

Does that mean that cpusets can do all the things that isolcpus can do? :thinking:

fweisbec commented 3 years ago

Does that mean that cpusets can do all the things that isolcpus can do? :thinking:

Not exactly. isolcpus is often used to disable scheduler load balancing on a CPU, and that's the only part where cpusets can help in a similar fashion (through cpuset.sched_load_balance), along with controlling which tasks are allowed to run on a given set of CPUs. But that's where the similarity ends.
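As a hedged sketch of that v1 knob (the /cpusets mountpoint and the "rt" set name are illustrative, and this is not a complete shielding recipe):

echo 0 > /cpusets/cpuset.sched_load_balance       # stop balancing across the root set
mkdir -p /cpusets/rt
echo 4-7 > /cpusets/rt/cpuset.cpus
echo 0 > /cpusets/rt/cpuset.mems
echo 0 > /cpusets/rt/cpuset.sched_load_balance    # no balancing inside the shielded set either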

In fact, isolcpus does much more, as it also isolates the CPUs from unbound kernel threads, workqueues, timers, etc.

I understand you can't afford to use boot parameters, but it's worth being aware of "nohz_full=". It will isolate your CPU pretty much as well as isolcpus does, and it goes further by deactivating the timer tick on the host CPU, avoiding interrupting the guest vCPUs. If you combine that with cpusets to move all unrelated tasks off the CPUs running the guests, you might get good results. Oh, and don't forget to re-affine interrupts away from the CPUs running the guests as well (https://www.kernel.org/doc/Documentation/IRQ-affinity.txt). I'm writing a series of articles about that, if that can help: https://www.suse.com/c/cpu-isolation-introduction-part-1/
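A rough sketch of that combination (the CPU numbers and the IRQ number are illustrative):

# kernel cmdline: stop the tick on the guest CPUs
nohz_full=4-7

# at runtime: steer a device interrupt onto the housekeeping CPUs 0-3
echo 0-3 > /proc/irq/24/smp_affinity_list
# or via the equivalent bitmask interface
echo 0f > /proc/irq/24/smp_affinity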

lpechacek commented 3 years ago

Thanks, @fweisbec, for the comment and the recommendation about nohz_full. I'd recommend the same for the purpose of ensuring an undisturbed VM run.

Regarding the cpuset v2 cgroup controller in general, I'll dump my current thoughts here. It's not going to be a neatly formulated message, but I'll be grateful for alternate views and opinions.

1. (AFAIK) The cpuset utility was created in the pre-systemd era as part of the Novell/SUSE SLERT offering. It is quite a nice utility with reasonably good code.
2. When the cpuset author left the company, I took over the utility's maintenance because patches started to pile up in the package. I didn't know there were that many users outside the company, specifically because I don't recall receiving any feedback from other distro maintainers about the changes upstream.
3. With the introduction of SystemD in SUSE products I heard horror stories about how SystemD freezes when external programs manipulate its cgroup settings. It was LTP at that time taking SystemD down in product testing.
4. I noticed the introduction of the cgroup v2 hierarchy, briefly discussed it with our cgroup expert, and put v2 hierarchy support on my "look into it when there's spare time" list.
5. The introduction of the cpuset controller in the v2 hierarchy made me recall the sleeping task. Given my beliefs about the incompatibility with SystemD, I thought that cpuset might be helpful to inspect the hierarchy but perhaps should not alter the system settings. The process CPU scheduling options should perhaps be controlled with systemd-run or something like that. I haven't tried it myself yet, but that's where I would start my search.
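On the last point, a hedged sketch of what that could look like with systemd-run (the property value and the command name are illustrative):

systemd-run --scope -p AllowedCPUs=4-7 my-realtime-command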

At this point, I'd like to know your opinion(s) about whether it is safe to manipulate the system's cgroup settings without the daemon's consent. If you have any further comments, feel free to drop them here as well. Thanks!

Werkov commented 3 years ago
3. With the introduction of SystemD in SUSE products I heard horror stories about how SystemD freezes when external programs manipulate its cgroup settings. It was LTP at that time taking SystemD down in product testing.

Fortunately, this is irrelevant for cgroup hierarchies that are not managed by systemd. Practically, the cpuset utility could be used safely with systemd prior to v244, which introduced cpuset support in systemd; therefore, the cpuset hierarchy was unmanaged by systemd in the older versions.

At this point, I'd like to know your opinion(s) about whether it is safe to manipulate the system's cgroup settings without the daemon's consent.

Nowadays with v2, it is safe when the operations are carried out in a dedicated subtree only. A full description is in the document about cgroup delegation with systemd.
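As an illustration of such a dedicated, delegated subtree (the unit name, the CPU range and the cgroup paths are illustrative):

# start a transient unit that owns its own cgroup subtree
systemd-run --unit=shield -p Delegate=yes -p AllowedCPUs=4-7 --pty /bin/bash
# inside the delegated subtree, child cgroups can be created and managed directly
mkdir /sys/fs/cgroup/system.slice/shield.service/worker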

haelix888 commented 3 years ago

Has anyone managed to set isolcpus on Arch using EFISTUB (efibootmgr) by any chance? The kernel parameter is not getting picked up. (linux-lts 5.10.40)

Edit: possibly related: https://lore.kernel.org/lkml/20200414215715.GB182757@xz-x1/T/#u

fweisbec commented 3 years ago

Has anyone managed to set isolcpus on Arch using EFISTUB (efibootmgr) by any chance? The kernel parameter is not getting picked up. (linux-lts 5.10.40)

Edit: possibly related: https://lore.kernel.org/lkml/20200414215715.GB182757@xz-x1/T/#u

No idea, but you can still hardcode kernel boot options with CONFIG_CMDLINE_BOOL + CONFIG_CMDLINE.
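For completeness, a sketch of those kernel config options (the parameter values are illustrative):

CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="isolcpus=4-7 nohz_full=4-7"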

haelix888 commented 3 years ago

My issue was with the BIOS. It somehow deduplicates boot entries that have the same image but different parameters. In my case, I can get it to work by entering the BIOS and modifying the boot priority order (even though efibootmgr correctly reports the order).

joshuaboniface commented 2 years ago

Just chiming in here, I'm looking to use cset on Debian 11 which, by default, leverages the unified cgroup hierarchy. While disabling the unified hierarchy is of course feasible and did work for me, I'd be concerned about the long-term implications of this, especially when Debian 12 drops with who-knows-what-other-changes in systemd, cgroups, etc.

Is there currently any plan to support the unified hierarchy?

I ask because, while the systemd unit option might be useful in some cases, in my case I'm using cset to fully isolate one process to its own set of CPUs. By leveraging cset and its automated move of processes into another set, this is pretty trivial: I move everything into a new cset with cset proc --move --force, and then use cset proc to put my new processes into their own cset. But trying to update every systemd unit to exclude them from executing on those CPUs would not be trivial. I'd be curious if anyone else has an alternative for this if indeed cset isn't going to support the unified hierarchy long-term. No rush of course; I have at least 2 years until it could potentially become a problem, but I wanted to get ahead of it ;-)

Werkov commented 2 years ago

But trying to update every systemd unit to exclude them from executing on those CPUs would not be trivial.

Actually, you should just be able to leverage the hierarchy and apply cpuset systemd settings on the top-level units only (system.slice, user.slice, machine.slice, init.scope by default).
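For instance (the CPU range is illustrative), confining everything outside the shielded CPUs in one go:

for u in system.slice user.slice machine.slice init.scope; do
    systemctl set-property --runtime "$u" AllowedCPUs=0-3
done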

joshuaboniface commented 2 years ago

Actually, you should just be able to leverage the hierarchy and apply cpuset systemd settings on the top-level units only (system.slice, user.slice, machine.slice, init.scope by default).

Interesting, that would definitely suit my needs; I'll give it a shot. I hadn't considered setting it at the slice level!