comet-ml / issue-tracking

Questions, Help, and Issues for Comet ML
https://www.comet.ml

COMET DEBUG: No relevant cgroup controllers mounted. #542

Open mhnazeri opened 2 months ago

mhnazeri commented 2 months ago

Describe the Bug

Running Comet ML on Pop!_OS 22.04 causes the strange error below. The same code runs without a problem on Fedora 39. I'm not using Docker, just a Python venv.

Expected behavior

Experiment logging runs normally.

Where is the issue?

To Reproduce

Steps to reproduce the behavior:

  1. Integrate Comet with PyTorch code to log data.
  2. See the error.

Stack Trace

At first, it prints these debug messages:

2024-04-16 13:48:39,758 COMET DEBUG: Reading cgroups info from: /proc/cgroups
2024-04-16 13:48:39,758 COMET DEBUG: #subsys_name   hierarchy   num_cgroups enabled

2024-04-16 13:48:39,758 COMET DEBUG: cpuset 0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: cpu    0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: cpuacct    0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: blkio  0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: memory 0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: devices    0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: freezer    0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: net_cls    0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: perf_event 0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: net_prio   0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: hugetlb    0   224 1

2024-04-16 13:48:39,758 COMET DEBUG: pids   0   224 1

2024-04-16 13:48:39,759 COMET DEBUG: rdma   0   224 1

2024-04-16 13:48:39,759 COMET DEBUG: misc   0   224 1

2024-04-16 13:48:39,759 COMET DEBUG: is_cgroupsV2=True
2024-04-16 13:48:39,759 COMET DEBUG: Reading self cgroups info from: /proc/self/cgroup
2024-04-16 13:48:39,759 COMET DEBUG: 0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-bffe3c2b-c664-439c-b74f-dde8231f07ae.scope

2024-04-16 13:48:39,759 COMET DEBUG: Reading mountinfo from: /proc/self/mountinfo
2024-04-16 13:48:39,759 COMET DEBUG: 25 32 0:23 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 26 32 0:24 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw

2024-04-16 13:48:39,759 COMET DEBUG: 27 32 0:5 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=32518228k,nr_inodes=8129557,mode=755,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 28 27 0:25 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000

2024-04-16 13:48:39,759 COMET DEBUG: 29 32 0:26 / /run rw,nosuid,nodev,noexec,relatime shared:5 - tmpfs tmpfs rw,size=6512340k,mode=755,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 30 25 0:27 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime shared:8 - efivarfs efivarfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 32 1 259:3 / / rw,noatime shared:1 - ext4 /dev/nvme0n1p3 rw,errors=remount-ro

2024-04-16 13:48:39,759 COMET DEBUG: 33 25 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:9 - securityfs securityfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 34 27 0:29 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 35 29 0:30 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k,inode64

2024-04-16 13:48:39,759 COMET DEBUG: 36 25 0:31 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot

2024-04-16 13:48:39,759 COMET DEBUG: 37 25 0:32 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:11 - pstore pstore rw

2024-04-16 13:48:39,759 COMET DEBUG: 38 25 0:33 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:12 - bpf bpf rw,mode=700

2024-04-16 13:48:39,759 COMET DEBUG: 39 26 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:14 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20163

2024-04-16 13:48:39,759 COMET DEBUG: 40 27 0:20 / /dev/mqueue rw,nosuid,nodev,noexec,relatime shared:15 - mqueue mqueue rw

2024-04-16 13:48:39,759 COMET DEBUG: 41 27 0:35 / /dev/hugepages rw,relatime shared:16 - hugetlbfs hugetlbfs rw,pagesize=2M

2024-04-16 13:48:39,759 COMET DEBUG: 42 25 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:17 - debugfs debugfs rw

2024-04-16 13:48:39,759 COMET DEBUG: 43 25 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime shared:18 - tracefs tracefs rw

2024-04-16 13:48:39,759 COMET DEBUG: 44 25 0:36 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime shared:19 - fusectl fusectl rw

2024-04-16 13:48:39,759 COMET DEBUG: 45 25 0:21 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime shared:20 - configfs configfs rw

2024-04-16 13:48:39,760 COMET DEBUG: 68 29 0:37 / /run/credentials/systemd-sysusers.service ro,nosuid,nodev,noexec,relatime shared:21 - ramfs ramfs rw,mode=700

2024-04-16 13:48:39,760 COMET DEBUG: 93 32 259:2 / /recovery rw,relatime shared:31 - vfat /dev/nvme0n1p2 rw,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro

2024-04-16 13:48:39,760 COMET DEBUG: 96 32 259:1 / /boot/efi rw,relatime shared:47 - vfat /dev/nvme0n1p1 rw,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro

2024-04-16 13:48:39,760 COMET DEBUG: 99 39 0:38 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime shared:49 - binfmt_misc binfmt_misc rw

2024-04-16 13:48:39,760 COMET DEBUG: 1023 29 0:55 / /run/user/1000 rw,nosuid,nodev,relatime shared:573 - tmpfs tmpfs rw,size=6512336k,nr_inodes=1628084,mode=700,uid=1000,gid=1000,inode64

2024-04-16 13:48:39,760 COMET DEBUG: 830 1023 0:57 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:546 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000

2024-04-16 13:48:39,760 COMET DEBUG: 1082 1023 0:58 / /run/user/1000/doc rw,nosuid,nodev,relatime shared:582 - fuse.portal portal rw,user_id=1000,group_id=1000

2024-04-16 13:48:39,760 COMET DEBUG: No relevant cgroup controllers mounted.
2024-04-16 13:48:39,760 COMET DEBUG: CGROUP container detection failed, exception=Required cgroup subsystem files not found
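The log shows Comet reading three kernel files before giving up on container detection. The Python sketch below is not Comet's code; it only parses the same file formats, using excerpts from the log above, to show what they say about this machine. Every controller in /proc/cgroups reports hierarchy 0 (none is bound to a cgroup v1 hierarchy), /proc/self/cgroup has a single cgroup v2 entry pointing into a systemd user slice, and the only cgroup mount is one cgroup2 filesystem at /sys/fs/cgroup whose options name no controllers. The helper v1_controller_mounts is hypothetical: it assumes a check that looks for per-controller cgroup v1 mounts, which is one way a detector could end up reporting "No relevant cgroup controllers mounted" on a pure cgroup v2 system.

```python
"""Sketch: classify the cgroup information shown in the Comet debug log.

NOT Comet's implementation. It only parses the kernel file formats that the
log shows being read: /proc/cgroups, /proc/self/cgroup, /proc/self/mountinfo.
"""
from typing import Dict, List, Tuple


def parse_proc_cgroups(text: str) -> Dict[str, Tuple[int, bool]]:
    """Map controller name -> (hierarchy ID, enabled) from /proc/cgroups."""
    controllers: Dict[str, Tuple[int, bool]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "#subsys_name ..." header
        name, hierarchy, _num_cgroups, enabled = line.split()
        controllers[name] = (int(hierarchy), enabled == "1")
    return controllers


def all_on_unified_hierarchy(controllers: Dict[str, Tuple[int, bool]]) -> bool:
    """Hierarchy ID 0 means the controller is not bound to any cgroup v1 hierarchy."""
    return all(h == 0 for h, enabled in controllers.values() if enabled)


def parse_self_cgroup(text: str) -> List[Tuple[str, List[str], str]]:
    """Parse /proc/self/cgroup lines of the form 'hierarchy-ID:controllers:path'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append((hierarchy_id, [c for c in controllers.split(",") if c], path))
    return entries


def cgroup_mounts(text: str) -> List[Tuple[str, str, List[str]]]:
    """Return (mount point, fs type, super options) for cgroup/cgroup2 mountinfo lines."""
    mounts = []
    for line in text.splitlines():
        if " - " not in line:
            continue
        pre, post = line.split(" - ", 1)
        mount_point = pre.split()[4]  # 5th field before the separator
        fs_type, _source, super_opts = post.split()[:3]
        if fs_type in ("cgroup", "cgroup2"):
            mounts.append((mount_point, fs_type, super_opts.split(",")))
    return mounts


def v1_controller_mounts(mounts, wanted=("cpu", "memory")):
    """Hypothetical check: cgroup v1 mounts list their controllers in the super
    options (e.g. 'rw,cpu,cpuacct'); a cgroup2 mount never matches."""
    return [m for m in mounts if m[1] == "cgroup" and any(c in m[2] for c in wanted)]


if __name__ == "__main__":
    # Excerpts copied from the debug log in this issue.
    proc_cgroups = "#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpu\t0\t224\t1\nmemory\t0\t224\t1\n"
    self_cgroup = (
        "0::/user.slice/user-1000.slice/user@1000.service/app.slice/"
        "app-org.gnome.Terminal.slice/vte-spawn-bffe3c2b-c664-439c-b74f-dde8231f07ae.scope\n"
    )
    mountinfo = (
        "26 32 0:24 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw\n"
        "36 25 0:31 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10"
        " - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot\n"
    )
    print("unified (v2) hierarchy:", all_on_unified_hierarchy(parse_proc_cgroups(proc_cgroups)))
    print("self cgroup:", parse_self_cgroup(self_cgroup))
    mounts = cgroup_mounts(mountinfo)
    print("cgroup mounts:", mounts)
    print("v1 cpu/memory mounts:", v1_controller_mounts(mounts))
```

On these excerpts the script reports a unified (v2) hierarchy and an empty list of v1 cpu/memory mounts. Every line in the log is tagged COMET DEBUG, so the failed container detection may be informational rather than the cause of the problem; the log alone does not settle that.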

Screenshots or GIFs

After those messages, all I see is this: [Screenshot from 2024-04-16 13-39-19]

dsblank commented 2 months ago

Interesting. @mhnazeri, can you provide a small code snippet that demonstrates this, or a link to a Comet experiment?

mhnazeri commented 2 months ago

I made this repo public; it produces that specific output on Pop!_OS 22.04. I should mention that the same code runs fine on Fedora 39. I suspect a package issue (maybe related to cgroups), but installing everything related to cgroups didn't help. I also don't know why Comet needs something like that.

To run the code from the repo, put a few images in the data folder and run python run.py. Also make sure the debug flag in the config file is set to False; otherwise, Comet is disabled. All the Comet configuration lives here.
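One way to narrow down the Pop!_OS vs. Fedora difference is to compare how each machine exposes cgroup v2 to the current process. The sketch below is a hypothetical diagnostic, not part of Comet: own_cgroup_v2_path and inspect_cgroup_dir are illustrative names, and the interface files it checks (cgroup.controllers, cpu.max, memory.max, memory.current) are standard cgroup v2 files chosen as examples. The log does not say which files Comet's "Required cgroup subsystem files" check actually expects.

```python
"""Hypothetical diagnostic: show how cgroup v2 is set up for this process.

Reads only standard kernel interfaces; it does not import or call Comet.
Run it on both machines and compare the output.
"""
import os
from typing import Dict, Optional


def own_cgroup_v2_path(self_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from /proc/self/cgroup (the '0::<path>' line), if any."""
    for line in self_cgroup_text.splitlines():
        if line.startswith("0::"):
            return line[3:]
    return None


def inspect_cgroup_dir(cgroup_root: str, rel_path: str) -> Dict[str, object]:
    """Report available controllers and whether some example interface files exist."""
    directory = os.path.join(cgroup_root, rel_path.lstrip("/"))
    report: Dict[str, object] = {"dir": directory, "exists": os.path.isdir(directory)}
    controllers_file = os.path.join(directory, "cgroup.controllers")
    if os.path.isfile(controllers_file):
        with open(controllers_file) as fh:
            report["controllers"] = fh.read().split()
    else:
        report["controllers"] = None
    # In a non-root cgroup these files appear only when the matching
    # controller is enabled for it by the parent cgroup.
    for name in ("cpu.max", "memory.max", "memory.current"):
        report[name] = os.path.exists(os.path.join(directory, name))
    return report


if __name__ == "__main__":
    try:
        with open("/proc/self/cgroup") as fh:
            path = own_cgroup_v2_path(fh.read())
    except FileNotFoundError:
        path = None  # not running on Linux
    if path is None:
        print("No cgroup v2 entry found in /proc/self/cgroup")
    else:
        for key, value in inspect_cgroup_dir("/sys/fs/cgroup", path).items():
            print(f"{key}: {value}")
```

Running it from the same kind of terminal session used for training on each machine and comparing the controllers line gives a concrete, reportable difference if one exists.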

dsblank commented 2 months ago

@mhnazeri, thank you for the reproducible info! I'll pass this on to the engineering team.