canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/
Other
2.87k stars 856 forks source link

Use "groupadd --system" to create group #4603

Open osamuaoki opened 10 months ago

osamuaoki commented 10 months ago

Bug report

I noticed instance generated from Debian bookworm cloud image on linuxcontainer.org had odd GID=1000 for netdev. Since netdev should be a system group, this violates Debian policy https://www.debian.org/doc/debian-policy/ch-opersys.html#uid-and-gid-classes

100-999: Dynamically allocated system users and groups. Packages which need a user or group, but can have this user or group allocated dynamically and differently on each system, should use adduser --system to create the group and/or user. adduser will check for the existence of the user or group, and if necessary choose an unused id based on the ranges specified in adduser.conf.

I asked around to find the root cause of this problem:

The conclusion was:

It's listed as a default group in /etc/cloud/cloud.cfg which is the cloud-init config file.

I suspect cloud-init just goes through the list and creates anything missing, but doesn't know if something is meant to be a system group rather than user group.

As I read https://cloudinit.readthedocs.io/en/latest/reference/modules.html#users-and-groups , this functionality of creating group is intended for system group.

For creating system group, groupadd --system <groupname> command should be used instead of groupadd <groupname> .

Steps to reproduce the problem

On system with its /var on btrfs:

$ sudo apt install lxd
$ sudo lxd init --minimal
$ lxc launch images:debian/bookworm/cloud dbc
Creating dbc
Starting dbc
$ lxc exec dbc -- bash -l
root@dbc:~# tail /etc/group
systemd-journal:x:999:
systemd-network:x:998:
systemd-resolve:x:997:
input:x:102:
sgx:x:103:
kvm:x:104:
render:x:105:
_ssh:x:106:
netdev:x:1000:debian
debian:x:1001:
root@dbc:~# dpkg -l cloud-init
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-========================================================
ii  cloud-init     22.4.2-1     all          initialization system for infrastructure cloud instances

Environment details

cloud-init logs

N/A

Possible fix

Quick reading of the source code, this issue may be resolved conceptually by applying the following patch (Please note I have not tested this.)

From 17fd94ee214d2e0784250b74144f0b236773c0cf Mon Sep 17 00:00:00 2001
From: Osamu Aoki <osamu@debian.org>
Date: Sat, 11 Nov 2023 12:06:07 +0900
Subject: [PATCH] Fix with --system

Signed-off-by: Osamu Aoki <osamu@debian.org>
---
 cloudinit/distros/__init__.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cloudinit/distros/__init__.py b/cloudinit/distros/__init__.py
index 87390f63..72109ccd 100644
--- a/cloudinit/distros/__init__.py
+++ b/cloudinit/distros/__init__.py
@@ -1082,7 +1082,7 @@ class Distro(persistence.CloudInitPickleMixin, metaclass=abc.ABCMeta):
                     raise e

     def create_group(self, name, members=None):
-        group_add_cmd = ["groupadd", name]
+        group_add_cmd = ["groupadd", "--system", name]
         if util.system_is_snappy():
             group_add_cmd.append("--extrausers")
         if not members:
-- 
2.39.2

Concerns

The above patch didn't consider impacts and compatibility to some Ubuntu environment addressed by the subsequent if util.system_is_snappy() condition. This -extrausers seems to be a Ubuntu specific feature.

It looks like create_group is only used from cc_users_groups.py for processing groups. If other parts use this too, their impacts need to be assessed.

Possible benefits for existing issue reports

I see some issues reported around this here:

These issues are practically resolved by making proper system group ID assignment with the proposed fix because they are looking for GID=100-999 for system group. Then odd situation of primary user having UID=1000 GID=1001is avoided.

osamuaoki commented 10 months ago

As I checked ordinary Ubuntu 22.04.3 LTS (Jammy Jellyfish), the --extrausers option is documented only as Use the extra users database in the groupadd -h output and not mentioned even in its manpage.

Since I don't know how this extension is used by Ubuntu internal, it looks like some parameters are passsed etc. So my casual test caused following output, the sequnce of command putting --extrausers option at the end of command line may be OK even as proposed by my patch.

root@noted-cat:/usr/share/man/man8# groupadd --system XXX --extrausers
ENTER EXTRAUSERS_GROUP_FILEEXIT EXTRAUSERS_GROUP_FILEENTER EXTRAUSERS_SHADOWGROUP_FILEEXIT EXTRAUSERS_SHADOWGROUP_FILE
groupadd: /var/lib/extrausers/group.2019: No such file or directory
groupadd: cannot lock /var/lib/extrausers/group; try again later.
osamuaoki commented 10 months ago

As I think of the point of assigning GID for these system group in advance by cloud-init, the driver of this functionality is to have consistent GID number for the group name across installed images. So similar installation with some package creating these system group has the same GID and easy to manage. So this functionality of groupadd needs to use --system option.

osamuaoki commented 10 months ago

I see in cloudinit/distros/__init__.py:

        # Check if group exists, and then add it doesn't
        if util.is_group(name):
            LOG.warning("Skipping creation of existing group '%s'", name)
        else:
            try:
                subp.subp(group_add_cmd)
                LOG.info("Created new group %s", name)
            except Exception:
                util.logexc(LOG, "Failed to create group %s", name)

This LOG.warning caught my eyes. The initial significant commit for this part seems to be:

This mention:

    Added "userless" mode to cloud-init for handling the creation of the
    users and the default user on Ubuntu.

    cloudinit/config/cc_users_groups.py: new cloud-config module for creating
        users and groups on instance initialization.
        - Creates users and group
        - Sets "user" directive used in ssh_import_id

    cloudinit/config/cc_ssh_import_id.py: module will rely upon users_groups
        for setting the default user. Removed assumption of 'ubuntu' user.

    cloudinit/distros/__init__.py: Added new abstract methods for getting
        and creating the default user.

    cloudinit/distros/ubuntu.py: Defined abstract methods for getting and
        and creating the default 'ubuntu' user on Ubuntu instances.

    cloudinit/util.py: Added ability to hide command run through util.subp to
        prevent the commands from showing in the logs. Used by user_groups
        cloud-config module.

    config/cloud.cfg: Removed "user: ubuntu" directive and replaced with new
        user-less syntax.

    doc/examples/cloud-config.txt: Documented the creation of users and groups.

So the warning is only meant that system image before cloud-init already set groups listed. Not really any serouous warning. Just telling us that these can be dropped. No special issues mentioned.

osamuaoki commented 10 months ago

Hi,

I finally got around to read the log:

023-11-04 15:27:53,459 - subp.py[DEBUG]: Running command ['groupadd', 'netdev'] with allowed return codes [0] (shell=False, capture=True)
2023-11-04 15:27:53,476 - __init__.py[INFO]: Created new group netdev
2023-11-04 15:27:53,476 - __init__.py[DEBUG]: created group 'netdev' for user 'debian'
2023-11-04 15:27:53,476 - __init__.py[DEBUG]: Adding user debian
2023-11-04 15:27:53,476 - subp.py[DEBUG]: Running hidden command to protect sensitive input/output logstring: ['useradd', 'debian', '--comment', 'Debian', '--groups', 'adm,audio,cdrom,dialout,dip,floppy,netdev,plugdev,sudo,video', '--shell', '/bin/bash', '-m']
2023-11-04 15:27:53,511 - subp.py[DEBUG]: Running command ['passwd', '-l', 'debian'] with allowed return codes [0] (shell=False, capture=True)
2023-11-04 15:27:53,522 - util.py[DEBUG]: Reading from /etc/sudoers (quiet=False)
2023-11-04 15:27:53,522 - util.py[DEBUG]: Read 1714 bytes from /etc/sudoers
2023-11-04 15:27:53,523 - util.py[DEBUG]: Writing to /etc/sudoers.d/90-cloud-init-users - wb: [440] 124 bytes

Yes, the initial invocation of cloud-init added netdev causing its GID to be 1000

osamuaoki commented 10 months ago

Here is the fresh log for only starting a new Debian cloud image. cloud-init.log.gz

dermotbradley commented 10 months ago

I noticed instance generated from Debian bookworm cloud image on linuxcontainer.org had odd GID=1000 for netdev. Since netdev should be a system group, this violates Debian policy

Firstly, what document says that "netdev" on Debian should be a system group?

Secondly, why does Debian not already have the "netdev" group present if it requires it as a system group?

Cloud-init on Debian is indeed creating the "netdev" group if it does not exist because /etc/cloud/cloud.cfg lists it as a required group for Debian (the cloud.cfg file is created from the cloud.cfg.tmpl file whenever cloud-init is packaged for Debian: https://github.com/canonical/cloud-init/blob/main/config/cloud.cfg.tmpl#L15)

Cloud-init's users_groups module currently does not provide a method to create system groups, any groups defined either in /etc/cloud/cloud.cfg or in user-data will be created as "ordinary" groups.

The 2 cloud-init issues you referenced are quite old and it seems they were treated as low priority issues at the time.

this functionality of creating group is intended for system group

I see nothing in the cloud-init documentation that states this. I guess you are interpreting "Groups to add to the system..." to mean this but that's not what it means (it is talking about groups to add to the (operating) system, not groups to add as system groups).

Your suggested code change for groups to all be created via "addgroup --system" or "groupadd --system" is not correct. Personally when I often specify groups (in user-data) to be created I do not want them to be system groups.

Rather what would be needed would be for the users_groups module's schema and code for groups to be extended to either enable individual groups to be optionally defined as system groups (where cloud-init automatically picks a gid within the system group range), or where the ability to define a specific "gid" to be used is available (as some distros-specific system groups are expected to have specific gids), or implement both of these solutions.

I assume the reason you're only seeing this issue for "netdev" is that it is the only one of the groups listed in cloud.cfg that don't already exist in Debian.

An alternative solution for cloud-init would be to simple remove "netdev" from that list, Debian is the only OS/distro in cloud.cfg.tmpl where "netdev" is specified.

osamuaoki commented 10 months ago

Hi,

On Mon, 2023-11-13 at 19:44 -0800, dermotbradley wrote:

I noticed instance generated from Debian bookworm cloud image on linuxcontainer.org had odd GID=1000 for netdev. Since netdev should be a system group, this violates Debian policy Firstly, what document says that "netdev" on Debian should be a system group? Hmm... I didn't expect this question.  I almost said it's a common sense.  It's true you don't see such "should" words specifically for netdev  in policy.

Let me try to prove my point.   I found a reference for this system group in Debian policy.   https://www.debian.org/doc/debian-policy/ch-opersys.html#users-and-groups This generic definitions in the policy text and de-facto practices across Debain packages on netdev makes netdev dynamically assigned system group.

Here is a packaging practice of Debian packages:

 $ cd /var/lib/dpkg/info
 $ rg -e "group.*netdev"
ifupdown.postinst
11: addgroup --quiet --system netdev || true

network-manager.postinst
23: # Create netdev group. Members of group netdev get access to the PolicyKit
action
25: addgroup --quiet --system netdev

avahi-daemon.postinst
24: addgroup --quiet --system netdev || true

wpasupplicant.postinst
23: if ! getent group netdev >/dev/null; then
24: addgroup --quiet --system netdev || true

To recap, there are 2 types of system groups.   System groups with statically assigned GID and ones with dynamically assigned GID.

If you install normal full desktop system, Debian has netdev as a dynamically assigned system group. 

Since linuxcontainers.org image is very minimal one with good reason, it correctly doesn't install none of the above listed packages into its base nor cloud images.  So netdev group doesn't exist in its /etc/group .

Cloud-init on Debian is indeed creating the "netdev" group if it does not exist because /etc/cloud/cloud.cfg lists it as a required group for Debian (the cloud.cfg file is created from the cloud.cfg.tmpl file whenever cloud- init is packaged for Debian: https://github.com/canonical/cloud-init/blob/main/config/cloud.cfg.tmpl#L15) Cloud-init's users_groups module currently does not provide a method to create system groups, any groups defined either in /etc/cloud/cloud.cfg or in user- data will be created as "ordinary" groups.

I was not expecting you to insist that the design goal of users_groups is to add  non-existing group as "ordinary" groups in this early phase of system installation, 

Also looking into cloud-init source cofig/cloud.cfg.tmpl:

{% set groups = ({"alpine": "adm, wheel", "arch": "wheel, users",
  "debian": "adm, audio, cdrom, dialout, dip, floppy, netdev, plugdev, sudo,
video", 
  "gentoo": "users, wheel", "mariner": "wheel", 
  "photon": "wheel", 
  "openmandriva": "wheel, users, systemd-journal", 
  "suse": "cdrom, users", 
  "ubuntu": "adm, cdrom, dip, lxd, sudo", 
  "unknown": "adm, cdrom, dip, lxd, sudo"}) %}

All these actual use cases in the source made me to bereave this "groups" was intended to creating system group.

lxd defined for Ubuntu is defenately dynamic assigned one and likely to cause similar problem . FYI: here is postinst for lxd (this is on Debian but Ubuntu should be the same):

$ rg -e "group.*lxd" /var/lib/dpkg/info
/var/lib/dpkg/info/lxd.preinst
6: if ! getent group lxd >/dev/null; then
7: addgroup --system lxd
12: adduser --force-badname --system --home /var/lib/lxd/ --shell /bin/false --
ingroup lxd _lxd

The 2 cloud-init issues you referenced are quite old and it seems they were treated as low priority issues at the time.

this functionality of creating group is intended for system group I see nothing in the cloud-init documentation that states this. I guess you are interpreting "Groups to add to the system..." to mean this but that's not what it means (it is talking about groups to add to the (operating) system, not groups to add as system groups). Yes, this is sort of true.  That section doesn't explicitly describe the use case. Your suggested code change for groups to all be created via "addgroup -- system" or "groupadd --system" is not correct. Personally when I often specify groups (in user-data) to be created I do not want them to be system groups. If this is the intended use case of this "groups" for you, please document it as so.  Also, please change the current use of "groups" in the template to a new one to set the system group.  Rather what would be needed would be for the users_groups module's schema and code for groups to be extended to either enable individual groups to be optionally defined as system groups (where cloud-init automatically picks a gid within the system group range), or where the ability to define a specific "gid" to be used is available (as some distros-specific system groups are expected to have specific gids), or implement both of these solutions. If you wish to keep current behavior for backward compatibility, this is a good path. I assume the reason you're only seeing this issue for "netdev" is that it is the only one of the groups listed in cloud.cfg that don't already exist in Debian. If anyone creates image without lxd in it for Ubuntu, they hit the same problem. An alternative solution for cloud-init would be to simple remove "netdev" from that list, Debian is the only OS/distro in cloud.cfg.tmpl where "netdev" is specified. Yes.  I am working around this problem by removing netdev from my local image.

Osamu

osamuaoki commented 10 months ago

My main concern is UID/GID skew for 1000 after cloud-init execution.

I agree not to change behavior here after I read cloud-init document for "Users and Groups" and verified that it mentions nothing about "system group" creation.

As I see ansible, it may be good idea to have "ansible.builtin.group module" equivalent so we can ensure system group creation if such group doesn't exist yet. Then cloud.cfg can be written not to create strange GID for netdev and by creating netdev system group before setting "users:".

To recap ansible situation:

TheRealFalcon commented 10 months ago

@osamuaoki thanks for the report here. The functionality you are requesting makes sense to add, and we would accept it into cloud-init, but I also realistically don't think it is something we would prioritize anytime soon. If you would like to contribute this functionality, PRs are always welcome!

dermotbradley commented 10 months ago

@TheRealFalcon

As the existing functionality does not create system groups then there is an issue with the groups defined in cloud.cfg.tmpl - these mainly/all appear to be groups that are intended to be system groups on the relevant distros.

I guess that these should be be removed from cloud.cfg.tmpl as these groups should already exist and, if not, cloud-init shouldn't create them as non system groups.

osamuaoki commented 9 months ago

I thought about if cloud-init has to update cloud.cfg.tmpl to make it a good default.

It depends on what kind of "Debian" install image cloud-init targets.

Any normal Debian system installations offering standard Unix-like features have ifupdown package which creates netdev group. So current cloud.cfg.tmpl is OK.

Very minimal Debian system images distributed by LXD mostly consisting of priority=required packages (without priority=standard packages) don't have ifupdown package and suffer problem mentioned.

So the choice depends on which system cloud-init target as the primary one.

Whatever values are listed in groups of the cloud.cfg, image generation program can override it, I think.

I think the fixing of Debian image distributed by LXD should be done primarily in its generation code (https://github.com/lxc/distrobuilder ?).

Also, end user can work with current distributed image by supplying profile with corrective groups value. (Yes, ugly but it works.)

holmanb commented 9 months ago

Your suggested code change for groups to all be created via "addgroup --system" or "groupadd --system" is not correct. Personally when I often specify groups (in user-data) to be created I do not want them to be system groups.

@dermotbradley Agreed, simply adding --system would not be an appropriate solution.

Rather what would be needed would be for the users_groups module's schema and code for groups to be extended to either enable individual groups to be optionally defined as system groups (where cloud-init automatically picks a gid within the system group range), or where the ability to define a specific "gid" to be used is available (as some distros-specific system groups are expected to have specific gids), or implement both of these solutions.

@dermotbradley Yep, I think that this is the right path forward.

I guess that these should be be removed from cloud.cfg.tmpl as these groups should already exist and, if not, cloud-init shouldn't create them as non system groups.

@dermotbradley I don't think this makes sense. This list isn't primarily intended to define groups to add to the system (though as of https://github.com/canonical/cloud-init/commit/0865e5179f2a803f727e83b5e36681613e63fe8a the implemented was changed to add groups that didn't already exist), but rather this list is supposed to be a list of groups that the default user gets added to. Removing these groups, as you suggest would remove permissions from the default user. I don't think we want that.

Yes. I am working around this problem by removing netdev from my local image.

@osamuaoki You could also set create_groups to false to disable this behavior, see the linked commit.