Hmm, ok. I haven't tried that yet. Can you maybe show me an easy way to reproduce your error?
Sure! Here is an example -- let me know if you would like any additional info about my environment. Here are some basics:
matt@mateopc:~$ uname -a
Linux mateopc 5.4.17-200.fc31.x86_64 #1 SMP Sat Feb 1 19:00:13 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
matt@mateopc:~$ cat /etc/fedora-release
Fedora release 31 (Thirty One)
matt@mateopc:~$ lxd --version
3.20
matt@mateopc:~$ lxc info
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses: []
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
MIICHzCCAaWgAwIBAgIRALaymvg/sD45IaaDgw/wMwwwCgYIKoZIzj0EAwMwNTEc
MBoGA1UEChMTbGludXhweFwsAZXJzLm9yZzEVMBMGA1UEAwwMcm9vdEBtYXRl
b3BjMB4XDTIwMDIwNDE1NTczMloXDTMwMDIwMTE1NTczMlowNTEcMBoGA1UEChMT
bGludXhjb250YWluZXJzOp9yZzEVMBMGA1UEAwwMcm9vdEBtYXRlb3BjMHYwEAYH
KoZIzj0CAQYFK4EEACIDYgAECcq+kD5wVv/3GXHmi/KnBn0WdCJSOJIV/fWHSRq0
VMKEYs69+7JE2Wkt4c/7DhVe5kCItenDaouKUk+CYz2JebIwmVxUftdSp3W9Bxkp
oD49M2lp5xpjv5wRgHNvqNGgo3kwdzAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAww
CgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADBCBgNVHREEOzA5ggdtYXRlb3BjhwTA
qAFkhxAmAIgAMAASE5/6EM2IskXzhxAmAIgAMAA1+tz9Xhh8QvoNhwTAqHoBMAoG
CCqGSM49BAMDA2gAMGUCMQDv48OYKft7JJfMotTH0J5Px3cMV7X3fF7sDN8LAihI9
+131dBD4p7oIA7qawzzBBmG8CMGt0L0LhdNTEYmu+voBA1eXIV7qsUSpfp0JFbCa
H1iPDVYOEWAJlEnIG/AU8zWd2w==
-----END CERTIFICATE-----
certificate_fingerprint: ea90e400cd68763f4707c454bd15b4163ed14e70c6c99dd77d56a4a6205c5bfe
driver: lxc
driver_version: 3.2.1
kernel: Linux
kernel_architecture: x86_64
kernel_features:
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "false"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.4.17-200.fc31.x86_64
lxc_features:
cgroup2: "false"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
seccomp_notify: "true"
project: default
server: lxd
server_clustered: false
server_name: mateopc
server_pid: 4457
server_version: "3.20"
storage: btrfs
storage_version: "5.4"
And here is a demonstration of the issue:
matt@mateopc:~$ lxc list
+---------+---------+------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+---------+---------+------+------+-----------+-----------+
| aws-cli | STOPPED | | | CONTAINER | 0 |
+---------+---------+------+------+-----------+-----------+
matt@mateopc:~$ lxc launch ubuntu:18.04
Creating the instance
Instance name is: oriented-skunk
Starting oriented-skunk
matt@mateopc:~$ lxc list
+----------------+---------+------+----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------------+---------+------+----------------------------------------------+-----------+-----------+
| aws-cli | STOPPED | | | CONTAINER | 0 |
+----------------+---------+------+----------------------------------------------+-----------+-----------+
| oriented-skunk | RUNNING | | fd42:959f:1c1:9599:216:3eff:fe0b:a4a6 (eth0) | CONTAINER | 0 |
+----------------+---------+------+----------------------------------------------+-----------+-----------+
matt@mateopc:~$ lxc exec oriented-skunk -- /bin/bash
root@oriented-skunk:~# ping google.com
ping: google.com: Temporary failure in name resolution
root@oriented-skunk:~# dhclient
cmp: EOF on /tmp/tmp.z5XlifALLq which is empty
System has not been booted with systemd as init system (PID 1). Can't operate.
root@oriented-skunk:~# apt update
Err:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://security.ubuntu.com/ubuntu bionic-security InRelease
Temporary failure resolving 'security.ubuntu.com'
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-updates/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-backports/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/bionic-security/InRelease Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
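For anyone reproducing this, two quick checks inside the container confirm the symptom (just a sketch; eth0 is the default LXD NIC name):
ip -4 addr show eth0    # no IPv4 address shows up here
cat /etc/resolv.conf    # the resolver config behind the failed lookups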
Let me know if that helps or if there is anything else I can show.
Thanks!
PS: Is there a way to downgrade LXD so I can check whether it still works on 3.18? When I try, it says 3.20 is the lowest version in your repo:
matt@mateopc:~$ sudo dnf downgrade lxd
Last metadata expiration check: 2:00:48 ago on Wed 12 Feb 2020 11:51:38 AM MST.
Package lxd of lowest version already installed, cannot downgrade it.
Dependencies resolved.
Nothing to do.
Complete!
I ran into a similar problem but without cgroupv2 (on CentOS 7), so I guess it's not specific to v2. That init cmdline argument trick does not work here, however; I have to set systemd containers to privileged.
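For reference, making a container privileged looks roughly like this (a sketch; "ct1" is a hypothetical container name, security.privileged is the standard LXD config key):
lxc config set ct1 security.privileged true
lxc restart ct1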
Here I found that the generated lxc.conf says
lxc.mount.auto = proc:rw sys:rw
however, 3.18 from this COPR repo and 3.20 on Arch Linux have
lxc.mount.auto = proc:rw sys:rw cgroup:mixed
3.18 on CentOS 7 is cgroup1-only; 3.20 on Arch Linux gets both v1 and v2. Not sure why it decided to leave out the cgroup entry.
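If anyone wants to test whether the missing cgroup entry is the culprit, it should be possible to force it back per container via raw.lxc (a sketch; the container name is hypothetical and I haven't verified that LXD won't override it):
lxc config set ct1 raw.lxc 'lxc.mount.auto = proc:rw sys:rw cgroup:mixed'
lxc restart ct1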
I just installed LXD from the COPR repository on a fresh Fedora 31.
And I also ran into some issues:
[vagrant@fedora31 ~]$ lxc launch images:fedora/31 f31
Creating f31
Starting f31
[vagrant@fedora31 ~]$ lxc shell f31
Error: Container is not running
[vagrant@fedora31 ~]$ lxc info f31
Name: f31
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/02/14 01:12 UTC
Status: Stopped
Type: container
Profiles: default
The container fails to start without any error message :open_mouth:. But there is a hint in the console log:
[vagrant@fedora31 ~]$ sudo cat /var/log/lxd/f31/console.log
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...
In contrast to my Fedora 30 systems, where I'm currently using LXD, I see different messages in the LXCFS log (Fedora 31 first, then my Fedora 30 box):
Feb 14 01:11:05 fedora31 systemd[1]: Started FUSE filesystem for LXC.
Feb 14 01:11:05 fedora31 lxcfs[2561]: mount namespace: 5
Feb 14 01:11:05 fedora31 lxcfs[2561]: hierarchies:
Feb 14 01:11:05 fedora31 lxcfs[2561]: 0: fd: 6: unified
Feb 11 19:16:07 rhea.oasis.home systemd[1]: Started FUSE filesystem for LXC.
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: mount namespace: 5
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: hierarchies:
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 0: fd: 6: blkio
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 1: fd: 7: devices
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 2: fd: 8: freezer
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 3: fd: 9: net_cls,net_prio
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 4: fd: 10: memory
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 5: fd: 11: cpu,cpuacct
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 6: fd: 12: hugetlb
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 7: fd: 13: perf_event
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 8: fd: 14: cpuset
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 9: fd: 15: pids
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 10: fd: 16: name=systemd
Feb 11 19:16:07 rhea.oasis.home lxcfs[773]: 11: fd: 17: unified
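For reference, those messages can be pulled from the journal, assuming LXCFS runs as a systemd unit named lxcfs as in these packages:
journalctl -u lxcfs --no-pager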
That doesn't look like an issue with the packaging, but rather with how LXCFS/LXD handle cgroup2-only systems.
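A quick way to confirm that a host is running in cgroup2-only (unified) mode:
stat -fc %T /sys/fs/cgroup/    # prints "cgroup2fs" on a unified host, "tmpfs" on legacy/hybrid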
Work-around
As a work-around I added systemd.unified_cgroup_hierarchy=0 to the kernel command line via GRUB (see the grubby sketch after the listing below), and after a reboot the container could start and got an IP:
[vagrant@fedora31 ~]$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-5.4.18-200.fc31.x86_64 root=UUID=95af7b45-6542-4816-9aed-2b70feb90faf ro no_timer_check console=tty1 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 systemd.unified_cgroup_hierarchy=0
[vagrant@fedora31 ~]$ lxc start f31
[vagrant@fedora31 ~]$ lxc list
+------+---------+---------------------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+---------------------+------+-----------+-----------+
| f31 | RUNNING | 10.192.200.2 (eth0) | | CONTAINER | 0 |
+------+---------+---------------------+------+-----------+-----------+
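For reference, that kernel argument can be added persistently with grubby, which Fedora ships by default (a sketch):
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot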
After adding the work-around I also didn't have any issue starting an Ubuntu container:
[vagrant@fedora31 ~]$ lxc launch ubuntu:18.04 bionic
Creating bionic
Starting bionic
[vagrant@fedora31 ~]$ lxc list
+--------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+--------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| bionic | RUNNING | 10.192.200.180 (eth0) | fd42:bd12:4c04:9d99:216:3eff:fe75:70e1 (eth0) | CONTAINER | 0 |
+--------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| f31 | RUNNING | 10.192.200.2 (eth0) | fd42:bd12:4c04:9d99:216:3eff:fe85:975c (eth0) | CONTAINER | 0 |
+--------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
Is there a way to downgrade lxd so that I can try to see if it works on 3.18 still? When I try it says 3.20 is the lowest version in your repo
Ya, that's a bit unfortunate. COPR will only keep the latest successfully built package version. Unfortunately, since I don't use LXD on Fedora 31 yet, I haven't cached the "official" packages anywhere. But I still have the RPMs that I built locally when I was testing the spec file. You can download them from here: https://linuxmonk.ch/packages/lxc3/fedora/31/x86_64/ These RPMs are identical to the COPR ones; they just weren't built on the Fedora infrastructure.
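Installing the downgrade from those local files should work along these lines (a sketch; exact file names will vary, and I'm assuming dnf accepts local RPM paths here):
# after downloading the matching RPMs from the URL above into the current directory
sudo dnf downgrade ./lxd*.rpm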
After switching back to cgroup2-only and the default container configuration, I can reproduce your issue. The Ubuntu container would start, but not get an IPv4 address:
[vagrant@fedora31 ~]$ lxc list
+--------+---------+------+-----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+--------+---------+------+-----------------------------------------------+-----------+-----------+
| bionic | RUNNING | | fd42:bd12:4c04:9d99:216:3eff:fe75:70e1 (eth0) | CONTAINER | 0 |
+--------+---------+------+-----------------------------------------------+-----------+-----------+
| f31 | STOPPED | | | CONTAINER | 0 |
+--------+---------+------+-----------------------------------------------+-----------+-----------+
I can also confirm that setting lxc.init.cmd doesn't help. The Fedora 31 container still won't start (same error) with the following settings:
[vagrant@fedora31 ~]$ lxc config show f31
architecture: x86_64
config:
image.architecture: amd64
image.description: Fedora 31 amd64 (20200213_20:33)
image.os: Fedora
image.release: "31"
image.serial: "20200213_20:33"
image.type: squashfs
raw.lxc: lxc.init.cmd = /sbin/init systemd.unified_cgroup_hierarchy
volatile.base_image: 17cc572a411a9650ae8ddc6b4e37c01c3215198e66ae5b40a56a80e9d3bb36c3
volatile.eth0.hwaddr: 00:16:3e:85:97:5c
volatile.idmap.base: "0"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
volatile.last_state.power: STOPPED
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""
I'll try to find some time to look into this. It would help me if you could post the cgroup layout of the container by looking at:
cat /proc/<container-init>/cgroup
cat /proc/<container-monitor>/cgroup
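(The payload/init PID shows up in lxc info <name> while the container is running, and the monitor process is its parent; something like the following, using the container name from above:
lxc info bionic | grep -i pid
ps -o ppid= -p <that-pid>    # the monitor PID
)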
I guess you are referring to the Ubuntu container, because the Fedora container cannot even be started:
[vagrant@fedora31 ~]$ cat /proc/1034/cgroup
0::/lxc.monitor/bionic
[vagrant@fedora31 ~]$ cat /proc/1042/cgroup
0::/lxc.payload/bionic
Oh, that's an old liblxc version, it seems. What liblxc version are you using?
Name : lxc-libs
Version : 3.2.1
Release : 0.3.fc31
Architecture: x86_64
Install Date: Fri 14 Feb 2020 01:10:19 AM UTC
Group : Unspecified
Size : 1421776
License : LGPLv2+ and GPLv2
Signature : RSA/SHA1, Sat 28 Sep 2019 06:01:33 PM UTC, Key ID 97b8ff00f70e1f77
Source RPM : lxc-3.2.1-0.3.fc31.src.rpm
Build Date : Sat 28 Sep 2019 06:00:01 PM UTC
Build Host : copr-builder-734651165.novalocal
URL : https://linuxcontainers.org/lxc
Summary : Runtime library files for lxc
Description :
Linux Resource Containers provide process and resource isolation without the
overhead of full virtualization.
The lxc-libs package contains libraries for running lxc applications
Right, you're missing a bunch of patches that are required and will be available in the next release.
Ok, great. Any idea when this is due?
/cc @stgraber I think the plan was to release 4.0 around end of March-ish?
LXD 4.0 should be mid to end of March, LXC/LXCFS should be earlier than that.
Closing this out as it is OBE now, since the versions involved are quite a bit older. I'm still having similar issues with 4.0.1 (snap package) on F32, but I will look into that and open tickets in other areas. Thanks for all the input!
For anyone in the future who finds this from searching: my issues with LXD 4 (via snapd) on Fedora 32 look to be caused by firewalld. With firewalld turned off, the containers work fine, so I just need to go in and see what's not being let through.
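For what it's worth, a common fix is to put the LXD bridge into firewalld's trusted zone rather than disabling firewalld entirely (a sketch; assumes the default bridge name lxdbr0):
sudo firewall-cmd --zone=trusted --change-interface=lxdbr0 --permanent
sudo firewall-cmd --reload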
I'm running Fedora 31 and have LXD installed via the steps on https://copr.fedorainfracloud.org/coprs/ganto/lxc3/. Everything was working really well, although for systemd containers I had to set
lxc config set <container-name> raw.lxc 'lxc.init.cmd = /sbin/init systemd.unified_cgroup_hierarchy'
for them to work properly (I found that command at https://discuss.linuxcontainers.org/t/cgroups-v2-adoption/6074/9). The biggest tell that they were not working was that they would not get an IPv4 address and could only be stopped via --force.
That wasn't a big deal, since it only needed to be set once per container. I upgraded to LXD 3.20 today and now I cannot get any containers to work, even after setting that. I wanted to post here before asking in the general LXD forum, since it may be specific to this packaging for Fedora/CentOS with cgroup v2 enabled. I do not want to disable cgroup v2, but I can provide any logging or debug output that could be useful; just let me know what to run, as I am only a general LXD user.
Thanks!