NixOS / nixpkgs

Failing to create a docker image with runAsRoot #325308

Open ahuston-0 opened 2 months ago

ahuston-0 commented 2 months ago

Describe the bug

I'm trying to translate the nextcloud docker apache image (link to source) to a dockerTools.buildImage setup. I'm trying to use the runAsRoot functionality to do some of the apt-get steps and such (as this is a Debian-based image, per the apt sources in the logs), but I'm not even able to get to the point where the script runs. I believe it's failing at this line of the VM setup script.
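
For readers unfamiliar with the setup being described, here is a minimal sketch of a dockerTools.buildImage call that layers a runAsRoot script on top of a pulled base image. It is not the reporter's actual file (that is linked under "Additional context"). The image digest and hash are placeholders, the `apache` tag is assumed from the description, and the image name `nextcloud-custom` and the package list are taken from the build logs later in this thread.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical base image. The digest and hash below are placeholders,
  # not real values; replace them with the ones for the image you pull.
  base = pkgs.dockerTools.pullImage {
    imageName = "nextcloud";
    imageDigest = "sha256:0000000000000000000000000000000000000000000000000000000000000000";
    sha256 = pkgs.lib.fakeSha256;
    finalImageName = "nextcloud";
    finalImageTag = "apache";
  };
in
pkgs.dockerTools.buildImage {
  name = "nextcloud-custom";
  tag = "latest";
  fromImage = base;

  # runAsRoot is executed as root inside a QEMU VM, chrooted into the
  # unpacked base image. The mount failure in this issue happens before
  # this script starts.
  runAsRoot = ''
    # Commands mirror the apt-get invocation visible in the build log.
    # They need network access to reach the Debian mirrors.
    /usr/bin/apt-get update
    /usr/bin/apt-get install -y --no-install-recommends \
      ffmpeg ghostscript libmagickcore-6.q16-6-extra procps smbclient supervisor
  '';
}
```

Building this requires KVM on the host, because runAsRoot runs inside a virtual machine.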

Steps To Reproduce

Steps to reproduce the behavior:

  1. clone this repo
  2. sudo nix build .#nixosConfigurations.palatine-hill.config.system.build.toplevel --verbose
  3. build fails while trying to set up VM

Expected behavior

  1. clone the repo
  2. sudo nix build .#nixosConfigurations.palatine-hill.config.system.build.toplevel --verbose
  3. VM is set up correctly
  4. runAsRoot script executes

Additional context

Link to file creating the image: https://github.com/RAD-Development/nix-dotfiles/blob/feature/docker-palatine-hill-migration/systems/palatine-hill/docker/nextcloud-image/default.nix

Builder Logs:

Formatting './image/disk-image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=8589934592 lazy_refcounts=off refcount_bits=16
SeaBIOS (version rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org)

iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+BEFD0CF0+BEF30CF0 CA00
Press Ctrl-B to configure iPXE (PCI 00:03.0)...^M

Booting from ROM...
Probing EDD (edd=off to disable)... ok
loading kernel modules...
mounting Nix store...
mounting host's temporary directory...
starting stage 2 (/nix/store/jzmr6hn0axllfrvcivhw11ara949y3x0-vm-run-stage2)
mke2fs 1.47.1 (20-May-2024)
Discarding device blocks: done
Creating filesystem with 2097152 4k blocks and 524288 inodes
Filesystem UUID: ec9352d3-1aa4-46aa-b07a-bee7c2d5c44d
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

Unpacking base image...
From-image name or tag wasn't set. Reading the first ID.
Unpacking layer 1387079e86adf524e7e92bada71d261d9ff58f34409751ab36560385262a8386.tar
Unpacking layer 952cd80a113e6d9fde88d9504ae3fc5cb72f89623cd27ee73291f09759063a7b.tar
Unpacking layer ca4d579d9813659f0881b61389d1f8b40c35b54bf186f36047ee470be3623af7.tar
Unpacking layer 32192bb8c2b3973f7d96677e9e391c28dca2a400f9deaff16d9cc15d790c781a.tar
Unpacking layer 0453a00460037a290505ea65b27e0791905b0d5c3cde4940c26467af6f2e9276.tar
Unpacking layer f0f8146131f33f874d46f43ddcc717afb266a5b1440e4fc0d8113c5aacb2b2b7.tar
Unpacking layer e9e42790beb62eeb50cef3e2fd4453beb72fb2b32504d74942c5a15015da85e6.tar
Unpacking layer dea0fcb449af8f133b5193e11a7bfddef14595887bd63234594e4fb5b9e884be.tar
Unpacking layer d676aee28dcb213e1d651c553741d6feb94419efec1430c008ebf321536ed9a1.tar
Unpacking layer b18a38eb39ee7074ea4508f5c126c33d5b099a7af32515594a247ce68b9c6be4.tar
Unpacking layer ff618a75e80ae57b3ec6b23489dc19026926833969e5a05b6caa70ef2be1cb34.tar
Unpacking layer c292898447309fe7a856a07044ea3f16e5c04ba8e960c787911ec6fd88a5b810.tar
Unpacking layer 89d6fca79759c675c27e3776f7d1fffb2eaac2a93a6d704f66932fcede9e836e.tar
Unpacking layer b8121992215f5b11bcb2084a9be96aeea6f19cf02989e7e88f018a2578299179.tar
Unpacking layer aa4ffc6958c6ae98c647997826f0ea20bec413f6a0cfb3c598dfa53df1b7aa92.tar
Unpacking layer ed21c4f0af1b655418d7d46e24cf052ead2c2b144da444b561afd20e754d3bf7.tar
Unpacking layer ab0ab8f834d61328b839df02e8879e9423943720ad515242d78b699559f5c1cc.tar
Unpacking layer c890292b759a8b331647b65cfcf449983c5fa836f7ea85c8e9279650ec1f7412.tar
Unpacking layer af14aed2da47bf80ba204f37bd2b6cfdcf18a9758b2599495556071819aa68c3.tar
Unpacking layer ab2e23c714e3cc3a73ac5fb52a7d9733bc89064ed19d14a42c15adf822fcd744.tar
Unpacking layer 327d6aeb4c318e02882420f92a795fa51b549a91a34e3cf1b84371e49b6559a6.tar
mount: /tmp/disk/mnt: wrong fs type, bad option, bad superblock on overlay, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.
[   34.502092] reboot: Power down

Notify maintainers

Metadata

Note on the nix version, I've tried this on 2.23 and 2.18 and got the same result.

nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.9.7-zen1, NixOS, 24.11 (Vicuna), 24.11.20240701.00d80d1`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.23.1`
 - nixpkgs: `/nix/store/j4jzjbr302cw5bl0n3pch5j9bh5qwmaj-source`

the-sun-will-rise-tomorrow commented 1 month ago

> clone this repo

Could you please prepare a minimal (or, at least, more reduced) test case? This is a bit too much to work with.

ahuston-0 commented 1 month ago

@the-sun-will-rise-tomorrow

Sorry for the complexity; that repo has quite a few machines and bells and whistles. I tried to reduce it some, and I'm still seeing the same behavior when I run the steps below with this new repo.

Steps to reproduce the behavior:

  1. clone this repo
  2. sudo nix build .#nixosConfigurations.test-machine.config.system.build.toplevel --verbose
  3. build fails while trying to set up VM

https://github.com/ahuston-0/docker-build-mvp

the-sun-will-rise-tomorrow commented 1 month ago

Thanks!

I think this is the same issue as: https://github.com/docker/for-linux/issues/1443

It looks like it's a bug in the Linux kernel which we can at best try to work around, and none of the workarounds fix the problem completely :/

the-sun-will-rise-tomorrow commented 1 month ago

Here is a kernel log excerpt from when it happens (i.e. mount fails):

[    7.151007] tar: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[    7.151014] CPU: 12 PID: 2144 Comm: tar Not tainted 6.6.40 #1-NixOS
[    7.151015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    7.151016] Call Trace:
[    7.151018]  <TASK>
[    7.151019]  dump_stack_lvl+0x47/0x70
[    7.151023]  warn_alloc+0x178/0x1f0
[    7.151026]  ? __alloc_pages_direct_compact+0x1a1/0x2b0
[    7.151028]  __alloc_pages_slowpath.constprop.0+0xd91/0xdf0
[    7.151029]  ? _raw_spin_unlock+0xe/0x30
[    7.151031]  __alloc_pages+0x33c/0x360
[    7.151034]  ? v9fs_alloc_rdir_buf.isra.0+0x2c/0x40 [9p]
[    7.151038]  __kmalloc_large_node+0x73/0x140
[    7.151039]  __kmalloc+0xd4/0x160
[    7.151040]  v9fs_alloc_rdir_buf.isra.0+0x2c/0x40 [9p]
[    7.151044]  v9fs_dir_readdir_dotl+0x68/0x1d0 [9p]
[    7.151048]  ? selinux_file_permission+0x119/0x160
[    7.151050]  iterate_dir+0xa1/0x190
[    7.151052]  __x64_sys_getdents64+0x88/0x140
[    7.151053]  ? __pfx_filldir64+0x10/0x10
[    7.151054]  do_syscall_64+0x39/0x90
[    7.151056]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[    7.151058] RIP: 0033:0x7f898daf21a7
[    7.151062] Code: 89 e8 5b 5d 31 d2 31 ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 11 31 d2 31 c9 31 f6 31 ff 45 31 db c3 0f 1f
[    7.151063] RSP: 002b:00007fff48cb4718 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[    7.151064] RAX: ffffffffffffffda RBX: 00007f898d9ec010 RCX: 00007f898daf21a7
[    7.151065] RDX: 0000000000020000 RSI: 00007f898d9ec040 RDI: 0000000000000005
[    7.151065] RBP: 00007f898d9ec014 R08: 0000000000000000 R09: 0000000000000000
[    7.151066] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f898d9ec040
[    7.151066] R13: ffffffffffffff88 R14: 0000000000000002 R15: 000000000165d170
[    7.151067]  </TASK>
[    7.151068] Mem-Info:
[    7.151069] active_anon:484 inactive_anon:0 isolated_anon:0
                active_file:5270 inactive_file:81554 isolated_file:0
                unevictable:0 dirty:4370 writeback:0
                slab_reclaimable:9459 slab_unreclaimable:12093
                mapped:786 shmem:6 pagetables:22
                sec_pagetables:0 bounce:0
                kernel_misc_reclaimable:0
                free:3705 free_pcp:43 free_cma:0
[    7.151071] Node 0 active_anon:1936kB inactive_anon:0kB active_file:21080kB inactive_file:326092kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3144kB dirty:17480kB writeback:0kB shmem:24kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:3856kB pagetables:88kB sec_pagetables:0kB all_unreclaimable? no
[    7.151073] Node 0 DMA free:1804kB boost:0kB min:88kB low:108kB high:128kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:13000kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[    7.151075] lowmem_reserve[]: 0 430 430 430 430
[    7.151077] Node 0 DMA32 free:13016kB boost:0kB min:2608kB low:3260kB high:3912kB reserved_highatomic:2048KB active_anon:1908kB inactive_anon:0kB active_file:21180kB inactive_file:313024kB unevictable:0kB writepending:17492kB present:507764kB managed:451696kB mlocked:0kB bounce:0kB free_pcp:188kB local_pcp:0kB free_cma:0kB
[    7.151079] lowmem_reserve[]: 0 0 0 0 0
[    7.151081] Node 0 DMA: 1*4kB (U) 7*8kB (UE) 1*16kB (E) 4*32kB (UME) 3*64kB (UME) 1*128kB (M) 1*256kB (U) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 1804kB
[    7.151086] Node 0 DMA32: 114*4kB (UMEH) 320*8kB (UMEH) 321*16kB (UMEH) 38*32kB (UMEH) 37*64kB (MH) 2*128kB (H) 2*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 13016kB
[    7.151091] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[    7.151092] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[    7.151093] 86768 total pagecache pages
[    7.151093] 0 pages in swap cache
[    7.151094] Free swap  = 0kB
[    7.151094] Total swap = 0kB
[    7.151094] 130939 pages RAM
[    7.151095] 0 pages HighMem/MovableOnly
[    7.151095] 14175 pages reserved
[    7.151095] 0 pages cma reserved
[    7.151095] 0 pages hwpoisoned

the-sun-will-rise-tomorrow commented 1 month ago

One way to completely avoid this problem is to use another implementation of OverlayFS.

I opened https://github.com/NixOS/nixpkgs/pull/329696 which adds a new option, useFUSEOverlayFS, that switches the implementation of OverlayFS over to the older FUSE version, which does not have this problem.

ahuston-0 commented 1 month ago

Alright, awesome. I see some workarounds there; I'll see if any of them work and then switch to the new option once this is merged.

the-sun-will-rise-tomorrow commented 1 month ago

> I see some workarounds there

Note that those workarounds need to be applied to the VM that dockerTools creates and runs, not the host machine running nix-build.
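
To illustrate what "applied to the VM" can mean: the kernel log above shows a guest with roughly 512 MiB of RAM (130939 pages of 4 KiB) failing an order-5 allocation, i.e. 2^5 contiguous pages, or 128 KiB. One setting that affects the build VM rather than the host is the `buildVMMemorySize` argument of dockerTools.buildImage. The sketch below raises it. The value 2048 is an arbitrary assumption, and nothing in this thread confirms that more guest memory avoids the mount failure.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImage {
  name = "nextcloud-custom";
  tag = "latest";

  # Memory given to the QEMU build VM, in MiB (the default is 512).
  # 2048 is an untested guess, not a known fix for this issue.
  buildVMMemorySize = 2048;

  # Size of the VM's scratch disk, in MiB; 8192 matches the 8 GiB
  # qcow2 image seen in the builder log.
  diskSize = 8192;

  # Placeholder script; runAsRoot is what makes buildImage use a VM.
  runAsRoot = ''
    mkdir -p /data
  '';
}
```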

> and then switch to the new option once this is merged

It would help if you could test that the patch fixes the problem you were seeing in the first place, and make a note on the pull request to that effect.

ahuston-0 commented 1 month ago

Oh yeah no problem. Let me try it after work on both machines.

ahuston-0 commented 1 month ago

Still failing to build, but it's getting inside the VM. Does the VM not have internet connectivity?

That said, I'm definitely getting inside the VM so that's awesome

error: builder for '/nix/store/2p7waij0cgdp8s200cnmcdy4mzidk872-docker-layer-nextcloud-custom.drv' failed with exit code 100;
       last 25 log lines:
       > Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
       > Err:1 http://deb.debian.org/debian bookworm InRelease
       >   Temporary failure resolving 'deb.debian.org'
       > Err:2 http://deb.debian.org/debian bookworm-updates InRelease
       >   Temporary failure resolving 'deb.debian.org'
       > Err:3 http://deb.debian.org/debian-security bookworm-security InRelease
       >   Temporary failure resolving 'deb.debian.org'
       > Reading package lists... Done
       > W: Failed to fetch http://deb.debian.org/debian/dists/bookworm/InRelease  Temporary failure resolving 'deb.debian.org'
       > W: Failed to fetch http://deb.debian.org/debian/dists/bookworm-updates/InRelease  Temporary failure resolving 'deb.debian.org'
       > W: Failed to fetch http://deb.debian.org/debian-security/dists/bookworm-security/InRelease  Temporary failure resolving 'deb.debian.org'
       > W: Some index files failed to download. They have been ignored, or old ones used instead.
       > + /usr/bin/apt-get install -y --no-install-recommends ffmpeg ghostscript libmagickcore-6.q16-6-extra procps smbclient supervisor
       > Reading package lists... Done
       > Building dependency tree... Done
       > Reading state information... Done
       > Package ghostscript is not available, but is referred to by another package.
       > This may mean that the package is missing, has been obsoleted, or
       > is only available from another source
       >
       > E: Unable to locate package ffmpeg
       > E: Package 'ghostscript' has no installation candidate
       > E: Unable to locate package smbclient
       > E: Unable to locate package supervisor
       > [   23.261455] reboot: Power down
       For full logs, run 'nix log /nix/store/2p7waij0cgdp8s200cnmcdy4mzidk872-docker-layer-nextcloud-custom.drv'.

the-sun-will-rise-tomorrow commented 1 month ago

> Does the VM not have internet connectivity?

No. Input-addressed derivations don't have network connectivity, as that would allow them to be impure. Fixed-output derivations (content-addressed, with the output hash declared up front) can access the Internet, but they are expected to produce the exact same output every time, which might not be feasible for a VM image that installs packages with apt-get. Anyway, I think that's a bit beyond the scope of the original problem; you may get more or better answers on, e.g., https://discourse.nixos.org/.
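
As a small illustration of the network rule described here, the sketch below uses `fetchurl`, which builds a derivation with a declared output hash and is therefore allowed network access inside the sandbox. The URL is one of the index files from the apt-get log above, and the hash is a placeholder. Because that file is regenerated upstream over time, any pinned hash would eventually stop matching, which is the reproducibility problem mentioned for apt-get.

```nix
{ pkgs ? import <nixpkgs> { } }:

# fetchurl declares its output hash up front, so its builder may use
# the network, but the downloaded content must match the hash exactly.
pkgs.fetchurl {
  # Taken from the apt-get errors in this thread; content changes over time.
  url = "http://deb.debian.org/debian/dists/bookworm/InRelease";

  # Placeholder hash. The first build fails with a hash mismatch and
  # reports the hash of what was actually downloaded.
  hash = pkgs.lib.fakeHash;
}
```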

ahuston-0 commented 1 month ago

Yeah, that's not a problem. I have the sandbox to work off of and can hopefully figure it out from there. Thank you for all the help.