coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

Howto build a custom OS from scratch #1680

Open xcross-ibm opened 4 years ago

xcross-ibm commented 4 years ago

I'm modernizing an embedded Linux distribution built from RHEL rpms and looking to replace our bespoke build system with coreos-assembler. I've read the READMEs and done the tutorial to build a FCOS image and run it. It just worked, nicely :) Now I've started the serious digging and I thought I'd ask for pointers before reading the entire source. (I'm starting with the coreos-assembler script and the fedora-coreos-base.yaml manifest.)

  1. I'm not building a fedora derivative, rather a very locked down system comprised of systemd and a small set of applications, their libs, and the rpms that support them (and bash for debug only.) Is there a minimal manifest example?

  2. I need to do some very custom stuff with image production and initial ram disk. I'm starting to read at cmd-build script. It may be a long slog from there ;-)

Thanks in advance for your help. Chris cc: @ashcrow

ashcrow commented 4 years ago

Hi Chris,

This is the right place to ask! :smile:

  1. I'm not building a fedora derivative, rather a very locked down system comprised of systemd and a small set of applications, their libs, and the rpms that support them (and bash for debug only.) Is there a minimal manifest example?

I don't think we have one created as an example. From a quick local test the following seems to be the minimal required keys/sections:

repos:
    - ...

automatic-version-prefix: "${releasever}.<date:%Y%m%d>"
mutate-os-release: "${releasever}"

# Explicit list of needed packages
packages:
    - ...

https://github.com/coreos/rpm-ostree/blob/master/docs/manual/treefile.md defines the sections pretty well.

In terms of minimal manifest listing the smallest set of packages for a booting system --- that's a good question :sweat_smile:.

  2. I need to do some very custom stuff with image production and initial ram disk. I'm starting to read at cmd-build script. It may be a long slog from there ;-)

If you can provide some ideas of what you have to do with producing images we may be able to provide some guidance. The workflow we use with coreos-assembler for boot images is creating the qcow2 first and then transforming it to other formats with cmd-buildextend-$NAME commands.

ashcrow commented 4 years ago

cc @travier @jlebon @bgilbert to see if they have any initial pointers as well

cgwalters commented 4 years ago

Please see https://github.com/coreos/coreos-assembler/blob/master/README-custom.md

xcross-ibm commented 4 years ago

I'm trying to build the container and get the following error. I'm working with a clean tree and get the error on both master and fcos-32.20200726.3.0.

chris:coreos-assembler$ podman build -t xcross/coreos-test .
...

STEP 8: RUN ./build.sh write_archive_info
++ pwd
+ srcdir=/root/containerbuild
+ '[' 1 -ne 0 ']'
+ write_archive_info
+ . /root/containerbuild/src/cmdlib.sh
++ set -euo pipefail
+++ dirname ./build.sh
++ DIR=.
++ RFC3339=%Y-%m-%dT%H:%M:%SZ
++ grep -q '^Fedora' /etc/redhat-release
++ export ISFEDORA=1
++ ISFEDORA=1
++ export ISEL=
++ ISEL=
+++ python3 -c '
import gi
gi.require_version("RpmOstree", "1.0")
from gi.repository import RpmOstree
print(RpmOstree.get_basearch())'
++ basearch=x86_64
++ export basearch
+++ uname -m
++ arch=x86_64
++ export arch
++ case $arch in
++ DEFAULT_TERMINAL=ttyS0
++ export DEFAULT_TERMINAL
++ _privileged=
+ mkdir -p /cosa /lib/coreos-assembler
+ touch -f /lib/coreos-assembler/.clean
+ prepare_git_artifacts /root/containerbuild /cosa/coreos-assembler-git.tar.gz /cosa/coreos-assembler-git.json
+ local gitd=/root/containerbuild
+ shift
+ local tarball=/cosa/coreos-assembler-git.tar.gz
+ shift
+ local json=/cosa/coreos-assembler-git.json
+ shift
+ openshift_git_hack /root/containerbuild
+ local gitd=/root/containerbuild
+ shift
+ '[' x == x ']'
+ return
+ local is_dirty=false
+ local head_ref=unknown
+ local head_remote=unknown
+ local head_url=unknown
+ local 'gc=git --work-tree=/root/containerbuild --git-dir=/root/containerbuild/.git'
+ git --work-tree=/root/containerbuild --git-dir=/root/containerbuild/.git diff --quiet --exit-code
+ tar -C /root/containerbuild -czf /cosa/coreos-assembler-git.tar.gz --exclude-vcs .
tar: .: file changed as we read it
Error: error building at STEP "RUN ./build.sh write_archive_info": error while running runtime: exit status 1
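For context on the failure above: the trace shows cmdlib.sh running with `set -euo pipefail`, so any non-zero exit from tar aborts the step. GNU tar documents exit status 1 during archive creation as "some files changed while being archived" and exit status 2 as a fatal error. Below is a minimal sketch for telling the two apart when rerunning the same tar invocation by hand. The function name `tar_status` is made up for illustration and is not part of build.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of coreos-assembler: run the same tar
# invocation as the trace above and report how it exited.
# usage: tar_status SRC_DIR OUT_TARBALL
tar_status() {
    local rc=0
    tar -C "$1" -czf "$2" --exclude-vcs . || rc=$?
    case "$rc" in
        0) echo "ok" ;;
        # GNU tar: status 1 on create means files changed while being read;
        # the archive is still written but may not be an exact copy.
        1) echo "warning: files changed while being archived" ;;
        *) echo "fatal: tar exited with status $rc" ;;
    esac
    return "$rc"
}
```

If the helper reports the warning case, the next thing to look at is probably what touches the source directory during the build, which this thread does not settle.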

cc: @ashcrow

travier commented 4 years ago

To help you here we would probably need:

xcross-ibm commented 4 years ago

@travier:


$ podman version
Version:            1.9.3
RemoteAPI Version:  1
Go Version:         go1.13.4
OS/Arch:            linux/amd64

$ podman info --debug
debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.13.4
  podmanVersion: 1.9.3
host:
  arch: amd64
  buildahVersion: 1.14.9
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.17-1.module+el8.2.1+6771+3533eb4c.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.17, commit: 3c703d9f178a3a53966e1d5c03d0275ea6cb36a0'
  cpus: 12
  distribution:
    distribution: '"rhel"'
    version: "8.2"
  eventLogger: file
  hostname: imbroglio
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-193.6.3.el8_2.x86_64
  memFree: 76993478656
  memTotal: 134854512640
  ociRuntime:
    name: runc
    package: runc-1.0.0-66.rc10.module+el8.2.1+6465+1a51e8b6.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.0.1-1.module+el8.2.1+6595+03641d72.x86_64
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.3.0
  swapFree: 134855258112
  swapTotal: 134855258112
  uptime: 985h 17m 39.14s (Approximately 41.04 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/chris/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.0.0-2.module+el8.2.1+6465+1a51e8b6.x86_64
      Version: |-
        fuse-overlayfs: version 1.0.0
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/chris/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 26
  runRoot: /run/user/1000
  volumePath: /home/chris/.local/share/containers/storage/volumes

$ rpm -q podman
podman-1.9.3-2.module+el8.2.1+6867+366c07d6.x86_64

$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.2 (Ootpa)

ashcrow commented 4 years ago

Thank you @xcross-ibm -- I'm seeing if I can reproduce the container build failure.

ashcrow commented 4 years ago

Running as-is off the main branch, I was unable to reproduce it, though I do have a newer podman: podman-2.0.5-1.fc32.x86_64.

Were there any local changes in the build scripts, or was there possibly something running in parallel on the directory being compressed?

$ podman build . -t cosa:test
[...]
STEP 8: RUN ./build.sh write_archive_info
++ pwd                             
+ srcdir=/root/containerbuild   
+ '[' 1 -ne 0 ']' 
+ write_archive_info
+ . /root/containerbuild/src/cmdlib.sh                                                                    
++ set -euo pipefail
+++ dirname ./build.sh                                                                 
++ DIR=.                           
++ RFC3339=%Y-%m-%dT%H:%M:%SZ   
++ grep -q '^Fedora' /etc/redhat-release                                                                  
++ export ISFEDORA=1
++ ISFEDORA=1                                                            
++ export ISEL=                                                          
++ ISEL=                                   
+++ python3 -c '                           
import gi                                  
gi.require_version("RpmOstree", "1.0")                                                                    
from gi.repository import RpmOstree        
print(RpmOstree.get_basearch())'                                                       
++ basearch=x86_64                                                                     
++ export basearch                                   
+++ uname -m                                         
++ arch=x86_64                                       
++ export arch                                       
++ case $arch in                                     
++ DEFAULT_TERMINAL=ttyS0                            
++ export DEFAULT_TERMINAL                           
++ _privileged=                                      
+ mkdir -p /cosa /lib/coreos-assembler                                                                    
+ touch -f /lib/coreos-assembler/.clean                                                                   
+ prepare_git_artifacts /root/containerbuild /cosa/coreos-assembler-git.tar.gz /cosa/coreos-assembler-git.json                                                                                                       
+ local gitd=/root/containerbuild                    
+ shift                                              
+ local tarball=/cosa/coreos-assembler-git.tar.gz                                                         
+ shift                                              
+ local json=/cosa/coreos-assembler-git.json                                                              
+ shift                                              
+ openshift_git_hack /root/containerbuild                                                                 
+ local gitd=/root/containerbuild                    
+ shift                                              
+ '[' x == x ']'                                     
+ return                                             
+ local is_dirty=false                               
+ local head_ref=unknown                             
+ local head_remote=unknown
+ local head_url=unknown                                                                                                                                                                                             
+ local 'gc=git --work-tree=/root/containerbuild --git-dir=/root/containerbuild/.git'                                                                                                                                
+ git --work-tree=/root/containerbuild --git-dir=/root/containerbuild/.git diff --quiet --exit-code                                                                                                                  
+ tar -C /root/containerbuild -czf /cosa/coreos-assembler-git.tar.gz --exclude-vcs .                                                                                                                                 
+ chmod 0444 /cosa/coreos-assembler-git.tar.gz                                                            
+ local rev                                          
+ local branch                                                                                            
++ git --work-tree=/root/containerbuild --git-dir=/root/containerbuild/.git rev-parse HEAD                                                                                                                           
+ rev=11dbbf31e10ddba2ffa8415260146392e84806a6

xcross-ibm commented 4 years ago

@ashcrow no changes in the source:

chris:coreos-assembler$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Nevertheless, I killed the directory, cloned the repo anew, and built again while reading email. Same result:

+ git --work-tree=/root/containerbuild --git-dir=/root/containerbuild/.git diff --quiet --exit-code
+ tar -C /root/containerbuild -czf /cosa/coreos-assembler-git.tar.gz --exclude-vcs .
tar: .: file changed as we read it
Error: error building at STEP "RUN ./build.sh write_archive_info": error while running runtime: exit status 1

travier commented 4 years ago

Is there anything specific that requires that you rebuild the cosa container? If not, maybe you can start with the builds from quay.io until this is resolved. Otherwise I would suggest trying to build the container inside a Fedora 32 VM.

xcross-ibm commented 4 years ago

@travier My goal is to build a system comprised of kernel+systemd+bash as a jumping-off point for my custom OS. To that end I'm starting by inheriting from bootable-rpm-ostree.yaml or, failing that, fedora-coreos-base.yaml. Neither one builds for me so I am reading source to figure out what rpm-ostree wants. I'd like to put in some printfs which means I need to build the cosa container.

Being a perfectly naive user I started by following your suggestion using Fedora Workstation. It failed in STEP 6: RUN ./build.sh install_rpms after getting lots of "dracut" errors starting with:

dracut: No '/dev/log' or 'logger' included for syslog logging
cp: setting attributes for '/var/tmp/dracut.4nat1F/initramfs/bin/bash': Operation not supported
dracut-install: ERROR: installing '/bin/sh'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.4nat1F/initramfs /bin/sh
findmnt: can't read (null): No such file or directory
...

Then I thought maybe you meant Fedora CoreOS and, voila, that worked to build the cosa container. Yay! I'll get on with reading the source to figure out what rpm-ostree needs to build bootable-rpm-ostree.yaml. For the record, cmd-build issues

sudo -E rpm-ostree compose tree --repo=/srv/tmp/repo --cachedir=/srv/cache --touch-if-changed /srv/tmp/build/tmp/treecompose.changed --unified-core /srv/tmp/build/tmp/override/coreos-assembler-override-manifest.yaml --cache-only --add-metadata-from-json /srv/tmp/build/tmp/commit-metadata-input.json --write-composejson-to /srv/tmp/build/tmp/compose.json --ex-write-lockfile-to /srv/tmp/repo/tmp/manifest-lock.generated.x86_64.json.tmp --ex-lockfile=/srv/src/config/manifest-lock.x86_64.json --ex-lockfile=/srv/src/config/manifest-lock.overrides.x86_64.yaml --no-parent

and after installing packages rpm-ostree errors out with

error: openat(etc/passwd): No such file or directory

ashcrow commented 4 years ago

@xcross-ibm and I just did a sync over the cosa process and the /etc/passwd issue he hit.

jlebon commented 4 years ago

Do you have passwd and group from https://github.com/coreos/fedora-coreos-config/tree/testing-devel/manifests in your source config?
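A quick way to answer that question from the shell is sketched below. It assumes the two files are expected next to the manifests, as in the linked fedora-coreos-config directory. The function name `missing_passwd_group` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the passwd/group files are absent
# from a manifests directory. Returns non-zero if either one is missing.
# usage: missing_passwd_group MANIFESTS_DIR
missing_passwd_group() {
    local dir="$1"
    local missing=0
    local f
    for f in passwd group; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing: ${dir}/${f}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `missing_passwd_group /srv/src/config/manifests` inside the cosa container, assuming the config checkout lives under /srv/src/config as the lockfile paths in the rpm-ostree command line above suggest.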

xcross-ibm commented 3 years ago

Hi @ashcrow, I've got a manifest that inherits from bootable-rpm-ostree.yaml that builds. Now for the fun part. The initrd is failing with the following output:

[    4.633860] systemd[1]: ostree-prepare-root.service: Main process exited, code=exited, status=1/FAILURE
[    4.634079] systemd[1]: ostree-prepare-root.service: Failed with result 'exit-code'.
[    4.635545] systemd[1]: Failed to start OSTree Prepare OS/.
[    4.665306] systemd-sysctl[202]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
[    4.665546] systemd-sysctl[202]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
[    4.665670] systemd-sysctl[202]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
[    4.666169] ostree-prepare-root[200]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/fedora-coreos/72370e69833fc9858ce10cd394183c9b918ef0785c45e36df311b9b5054eee2a/0y
[FAILED] Failed to start OSTree Prepare OS/.
[    3.649938] ostree-prepare-root[    5.061821] systemd[1]: ostree-prepare-root.service: Triggering OnFailure= dependencies.
See 'systemctl status ostree-prepare-root.service' for details.

Aside from instrumenting ostree-prepare-root.service and investigating the obvious double-slash in /sysroot//ostree/... can you say a few things about debugging the initrd in coreos-assembler? In our own system you build an image with the init script modified to shell out in interesting places so you can poke around interactively. (Otherwise it's just a case of real men debug with printf. DPOS is primitive...)

cgwalters commented 3 years ago

See https://github.com/coreos/fedora-coreos-tracker/blob/master/internals/README-initramfs.md#debugging-the-initramfs

In this case, are you doing anything else besides cosa build? I'd look at the disk image offline with e.g. guestfish --ro -a /path/to/foo.qcow2 and verify that the target root exists.

(Also tangentially I'd override the rojig/name: to be something other than fedora-coreos)

xcross-ibm commented 3 years ago

@cgwalters Thanks for the RTFM link which was immediately useful. On first reading, it looks like ignition is fundamental so I switched to inheriting from ignition-and-ostree.yaml instead of bootable-rpm-ostree.yaml. Is that a correct conclusion about ignition?

In this case, are you doing anything else besides cosa build?

No. And unfortunately switching to using ignition-and-ostree.yaml my first error is the same as above. I'll follow your suggestions to dig in.

Also tangentially I'd override the rojig/name:

I did that in my first try. It builds ok but cosa run fails immediately with

COREOS_ASSEMBLER_CONFIG_GIT=/var/home/chris/git/fedora-coreos-config/
+ podman run --rm -ti --security-opt label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 -v /var/home/chris/fcos:/srv/ --device /dev/kvm --device /dev/fuse --tmpfs /tmn
Error: Unknown distribution: fedora-minos
2020-10-13T19:30:01Z cli: Unknown distribution: fedora-minos

I figured "distribution" is a pretty wide regex to look for this in the cosa code so I quickly changed it back to rojig/name: fedora-coreos for now and will worry about the name later.

cgwalters commented 3 years ago

Oh right, yeah not using Ignition definitely is something that's going to be under-tested in coreos-assembler and might fail. In the upstream rpm-ostree project we will continue to support non-Ignition operating systems but the focus of CoreOS is the combination of Ignition and rpm-ostree plus other related projects like zincati etc.

It might be simpler in the end to just use coreos-assembler as a reference for how to use rpm-ostree and make disk images than anything else. Perhaps fork it and cut out what you aren't using?

Are you using just cosa build ostree or are you also generating qemu disk images? How exactly are you running the system? Are you using qemu?

xcross-ibm commented 3 years ago

@cgwalters You may be right but I don't yet know enough to know which parts to use and which to cut. For example, before reading README-initramfs I thought of ignition as just post-boot config like creating users and passwords. (And just now realizing, "oh, is disk partitioning post-boot too?!") We have identified create_disk.sh as an example of something we may need to write from scratch.

So far I haven't strayed outside the bounds of the README, so cosa build and cosa run and then using virsh to install and run the qcow image. I haven't used qemu outside cosa.

xcross-ibm commented 3 years ago

@cgwalters @ashcrow I'm back on to this after a hiatus for moving.

I'm using fedora-coreos-config stable, my manifest.yaml inherits from manifests/ignition-and-ostree.yaml, and cosa build completes without errors (or at least returns 0 to the shell.)

When I cosa run the image, the boot fails with the following output:

    [    3.623859] ostree-prepare-root[218]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/fedora-coreos/fa14d31c2c10e6377b288987e48716b2a8388fc83d669ff014cc2f99ac6b1626/0y
    [  OK  ] Finished Create list of st… nodes for the current kernel.
    [    2.735578] [    3.927868] audit: type=1130 audit(1606169285.729:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=kmod-static-nodes comm="systemd" exe="/usr/lib/systemd/systemd" ho'
    ostree-prepare-root[218]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.[    3.928740] systemd[1]: Started Journal Service.
    1/fedora-coreos/fa14d31c2c10e6377b288987e48716b2a8388fc83d669ff014cc2f99ac6b1626/0': No such file or directory

Looking for that directory in the image with guestfish it seems to be there.

   ><fs> launch
   ><fs> list-filesystems
   /dev/sda1: ext4
   /dev/sda2: vfat
   /dev/sda3: unknown
   /dev/sda4: xfs
   ><fs> mount /dev/sda4 /
   ><fs> ls /ostree/boot.1/fedora-coreos/fa14d31c2c10e6377b288987e48716b2a8388fc83d669ff014cc2f99ac6b1626/0/
   bin
   boot
   dev
   etc
   home ...

I could use some advice for where to go next to debug.

ashcrow commented 3 years ago

The error you see is https://github.com/ostreedev/ostree/blob/master/src/switchroot/ostree-prepare-root.c#L139. Are you able to inspect the system during boot up with rd.break? It may help figure out why the pivot isn't succeeding: https://fedoraproject.org/wiki/How_to_debug_Dracut_problems#Summary_of_dracut_kernel_command_line_options

What does your boot cmdline look like?

jlebon commented 3 years ago

That error is likely just a symptom of an earlier failure. Look for errors from e.g. Ignition or ignition-ostree-mount-firstboot-sysroot.service.

xcross-ibm commented 3 years ago

The current error message is:

   [    5.147753] ostree-prepare-root[220]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/fedora-coreos/bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/0y

@ashcrow I haven't yet sussed out the inputs to grub to answer from the manifest but poking around the image I find this menu entry that correlates:

    ><fs> mount /dev/sda3 /
    ><fs> cat /loader/entries/ostree-1-fedora-coreos.conf
    title Fedora CoreOS 33.20201130.2 (ostree:0)
    version 1
    options mitigations=auto,nosmt systemd.unified_cgroup_hierarchy=0 console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu $ignition_firstboot ostree=/ostree/boot.1/fedora-coreos/bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/0
    linux /ostree/fedora-coreos-bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/vmlinuz-5.9.9-200.fc33.x86_64
    initrd /ostree/fedora-coreos-bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/initramfs-5.9.9-200.fc33.x86_64.img

The location of initrd and vmlinuz in the image matches the above:

    ><fs> ls /ostree/fedora-coreos-bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/
    .vmlinuz-5.9.9-200.fc33.x86_64.hmac
    initramfs-5.9.9-200.fc33.x86_64.img
    vmlinuz-5.9.9-200.fc33.x86_64
    >

The directory in the image matches the ostree param in the grub config:

    ><fs> mount /dev/sda4 /
    ><fs> ls /ostree/boot.1/fedora-coreos/bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/0
    bin
    boot
    dev...
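The manual comparison above (the ostree= value in the loader entry versus the directory on the root partition) can be scripted once the entry file has been copied out of the image. This is only a sketch: it assumes the entry has an `options` line like the one quoted above, and the function name `ostree_karg` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the ostree= kernel argument found
# on the "options" line of a boot loader entry file. Exits non-zero if the
# entry has no ostree= argument.
# usage: ostree_karg ENTRY_FILE
ostree_karg() {
    awk '
        $1 == "options" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ostree=/) {
                    # Drop the 7-character "ostree=" prefix.
                    print substr($i, 8)
                    found = 1
                }
            }
        }
        END { exit !found }
    ' "$1"
}
```

The printed path is relative to the root of the root filesystem, so with that partition mounted at some directory, `ls "<mountpoint>$(ostree_karg entry.conf)"` should list the deployment if it exists.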

@ashcrow I haven't learned how to use rd.break yet. The dracut doc starts with updating grub.cfg which I don't find in fedora-coreos-config except for ./live/EFI/fedora/grub.cfg. How does one modify the linux cmd as input to cosa build? RTFM refs welcome.

@jlebon AFAICT the error at the top is the first problem in the boot.log.

ashcrow commented 3 years ago

ashcrow I haven't learned how to use rd.break yet. The dracut doc starts with updating grub.cfg which I don't find in fedora-coreos-config except for ./live/EFI/fedora/grub.cfg. How does one modify the linux cmd as input to cosa build? RTFM refs welcome.

During a cosa run --devshell -c --qemu-image $IMG you can interrupt grub with e to edit the config for the boot. You can add in rd.break=$PHASE (Example: rd.break=pre-mount) to the line starting with linux and boot with ctrl+x.

Here are the steps:

  1. Run cosa with the serial console and devshell: cosa run --devshell -c --qemu-image $IMG
  2. When you see the grub menu hit e
  3. Using just the arrow keys, go to the end of the line that starts with linux and add rd.break=$PHASE_FROM_THE_DOCS_YOU_WANT
  4. Hit ctrl+x to boot
  5. Wait until either you are dropped into a shell or the system seems like it's frozen. If the latter, hit enter a few times as sometimes the shell is available but it's not visible yet.

Depending on the phase you pick you'll have the machine in different states and can inspect what things look like during different stages in the boot.

cgwalters commented 3 years ago

[ 5.147753] ostree-prepare-root[220]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/fedora-coreos/bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/0y

One cause of that error is not having /sysroot mounted correctly in the initramfs. With CoreOS (ostree+Ignition) this is handled by https://github.com/coreos/fedora-coreos-config/tree/testing-devel/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree

Since you're not using Ignition, I think probably the simplest is to add root=LABEL=root to your kernel arguments or so if you haven't.

xcross-ibm commented 3 years ago

@ashcrow Teach a man to fish, thanks!

@cgwalters I started using ignition following your remarks in https://github.com/coreos/coreos-assembler/issues/1680#issuecomment-707999361. My manifest includes ignition-and-ostree.yaml. We have some manufacturing and first boot requirements that make ignition a good choice anyway.

I used rd.shell following @ashcrow's instructions. I can see the file systems here:

    :/root# lsblk
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sr0     11:0    1 1024M  0 rom  
    vda    252:0    0    8G  0 disk 
    |-vda1 252:1    0    1M  0 part 
    |-vda2 252:2    0  127M  0 part 
    |-vda3 252:3    0  384M  0 part 
    `-vda4 252:4    0  1.6G  0 part /mnt/vda4

and I can mount /dev/vda4 and see the directory we want at /sysroot. However, it's not mounted in the initramfs and /sysroot is empty, which would explain the error "Couldn't find specified OSTree root '/sysroot//ostree..."

Since I'm inheriting ignition-and-ostree.yaml without change, I would expect it to just work. Perhaps I'm missing something configured between manifest.yaml and ignition-and-ostree.yaml in the FCOS hierarchy that ignition depends upon. I'll start looking for how and when the 40ignition-ostree scripts are run. Is there a phase in the dracut breakpoints that correlates to that? rd.break={cmdline|pre-udev|pre-trigger|initqueue|pre-mount|mount|pre-pivot|cleanup}

xcross-ibm commented 3 years ago

@ashcrow @cgwalters The error apparently occurs before any of the dracut break points. With rd.break=cmdline my image falls into the emergency shell the same way it does without a dracut break point (using only cosa run -c.) I attempted to manually mount /sysroot and continue but I see no log activity when I press Ctrl-d to continue. It just falls back into the emergency shell.

I used rd.break with a good FCOS image I built and it's working as expected.

Here's the boot log from the initramfs banner to the failure.

Welcome to Fedora CoreOS 33.20201130.2 dracut-050-64.git20200529.fc33 (Initramfs)!

[    3.729455] systemd[1]: No hostname configured.
[    3.740218] systemd[1]: Set hostname to <fedora>.
[    3.748773] systemd[1]: Initializing machine ID from random generator.
[    3.843207] systemd[1]: Queued start job for default target Initrd Default Target.
[    3.857441] systemd[1]: Started Forward Password Requests to Clevis Directory Watch.
[  OK  ] Started Forward Password R…sts to Clevis Directory Watch.
[    3.878706] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[    3.896775] systemd[1]: Reached target Local Encrypted Volumes.
[  OK  ] Reached target Local Encrypted Volumes.
[    3.909135] systemd[1]: Reached target Local File Systems.
[  OK  ] Reached target Local File Systems.
[    3.921964] systemd[1]: Reached target Paths.
[  OK  ] Reached target Paths.
[    3.931764] systemd[1]: Reached target Slices.
[  OK  ] Reached target Slices.
[    3.942416] systemd[1]: Reached target Swap.
[  OK  ] Reached target Swap.
[    3.952812] systemd[1]: Reached target Timers.
[  OK  ] Reached target Timers.
[    3.964726] systemd[1]: Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Audit Socket.
[    3.982879] systemd[1]: Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket (/dev/log).
[    3.998723] systemd[1]: Listening on Journal Socket.
[  OK  ] Listening on Journal Socket.
[    4.014425] systemd[1]: Listening on udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[    4.031660] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[    4.048168] systemd[1]: Reached target Sockets.
[  OK  ] Reached target Sockets.
[    4.060693] systemd[1]: Condition check resulted in Check that initrd matches kernel being skipped.
[    4.080409] systemd[1]: Finished CoreOS Tear down initramfs.
[  OK  ] Finished CoreOS Tear down initramfs.
[    4.094381] systemd[1]: Condition check resulted in Remount /sysroot read-write for Ignition being skipped.
[    4.112788] systemd[1]: Starting Create list of static device nodes for the current kernel...
         Starting Create list of st…odes for the current kernel...
[    4.131901] systemd[1]: Started Memstrack Anylazing Service.
[  OK  ] Started Memstrack Anylazing Service.
[    4.146335] systemd[1]: Starting OSTree Prepare OS/...
         Starting OSTree Prepare OS/...
[    4.164242] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[    4.190798] systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
[    4.210575] systemd[1]: Starting Apply Kernel Variables...
         Starting Apply Kernel Variables...
[    4.230152] systemd[1]: Starting Setup Virtual Console...
         Starting Setup Virtual Console...
[    4.245752] systemd[1]: Finished Create list of static device nodes for the current kernel.
[  OK  ] Finished Create list of st… nodes for the current kernel.
[    4.270612] systemd[1]: memstrack.service: Succeeded.
[    4.271305] systemd[1]: ostree-prepare-root.service: Main process exited, code=exited, status=1/FAILURE
[    4.271413] systemd[1]: ostree-prepare-root.service: Failed with result 'exit-code'.
[    4.271700] systemd[1]: Failed to start OSTree Prepare OS/.
[    4.290700] systemd-sysctl[229]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
[    4.290999] systemd-sysctl[229]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
[    4.291150] systemd-sysctl[229]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
[    4.291237] systemd-sysctl[229]: Not setting net/ipv4/conf/default/accept_source_route (explicit setting exists).
[    4.291373] ostree-prepare-root[219]: ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.1/fedora-coreos/bb843de33ccc2b80dd9c3afda5badbefcb6b57cae88a8c81c5c174ca08176b5a/0y

xcross-ibm commented 3 years ago

Comparing the above to a good FCOS image, there is a bunch of dracut and ignition stuff executing before ostree-prepare-root.service that is missing entirely from my failure log.
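That kind of side-by-side comparison can be done mechanically if both console logs are saved to files. The sketch below assumes log lines of the form `systemd[1]: Starting ...` as quoted in this thread. The function names `unit_events` and `missing_units` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing two saved boot logs.

# Print the sorted, de-duplicated systemd[1] start-type events in a log,
# with the timestamp and "systemd[1]: " prefix stripped.
unit_events() {
    sed -n -E 's/^.*systemd\[1\]: ((Starting|Started|Finished|Reached target|Listening on) .*)$/\1/p' "$1" \
        | LC_ALL=C sort -u
}

# Print events that appear in the good log but not in the bad one.
# usage: missing_units GOOD_LOG BAD_LOG
missing_units() {
    local good_list bad_list
    good_list=$(mktemp)
    bad_list=$(mktemp)
    unit_events "$1" > "$good_list"
    unit_events "$2" > "$bad_list"
    # comm -23 keeps only lines unique to the first (good) list.
    LC_ALL=C comm -23 "$good_list" "$bad_list"
    rm -f "$good_list" "$bad_list"
}
```

Running `missing_units good-boot.log bad-boot.log` (file names are placeholders) would list the dracut and Ignition units that ran on the working image but never started on the failing one.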

Which may be explained by this:

:/root# systemctl
  UNIT                                                                                LOAD   ACTIVE SUB     DESCRIPTION                                                       
  sys-devices-pci0000:00-0000:00:01.1-ata2-host1-target1:0:0-1:0:0:0-block-sr0.device loaded active plugged QEMU_DVD-ROM                                                      
  -.mount                                                                             loaded active mounted Root Mount                                                        
  sys-kernel-config.mount                                                             loaded active mounted Kernel Configuration File System                                  
  sys-kernel-tracing.mount                                                            loaded active mounted /sys/kernel/tracing                                               
  init.scope                                                                          loaded active running System and Service Manager                                        
  coreos-teardown-initramfs.service                                                   loaded active exited  CoreOS Tear down initramfs                                        
  emergency.service                                                                   loaded active running Emergency Shell                                                   
● ignition-files.service                                                              loaded failed failed  Ignition (files)                                                  
  ignition-virtio-dump-journal.service                                                loaded active exited  Dump journal to virtio port                                       
● ostree-prepare-root.service                                                         loaded failed failed  OSTree Prepare OS/                                                
  systemd-journald.service                                                            loaded active running Journal Service                                                   

and this:

:/root# systemctl status ignition-files.service
● ignition-files.service - Ignition (files)
     Loaded: loaded (/usr/lib/systemd/system/ignition-files.service; static)
     Active: failed (Result: exit-code) since Fri 2020-12-04 23:43:38 UTC; 6min ago
       Docs: https://github.com/coreos/ignition
    Process: 410 ExecStart=/usr/bin/ignition --root=/sysroot --platform=${PLATFORM_ID} --stage=files --log-to-stdout (code=exited, status=1/FAILURE)
   Main PID: 410 (code=exited, status=1/FAILURE)

Dec 04 23:43:38 fedora ignition[410]:         "mask": false,
Dec 04 23:43:38 fedora ignition[410]:         "name": "var-mnt-workdir\\x2dtmp.mount"
Dec 04 23:43:38 fedora ignition[410]:       }
Dec 04 23:43:38 fedora ignition[410]:     ]
Dec 04 23:43:38 fedora ignition[410]:   }
Dec 04 23:43:38 fedora ignition[410]: }CRITICAL : Ignition failed: failed to create users/groups: failed to configure users: failed to create user "core": exit status 10: Cmd: "useradd" "--root" "/sysroo"
Dec 04 23:43:38 fedora systemd[1]: ignition-files.service: Main process exited, code=exited, status=1/FAILURE
Dec 04 23:43:38 fedora systemd[1]: ignition-files.service: Failed with result 'exit-code'.
Dec 04 23:43:38 fedora systemd[1]: Failed to start Ignition (files).

xcross-ibm commented 3 years ago

@cgwalters, looking more closely at the configuration you cite, I find it in the /dev/vda4 file system, which has not been mounted. How do those jobs get run before /sysroot is mounted? I do notice ignition-ostree-mount-sysroot.sh in that directory, but that only makes the question more pressing.

xcross-ibm commented 3 years ago

@ashcrow The failing systemd unit is ostree-prepare-root.service, which declares both After= and Requires= on sysroot.mount. However, sysroot.mount doesn't appear at all in the systemctl output in the emergency shell.
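To make that check repeatable, here is a minimal Python sketch that extracts After= and Requires= from a unit file's text and reports which required units are absent from a list of loaded units. The unit text is a hypothetical stand-in (only the After= and Requires= values come from this thread), the function names are made up for illustration, and the parser ignores systemd details such as drop-ins and empty assignments that reset a list.

```python
# Hypothetical stand-in for the unit file; only After=/Requires= are from the thread.
SAMPLE_UNIT = """\
[Unit]
Description=OSTree Prepare OS/
After=sysroot.mount
Requires=sysroot.mount

[Service]
Type=oneshot
"""

# Unit names copied from the emergency-shell `systemctl` listing above.
LOADED_UNITS = {
    "-.mount",
    "sys-kernel-config.mount",
    "sys-kernel-tracing.mount",
    "init.scope",
    "emergency.service",
    "ignition-files.service",
    "ostree-prepare-root.service",
    "systemd-journald.service",
}


def unit_dependencies(unit_text, keys=("After", "Requires")):
    """Collect the values of the given keys from the [Unit] section."""
    deps = {key: [] for key in keys}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() in deps:
            # A single assignment may list several space-separated units.
            deps[key.strip()].extend(value.split())
    return deps


def missing_requirements(unit_text, loaded_units):
    """Return required units that are not in the loaded set, sorted."""
    required = set(unit_dependencies(unit_text)["Requires"])
    return sorted(required - set(loaded_units))
```

With the sample data, `missing_requirements(SAMPLE_UNIT, LOADED_UNITS)` reports `sysroot.mount`, which matches the observation that the mount unit never showed up.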

Using rd.break with a good FCOS image, I found that sysroot.mount becomes active between the pre-trigger and initqueue stages and looks like this:

sh-5.0# systemctl | grep sysroot.mount
  sysroot-sysroot.mount                                                               loaded active     mounted       /sysroot/sysroot                                                  
  sysroot.mount                                                                       loaded active     mounted       sysroot.mount                                                     

I looked for the source of sysroot.mount and found only a reference to it being generated in live-generator. However, since that lives in the sysroot file system, I'm no closer to figuring out how /sysroot gets mounted.

I'll cry uncle now and ask if there's an SME who can tell me precisely how /sysroot is mounted before ostree-prepare-root.service runs.

ashcrow commented 3 years ago

@jlebon if you have some time would you mind taking a look and providing guidance/next steps to @xcross-ibm?

jlebon commented 3 years ago

On first boot, the sysroot gets mounted by https://github.com/coreos/fedora-coreos-config/blob/testing-devel/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-mount-firstboot-sysroot.service. On subsequent boots, it gets mounted by systemd itself via the root= karg, which systemd-fstab-generator picks up.
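The two paths can be told apart from the kernel command line. The Python sketch below is an illustration only: it assumes that a first boot is signalled by an `ignition.firstboot` karg (an assumption, not stated in this thread), the function names are hypothetical, and the splitting is naive (no support for quoted values).

```python
def parse_kargs(cmdline):
    """Split a kernel command line into a {key: value-or-None} dict."""
    kargs = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without "=" (e.g. "quiet") are stored with a None value.
        kargs[key] = value if sep else None
    return kargs


def sysroot_mount_source(cmdline):
    """Guess which mechanism is expected to mount /sysroot."""
    kargs = parse_kargs(cmdline)
    if "ignition.firstboot" in kargs:  # ASSUMPTION: first-boot marker karg
        return "ignition-ostree-mount-firstboot-sysroot.service"
    if kargs.get("root"):
        return "systemd-fstab-generator (root=%s)" % kargs["root"]
    return "unknown"
```

On a running system the input would come from `/proc/cmdline`. A result of `unknown` means neither path applies, which is consistent with /sysroot never being mounted.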

jlebon commented 3 years ago

systemd automatically generates mount units based on /proc/self/mountinfo even for mounts which are done independently via the mount command. This is why you don't see an explicit sysroot.mount unit file in the non-live case; there isn't one.
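The unit names follow from the mount points by systemd's path escaping, which is why a mount at /sysroot shows up as sysroot.mount. The Python sketch below approximates that escaping and applies it to mountinfo-style lines. It is a simplification rather than systemd's implementation: the sample mountinfo line is made up, the function names are hypothetical, and octal escapes such as `\040` in mountinfo are not decoded. On a real system, `systemd-escape --path --suffix=mount /sysroot` gives the authoritative answer.

```python
# Made-up mountinfo line; field 5 (index 4) is the mount point.
SAMPLE_MOUNTINFO = "25 1 252:4 / /sysroot rw,relatime shared:1 - xfs /dev/vda4 rw\n"


def mount_unit_name(mount_point):
    """Approximate the systemd mount unit name for a mount point."""
    parts = [p for p in mount_point.split("/") if p]  # drop empty components
    if not parts:
        return "-.mount"  # the root file system
    escaped = []
    for index, char in enumerate("/".join(parts)):
        if char == "/":
            escaped.append("-")
        elif char.isascii() and (
            char.isalnum() or char in ":_" or (char == "." and index > 0)
        ):
            escaped.append(char)
        else:
            # Everything else, including "-", becomes \xNN per byte.
            escaped.extend("\\x%02x" % byte for byte in char.encode("utf-8"))
    return "".join(escaped) + ".mount"


def mounted_units(mountinfo_text):
    """Map each line of mountinfo-formatted text to a mount unit name."""
    return [
        mount_unit_name(line.split()[4])
        for line in mountinfo_text.splitlines()
        if line.strip()
    ]
```

The same rule explains the names seen earlier in the thread: `/` maps to `-.mount`, and a path containing a literal dash, such as `/var/mnt/workdir-tmp`, maps to a name containing `\x2d`.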

wranders commented 2 years ago

Also tangentially I'd override the rojig/name:

I did that in my first try. It builds ok but cosa run fails immediately with

COREOS_ASSEMBLER_CONFIG_GIT=/var/home/chris/git/fedora-coreos-config/
+ podman run --rm -ti --security-opt label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 -v /var/home/chris/fcos:/srv/ --device /dev/kvm --device /dev/fuse --tmpfs /tmn
Error: Unknown distribution: fedora-minos
2020-10-13T19:30:01Z cli: Unknown distribution: fedora-minos

I figured "distribution" is a pretty wide regex to look for this in the cosa code so I quickly changed it back to rojig/name: fedora-coreos for now and will worry about the name later.

I ran across this issue while searching and wanted to give the solution in case anyone else encounters this.

https://github.com/coreos/coreos-assembler/blob/d633351f74741ffc4beb47ee9edc37393481eb69/mantle/sdk/distros.go#L36-L46

When .rojig.name is changed in a manifest, the function above is what raises the Unknown distribution error, so you'll have to specify fcos (or rhcos) with the -b, --distro flag when running the image:

cosa run -b fcos
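As a rough mental model of why the override works, the lookup can be pictured as a fixed table from build name to distro. The Python sketch below is not a translation of the linked Go code. The table contents are assumed from the names mentioned in this thread, and `target_distro` and its `override` parameter are hypothetical names standing in for the lookup and the -b flag.

```python
# ASSUMPTION: table contents inferred from names mentioned in this thread.
KNOWN_DISTROS = {
    "fedora-coreos": "fcos",
    "rhcos": "rhcos",
}


def target_distro(build_name, override=None):
    """Resolve the distro for a build, honouring an explicit override."""
    if override is not None:
        # An explicit choice (like `-b fcos`) skips the name lookup entirely.
        return override
    try:
        return KNOWN_DISTROS[build_name]
    except KeyError:
        raise ValueError("Unknown distribution: %s" % build_name) from None
```

In this model, a custom name such as `fedora-minos` fails the lookup unless the distro is supplied explicitly, which mirrors the error and the workaround described above.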