GenPi64 / Build.Dist

Build scripts for building GenPi64 images.

Error Handling Which Preserves Possible Problem File? #180

Open jlpoolen opened 1 year ago

jlpoolen commented 1 year ago

Enhancement request: during a failed build (not running under Jenkins, but in a console as root), a clean-up is performed which removes a temporary file of interest. For example, from Issue #179:


Running eselect locale set en_US.utf8
+ trap finish EXIT
+ cat
+ chmod +x /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-5481
+ /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-5481
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/pychroot/scripts/pychroot.py", line 130, in main
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/em-5481'
+ finish
+ ret=1
+ rm -f /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-5481
+ exit 1
FATAL: JOBFAILED  locale
FATAL: JOBFAILED  gentoo-base
run complete.

The file of interest is em-5481, but a higher level handler removes the file.

jlpoole@jenk ~ $ ls -la /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-5481
ls: cannot access '/home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-5481': No such file or directory
jlpoole@jenk ~ $

The "+ ret=1" suggests that the handler which then executes the "rm -f..." would know the script is in error mode and could likewise do something that makes the file, em-5481, available. I'm thinking maybe cp the file to /tmp and print out a status report of doing so. I'll take a look at the code and see if what I'm thinking of is possible. I thought filing an enhancement request might highlight this type of approach. If it can be done, I'll do it and see if I can preserve the temporary file for forensic analysis on a future failure.
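A minimal sketch of that idea, under assumptions: the handler receives the exit status and only copies the file when the status is non-zero. The names `preserve_on_failure` and `PRESERVE_DIR` are hypothetical and not part of Build.Dist; the "rawcommand_" prefix is just an illustrative naming choice.

```shell
#!/usr/bin/env bash
# Sketch only: PRESERVE_DIR and preserve_on_failure are hypothetical names,
# not part of Build.Dist.
PRESERVE_DIR="${PRESERVE_DIR:-/tmp}"

# preserve_on_failure STATUS FILE
# Copies FILE into PRESERVE_DIR when STATUS is non-zero, then removes FILE
# and returns STATUS unchanged.
preserve_on_failure() {
    local status=$1 file=$2
    if [[ $status -ne 0 && -e $file ]]; then
        local saved
        saved="${PRESERVE_DIR}/rawcommand_$(basename "$file")"
        if cp -- "$file" "$saved"; then
            echo "job failed (exit $status); copied $file to $saved" >&2
        fi
    fi
    rm -f -- "$file"
    return "$status"
}

# How a parser script might call it from its EXIT trap (left commented out
# so this sketch has no side effects when sourced):
#
# finish() {
#     ret=$?
#     preserve_on_failure "$ret" "${PROJECT_DIR}/chroot/em-$$"
#     exit "$ret"
# }
# trap finish EXIT
```

On success the temporary file is still removed as before; on failure a copy survives outside the chroot, and the message goes to stderr so it shows up next to the traceback.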

jonesmz commented 1 year ago

Make a pull request that modifies all instances of the keyword "trap" in the folder parsers/* to preserve the file on failure.

jlpoolen commented 1 year ago

I wanted to run the command manually to see if I catch any STDERR messages that might be missed when in Jenkins. So, continuing in manual mode I accomplished the following. What is interesting is the:

        raise KeyError(key) from None
    KeyError: 'CCACHE_DIR'

Here's my test session:

Wednesday  Jan 04, 2023  9:35:44 PM

Debugging failure.
Modified suspected file: rawcommand:

        diff --git a/parsers/rawcommand/rawcommand b/parsers/rawcommand/rawcommand
        index d48927e..f821d72 100755
        --- a/parsers/rawcommand/rawcommand
        +++ b/parsers/rawcommand/rawcommand
        @@ -5,8 +5,11 @@ echo "Running $@"

         function finish
         {
        -       ret=$?
        +    ret=$?
        +    cp "${PROJECT_DIR}/chroot/em-$$" /tmp/rawcommand_em-$$
        +    echo copied em-$$ to /tmp/rawcommand_em-$$
                rm -f "${PROJECT_DIR}/chroot/em-$$"
        +
                exit $ret
         }
         trap finish EXIT

As root: 
    cd  /home/jlpoole/local/Build.Dist 
    ./build.sh

        ...
        checking  set_root_password deps ['news']
        running jobs [(<Popen: returncode: None args: ['/home/jlpoole/local/Build.Dist/parsers/rawc...>, 'locale')]
        + source /home/jlpoole/local/Build.Dist/scripts/functions.sh
        + echo 'Running eselect locale set en_US.utf8'
        Running eselect locale set en_US.utf8
        + trap finish EXIT
        + cat
        + chmod +x /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
        + /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-7110
        Traceback (most recent call last):
          File "/usr/lib/python3.10/site-packages/pychroot/scripts/pychroot.py", line 130, in main
          File "/usr/lib/python3.10/subprocess.py", line 503, in run
            with Popen(*popenargs, **kwargs) as process:
          File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
            self._execute_child(args, executable, preexec_fn, close_fds,
          File "/usr/lib/python3.10/subprocess.py", line 1847, in _execute_child
            raise child_exception_type(errno_num, err_msg, err_filename)
        OSError: [Errno 8] Exec format error: '/em-7110'
        + finish
        + ret=1
        + cp /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110 /tmp/rawcommand_em-7110
        + echo copied em-7110 to /tmp/rawcommand_em-7110
        copied em-7110 to /tmp/rawcommand_em-7110
        + rm -f /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
        + exit 1
        FATAL: JOBFAILED  locale
        FATAL: JOBFAILED  gentoo-base
        run complete.

# copied preserved file back in and reran chroot.py command:

        jenk /home/jlpoole/local/Build.Dist # cat /tmp/rawcommand_em-7110
                #!/usr/bin/env bash
                set -evx
                source /etc/profile
                eselect locale set en_US.utf8
        jenk /home/jlpoole/local/Build.Dist # cp /tmp/rawcommand_em-7110 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
        jenk /home/jlpoole/local/Build.Dist # /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-7110
        Traceback (most recent call last):
          File "/home/jlpoole/local/Build.Dist/scripts/chroot.py", line 7, in <module>
            if not os.path.exists(os.environ['CCACHE_DIR']):
          File "/usr/lib/python-exec/python3.10/../../../lib/python3.10/os.py", line 680, in __getitem__
            raise KeyError(key) from None
        KeyError: 'CCACHE_DIR'
        jenk /home/jlpoole/local/Build.Dist #  ls -la /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
        -rwxr-xr-x 1 root root 111 Jan  4 21:35 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
        jenk /home/jlpoole/local/Build.Dist # rm /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
        jenk /home/jlpoole/local/Build.Dist #
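
Note that the manual rerun above stops at the KeyError before it ever reaches the chroot, so it does not reproduce the Exec format error. Presumably build.sh exports CCACHE_DIR (and possibly other variables) before the parsers run; that is an assumption, as the thread does not show where it is set. A small sketch of a pre-flight check, where `require_env` is a hypothetical helper and not part of Build.Dist:

```shell
#!/usr/bin/env bash
# require_env is a hypothetical helper, not part of Build.Dist.
# It returns non-zero and names every listed variable that is unset or empty.
require_env() {
    local name missing=0
    for name in "$@"; do
        if [[ -z ${!name:-} ]]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use before a manual rerun (commented out; the path is the one
# from the session above):
# require_env CCACHE_DIR && ./scripts/chroot.py /em-7110
```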

jonesmz commented 1 year ago

The exec format error means that something is wrong with how you have qemu set up.

Try rebooting.

jlpoolen commented 1 year ago

Three items:

1) I compared the current local manual build's chroot tree with the successful one I built last January to see if there were any files or directories in /var/tmp. Both were empty.

2) When I ran the project in Jenkins, there was a failure for a missing directory. I found all the "mkdir" commands, found a couple missing the "-p", patched my copy on Windows, and "committed" but did not push, so my commits never made it to the Jenkins instance. Nonetheless, when I reran Jenkins, the missing directory error did not occur, so something in the Jenkins build gets cached, and the error manifests itself only on the first run. Given that most of the mkdir commands have "-p" and a couple did not, I bring this to your attention as a possible subtle bug that may only manifest itself on the first run. My inexperience with git's process flow (I'm used to Subversion) made me think that if I "committed" on my Windows instance, my instance on GitHub would automatically get updated; I did not know about the requisite "push" step. Now I do.

3) my qemu install:


jenk /home/jlpoole/local/Build.Dist # date; eix -I qemu
Wed Jan  4 10:10:37 PM PST 2023
[I] app-emulation/qemu
     Available versions:  ~7.1.0^t 7.1.0-r2^t ~7.2.0^t **7.2.0-r1^t **9999*l^t {accessibility +aio alsa bpf bzip2 +caps capstone +curl debug (+)doc +fdt +filecaps fuse glusterfs +gnutls gtk infiniband io-uring iscsi jack jemalloc +jpeg lzo multipath ncurses nfs nls numa opengl +oss pam +pin-upstream-blobs plugins +png pulseaudio python rbd sasl sdl sdl-image +seccomp selinux +slirp smartcard snappy spice ssh static static-user systemtap test udev usb usbredir vde +vhost-net vhost-user-fs virgl virtfs +vnc vte xattr xen zstd PYTHON_TARGETS="python3_8 python3_9 python3_10 python3_11" QEMU_SOFTMMU_TARGETS="aarch64 alpha arm avr cris hppa i386 loongarch64 m68k microblaze microblazeel mips mips64 mips64el mipsel nios2 or1k ppc ppc64 riscv32 riscv64 rx s390x sh4 sh4eb sparc sparc64 tricore x86_64 xtensa xtensaeb" QEMU_USER_TARGETS="aarch64 aarch64_be alpha arm armeb cris hexagon hppa i386 loongarch64 m68k microblaze microblazeel mips mips64 mips64el mipsel mipsn32 mipsn32el nios2 or1k ppc ppc64 ppc64le riscv32 riscv64 s390x sh4 sh4eb sparc sparc64 sparc32plus x86_64 xtensa xtensaeb"}
     Installed versions:  7.1.0-r2^t(09:41:03 PM 01/01/2023)(aio bzip2 curl fdt filecaps gnutls jpeg ncurses nls oss pam pin-upstream-blobs png slirp static-user vhost-net vnc xattr -accessibility -alsa -bpf -capstone -debug -doc -fuse -glusterfs -gtk -infiniband -io-uring -iscsi -jack -jemalloc -lzo -multipath -nfs -numa -opengl -plugins -pulseaudio -python -rbd -sasl -sdl -sdl-image -selinux -smartcard -snappy -spice -ssh -static -systemtap -test -udev -usb -usbredir -vde -virgl -virtfs -vte -xen -zstd PYTHON_TARGETS="python3_10 -python3_8 -python3_9 -python3_11" QEMU_SOFTMMU_TARGETS="aarch64 x86_64 -alpha -arm -avr -cris -hppa -i386 -loongarch64 -m68k -microblaze -microblazeel -mips -mips64 -mips64el -mipsel -nios2 -or1k -ppc -ppc64 -riscv32 -riscv64 -rx -s390x -sh4 -sh4eb -sparc -sparc64 -tricore -xtensa -xtensaeb" QEMU_USER_TARGETS="aarch64 x86_64 -aarch64_be -alpha -arm -armeb -cris -hexagon -hppa -i386 -loongarch64 -m68k -microblaze -microblazeel -mips -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -nios2 -or1k -ppc -ppc64 -ppc64le -riscv32 -riscv64 -s390x -sh4 -sh4eb -sparc -sparc64 -sparc32plus -xtensa -xtensaeb")
     Homepage:            https://www.qemu.org https://www.linux-kvm.org
     Description:         QEMU + Kernel-based Virtual Machine userland tools

jenk /home/jlpoole/local/Build.Dist #

I'll proceed with a reboot now.
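
On the mkdir point in item 2, a small self-contained demonstration of why a missing "-p" could fail only on a first run. It works in a throwaway temporary directory; the "cache/pkg" path is illustrative and not one of Build.Dist's actual directories.

```shell
#!/usr/bin/env bash
# Demonstration in a throwaway directory; the paths are illustrative only.
demo_mkdir() {
    local root
    root=$(mktemp -d) || return 1

    # Plain mkdir fails when the parent directory does not exist yet,
    # which is the situation on a fresh first run.
    if mkdir "$root/cache/pkg" 2>/dev/null; then
        echo "plain mkdir: ok"
    else
        echo "plain mkdir: failed (parent missing)"
    fi

    # mkdir -p creates missing parents and also succeeds when the
    # directory already exists, so it is safe on every run.
    mkdir -p "$root/cache/pkg" && echo "mkdir -p: ok"
    mkdir -p "$root/cache/pkg" && echo "mkdir -p again: ok"

    rm -rf -- "$root"
}

demo_mkdir
```

If some earlier step of a previous run left the parent directory behind, the plain mkdir would succeed on the rerun, which would match the behavior described above.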

jlpoolen commented 1 year ago

Rebooted and got the same result:

jenk /home/jlpoole/local/Build.Dist # date;./build.sh
Wed Jan  4 10:14:55 PM PST 2023
/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory
loading manifest
...
checking  set_root_password deps ['news']
running jobs [(<Popen: returncode: None args: ['/home/jlpoole/local/Build.Dist/parsers/rawc...>, 'locale')]
+ source /home/jlpoole/local/Build.Dist/scripts/functions.sh
+ echo 'Running eselect locale set en_US.utf8'
Running eselect locale set en_US.utf8
+ trap finish EXIT
+ cat
+ chmod +x /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
+ /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-2122
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/pychroot/scripts/pychroot.py", line 130, in main
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/em-2122'
+ finish
+ ret=1
+ cp /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122 /tmp/rawcommand_em-2122
+ echo copied em-2122 to /tmp/rawcommand_em-2122
copied em-2122 to /tmp/rawcommand_em-2122
+ rm -f /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
+ exit 1
FATAL: JOBFAILED  locale
FATAL: JOBFAILED  gentoo-base
run complete.
jenk /home/jlpoole/local/Build.Dist # git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   parsers/rawcommand/rawcommand

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        parsers/rawcommand/rawcommand~

no changes added to commit (use "git add" and/or "git commit -a")
jenk /home/jlpoole/local/Build.Dist # cp /tmp/rawcommand_em-2122 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot
jenk /home/jlpoole/local/Build.Dist # /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-2122
Traceback (most recent call last):
  File "/home/jlpoole/local/Build.Dist/scripts/chroot.py", line 7, in <module>
    if not os.path.exists(os.environ['CCACHE_DIR']):
  File "/usr/lib/python-exec/python3.10/../../../lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'CCACHE_DIR'
jenk /home/jlpoole/local/Build.Dist # ls /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot
bin   em-config       home   media  proc                run   tmp
boot  em-config.json  lib    mnt    rawcommand_em-2122  sbin  usr
dev   etc             lib64  opt    root                sys   var
jenk /home/jlpoole/local/Build.Dist # mv /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/rawcommand_em-2122 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
jenk /home/jlpoole/local/Build.Dist # /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-2122
Traceback (most recent call last):
  File "/home/jlpoole/local/Build.Dist/scripts/chroot.py", line 7, in <module>
    if not os.path.exists(os.environ['CCACHE_DIR']):
  File "/usr/lib/python-exec/python3.10/../../../lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'CCACHE_DIR'
jenk /home/jlpoole/local/Build.Dist # rm /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
jenk /home/jlpoole/local/Build.Dist #

samip5 commented 1 year ago

Please see "/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory" at the start of your output; your qemu setup is part of the issue.

Please try to rerun only that script manually.

jlpoolen commented 1 year ago

What https://github.com/GenPi64/Build.Dist/issues/180#issuecomment-1372113199 is referring to is at the outset of my run of build.sh, specifically line 3 below [I added line numbers]:


    1   jenk /home/jlpoole/local/Build.Dist # date;./build.sh
    2   Wed Jan  4 10:14:55 PM PST 2023
    3   /home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory
    4   loading manifest
    5   load complete
    6   Loading status
    7   status not loadable
    8   load complete
    ...

The "write error" message is generated by line 35 of binfmt.sh, which currently has [I added line numbers]:

    33  if [ "${cpu}" != "arm" ] ; then
    34      if [[ ! -e /proc/sys/fs/binfmt_misc/arm ]]; then
    35      echo ":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:F" > /proc/sys/fs/binfmt_misc/register
    36      fi
    37  fi

The variable $cpu is set earlier in binfmt.sh from a system call to "uname -m". So, I'm guessing that Qemu causes "uname -m" to produce something that matches "armv[4-9]*" thereby triggering the "if" clause at binfmt.sh@33. So the failure at line 35 is the echo command populating /proc/sys/fs/binfmt_misc/register:

echo ":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:F" > /proc/sys/fs/binfmt_misc/register
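
One reading of the quoted guard, sketched under assumptions: the test at line 33 is "!=", so the arm registration is attempted on any host whose CPU name is not "arm", and that includes a plain x86_64 host without qemu influencing "uname -m" at all. The function `normalize_cpu` below is a hypothetical stand-in; the real mapping in binfmt.sh is not shown in this thread beyond the "armv[4-9]*" pattern.

```shell
#!/usr/bin/env bash
# normalize_cpu is a hypothetical stand-in for how binfmt.sh derives $cpu
# from `uname -m`; only the armv[4-9]* pattern is taken from the thread.
normalize_cpu() {
    case "$1" in
        armv[4-9]*) echo "arm" ;;   # 32-bit ARM hosts
        *)          echo "$1" ;;    # everything else passes through unchanged
    esac
}

# Mirrors the quoted test at line 33: [ "${cpu}" != "arm" ]
wants_arm_registration() {
    [ "$(normalize_cpu "$1")" != "arm" ]
}

for machine in x86_64 aarch64 armv7l; do
    if wants_arm_registration "$machine"; then
        echo "$machine: would try to register qemu-arm"
    else
        echo "$machine: native arm, no registration attempted"
    fi
done
```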

The last line, #40, of env.sh, invokes binfmt.sh:

[40] ${SCRIPTS}/binfmt.sh"

env.sh, in turn, is called by build.sh at line 14:

[14] source ${BASEDIR}/env.sh

Note: immediately following line 14 in build.sh are the following:

23  if [[ ! -d "$PROJECT_DIR" ]]; then
24      if [[ ! -e "$PROJECT_DIR" ]]; then
25          if [ ! -z $BTRFS_SNAPSHOTS ]; then
26              btrfs subvolume create "$PROJECT_DIR"
27          fi
28      fi
29  fi

I recall from my previous successful build experience in January 2022, performed in a VM, that I had to manually load a module for btrfs since it was not built into my kernel; there was a requirement that btrfs be working.

The lynchpin here looks to be the binfmt.sh@35 failure, which is reported but does not halt the process. I will try experimenting with the "btrfs subvolume create" and determine what the current system is on my ...chroot. It seems that if the echo line fails, everything should halt. I'll update further as I learn more. Thank you, @samip5, for catching this.

jlpoolen commented 1 year ago

The if clause is looking for "arm" under /proc/sys/fs/binfmt_misc/


    jenk /home/jlpoole # ls /proc/sys/fs/binfmt_misc/
    aarch64  register  status
    jenk /home/jlpoole # 

Since "arm" does not exist, I tried duplicating the command:


        jenk /home/jlpoole # echo ":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:F" > /proc/sys/fs/binfmt_misc/register
        bash: echo: write error: No such file or directory
        jenk /home/jlpoole #

"register" is present, so the "No such file..." message may be referring to /usr/bin/qemu-arm? Note: the command above has "/usr/bin/qemu-arm:F", yet /usr/bin has no "qemu-arm":


        jenk /home/jlpoole # ls /usr/bin/qemu*
        /usr/bin/qemu-aarch64  /usr/bin/qemu-pr-helper
        /usr/bin/qemu-edid     /usr/bin/qemu-storage-daemon
        /usr/bin/qemu-img      /usr/bin/qemu-system-aarch64
        /usr/bin/qemu-io       /usr/bin/qemu-system-x86_64
        /usr/bin/qemu-nbd      /usr/bin/qemu-x86_64
        jenk /home/jlpoole #

Has there been a name change from qemu-arm to qemu-aarch64?
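
A likely explanation, hedged: the trailing "F" flag in the rule asks the kernel to open the interpreter at registration time rather than lazily, so a missing /usr/bin/qemu-arm would make the write itself fail with "No such file or directory". The eix output earlier shows QEMU_USER_TARGETS with "-arm", which is consistent with qemu-arm never having been built here. A sketch of a guard that checks for the interpreter first; `register_binfmt` is a hypothetical helper and not the code in scripts/binfmt.sh:

```shell
#!/usr/bin/env bash
# register_binfmt is a hypothetical helper, not the code in scripts/binfmt.sh.
# Usage: register_binfmt RULE INTERPRETER [REGISTER_FILE]
# Writes RULE to the binfmt_misc register file only if INTERPRETER is an
# executable file; otherwise it reports the problem and returns 1.
register_binfmt() {
    local rule=$1 interpreter=$2
    local register=${3:-/proc/sys/fs/binfmt_misc/register}

    if [[ ! -x $interpreter ]]; then
        echo "skipping binfmt registration: $interpreter not found or not executable" >&2
        return 1
    fi
    # printf '%s\n' passes the \x.. sequences through literally;
    # the kernel decodes them when it parses the rule.
    printf '%s\n' "$rule" > "$register"
}
```

Called with the rule from line 35 and /usr/bin/qemu-arm, this would print a clear message on this machine instead of the bare write error. Whether the caller should then abort or continue is a policy choice for the build script.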

jlpoolen commented 1 year ago

https://codepyre.com/2019/12/arming-yourself/ under "Configuration" notes a difference:

 # Create a configuration for arm32v7

...

 # Create a configuration for arm64v8

I'm wondering if the reference to qemu-arm is something that needs to be replaced with qemu-aarch64?

I'm in an area (qemu) where I know practically nothing, other than that arm has been developing. The fact that my qemu installation has no qemu-arm, yet has a qemu-aarch64, has me wondering if the problem I face is a result of progress and deprecation.

jonesmz commented 1 year ago

You don't want "arm". That's the 32-bit version of the arm architecture. You want aarch64. The "arm" check is only there for completeness' sake, in case someone decides they really want to bother with 32-bit.

jlpoolen commented 1 year ago

> You don't want "arm". That's the 32-bit version of the arm architecture. You want aarch64. The "arm" check is only there for completeness' sake, in case someone decides they really want to bother with 32-bit.

So the execution of binfmt.sh@35 is for completeness' sake, and if it errors out, then the ensuing error message "/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory" is an expected and/or tolerated error message?

samip5 commented 1 year ago

> You don't want "arm". That's the 32-bit version of the arm architecture. You want aarch64. The "arm" check is only there for completeness' sake, in case someone decides they really want to bother with 32-bit.

> So the execution of binfmt.sh@35 is for completeness' sake, and if it errors out, then the ensuing error message "/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory" is an expected and/or tolerated error message?

That means that the writing to the binfmt registry (used to register the qemu instance for arm64 binaries) failed and you should look into why it failed as that's why qemu is not working for you.

jlpoolen commented 1 year ago

I'm confused. @jonesmz says I should not want the arm32 architecture (I agree), and I'm presuming the binfmt.sh@35 line is for that purpose, yet @samip5 says that the failure at line 35 to write a stream of characters which includes a reference to "arm" to the registry is the essence of my problem. I was thinking the stream needs to be modified for aarch64. What's the point of writing to a registry with "arm" when you are building for "aarch64"? Or am I missing something here?

samip5 commented 1 year ago

> I'm confused. @jonesmz says I should not want the arm32 architecture (I agree), and I'm presuming the binfmt.sh@35 line is for that purpose, yet @samip5 says that the failure at line 35 to write a stream of characters which includes a reference to "arm" to the registry is the essence of my problem. I was thinking the stream needs to be modified for aarch64. What's the point of writing to a registry with "arm" when you are building for "aarch64"? Or am I missing something here?

The issue with the script is that both registrations should succeed, because it tries to enable both whenever the CPU is neither aarch64 nor arm but something else. The point of the arm registration is to enable support for arm, since our builder supports that too, although I cannot remember if it has ever actually been used.

Does that make sense?

jlpoolen commented 1 year ago

@samip5, could you be more specific? "The script" is a reference to binfmt.sh? "Both" is a reference to what?

Also, just for my case, running on an x86_64 platform trying to build for aarch64, would there be any impact on my build if binfmt.sh line 35 were rem'd out?

samip5 commented 1 year ago

> @samip5, could you be more specific? "The script" is a reference to binfmt.sh? "Both" is a reference to what?

> Also, just for my case, running on an x86_64 platform trying to build for aarch64, would there be any impact on my build if binfmt.sh line 35 were rem'd out?

Yes, "the script" refers to binfmt.sh and "both" refers to arm and aarch64, so line 35 needs to succeed for the script to succeed.

You should be able to just comment it out, yes.