Open jlpoolen opened 1 year ago
Make a pull request that modifies every use of the "trap" keyword under parsers/* so that the temporary file is preserved on failure.
I wanted to run the command manually to see whether I could catch any STDERR messages that might be missed under Jenkins. Continuing in manual mode, I got the results below. What is interesting is this:
raise KeyError(key) from None
KeyError: 'CCACHE_DIR'
Here's my test session:
Wednesday Jan 04, 2023 9:35:44 PM
Debugging failure.
Modified suspected file: rawcommand:
diff --git a/parsers/rawcommand/rawcommand b/parsers/rawcommand/rawcommand
index d48927e..f821d72 100755
--- a/parsers/rawcommand/rawcommand
+++ b/parsers/rawcommand/rawcommand
@@ -5,8 +5,11 @@ echo "Running $@"
function finish
{
- ret=$?
+ ret=$?
+ cp "${PROJECT_DIR}/chroot/em-$$" /tmp/rawcommand_em-$$
+ echo copied em-$$ to /tmp/rawcommand_em-$$
rm -f "${PROJECT_DIR}/chroot/em-$$"
+
exit $ret
}
trap finish EXIT
As root:
cd /home/jlpoole/local/Build.Dist
./build.sh
...
checking set_root_password deps ['news']
running jobs [(<Popen: returncode: None args: ['/home/jlpoole/local/Build.Dist/parsers/rawc...>, 'locale')]
+ source /home/jlpoole/local/Build.Dist/scripts/functions.sh
+ echo 'Running eselect locale set en_US.utf8'
Running eselect locale set en_US.utf8
+ trap finish EXIT
+ cat
+ chmod +x /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
+ /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-7110
Traceback (most recent call last):
File "/usr/lib/python3.10/site-packages/pychroot/scripts/pychroot.py", line 130, in main
File "/usr/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.10/subprocess.py", line 1847, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/em-7110'
+ finish
+ ret=1
+ cp /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110 /tmp/rawcommand_em-7110
+ echo copied em-7110 to /tmp/rawcommand_em-7110
copied em-7110 to /tmp/rawcommand_em-7110
+ rm -f /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
+ exit 1
FATAL: JOBFAILED locale
FATAL: JOBFAILED gentoo-base
run complete.
# copied preserved file back in and reran chroot.py command:
jenk /home/jlpoole/local/Build.Dist # cat /tmp/rawcommand_em-7110
#!/usr/bin/env bash
set -evx
source /etc/profile
eselect locale set en_US.utf8
jenk /home/jlpoole/local/Build.Dist # cp /tmp/rawcommand_em-7110 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
jenk /home/jlpoole/local/Build.Dist # /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-7110
Traceback (most recent call last):
File "/home/jlpoole/local/Build.Dist/scripts/chroot.py", line 7, in <module>
if not os.path.exists(os.environ['CCACHE_DIR']):
File "/usr/lib/python-exec/python3.10/../../../lib/python3.10/os.py", line 680, in __getitem__
raise KeyError(key) from None
KeyError: 'CCACHE_DIR'
jenk /home/jlpoole/local/Build.Dist # ls -la /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
-rwxr-xr-x 1 root root 111 Jan 4 21:35 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
jenk /home/jlpoole/local/Build.Dist # rm /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-7110
jenk /home/jlpoole/local/Build.Dist #
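The KeyError in the manual rerun comes from chroot.py line 7 indexing os.environ['CCACHE_DIR'] directly; the variable is evidently not set in an interactive root shell. A minimal sketch of a friendlier lookup follows. It assumes nothing about the rest of chroot.py, and require_env is a hypothetical helper, not part of Build.Dist:

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`.

    Exits with a readable message instead of a bare KeyError traceback
    when the variable is missing or empty.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise SystemExit(f"{name} is not set in the environment")
    return value
```

With such a helper, the check at chroot.py line 7 could read `os.path.exists(require_env('CCACHE_DIR'))`, and a manual run without the variable would say so directly.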
The exec format error means that something is wrong with how you have qemu set up.
Try rebooting.
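One way to check which emulator a binary inside the chroot would need is to read the machine field of its ELF header. The sketch below is an illustration only, not part of Build.Dist; elf_machine and EM_NAMES are hypothetical names, and the table covers just the architectures relevant to this build (arm, aarch64, x86_64):

```python
import struct

# e_machine values from the ELF specification.
EM_NAMES = {0x28: "arm", 0xB7: "aarch64", 0x3E: "x86_64"}


def elf_machine(header):
    """Return the architecture named by the first 20 bytes of a file.

    Returns None when the bytes are not an ELF header (for example a
    shell script), and a hex string for machines not in EM_NAMES.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) is 1 for little-endian, 2 for big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(machine, hex(machine))
```

Reading the first 20 bytes of a binary from the chroot (opened in "rb" mode) and passing them to elf_machine would show whether it is an aarch64 or a 32-bit arm executable.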
Three items:
1) I compared the current local manual build's chroot tree with the successful one I built last January to see whether there were any files or directories in /var/tmp. Both were empty.
2) When I ran the project in Jenkins, there was a failure for a missing directory. I found all the "mkdir" commands, noticed that a couple were missing "-p", patched my copy on Windows and "committed" but did not push, so my commits never made it to the Jenkins instance. Nonetheless, when I reran Jenkins, the missing-directory error did not occur, so something in the Jenkins build gets cached and the error manifests itself only on the first run. Given that most of the mkdir commands have "-p" and a couple did not, I bring this to your attention as a possible subtle bug that may only manifest itself on the first run. My inexperience with git's process flow (I'm used to Subversion) made me think that if I "committed" on my Windows instance, my instance on GitHub would automatically get updated; I did not know about the requisite "push" step. Now I do.
3) my qemu install:
jenk /home/jlpoole/local/Build.Dist # date; eix -I qemu
Wed Jan 4 10:10:37 PM PST 2023
[I] app-emulation/qemu
Available versions: ~7.1.0^t 7.1.0-r2^t ~7.2.0^t **7.2.0-r1^t **9999*l^t {accessibility +aio alsa bpf bzip2 +caps capstone +curl debug (+)doc +fdt +filecaps fuse glusterfs +gnutls gtk infiniband io-uring iscsi jack jemalloc +jpeg lzo multipath ncurses nfs nls numa opengl +oss pam +pin-upstream-blobs plugins +png pulseaudio python rbd sasl sdl sdl-image +seccomp selinux +slirp smartcard snappy spice ssh static static-user systemtap test udev usb usbredir vde +vhost-net vhost-user-fs virgl virtfs +vnc vte xattr xen zstd PYTHON_TARGETS="python3_8 python3_9 python3_10 python3_11" QEMU_SOFTMMU_TARGETS="aarch64 alpha arm avr cris hppa i386 loongarch64 m68k microblaze microblazeel mips mips64 mips64el mipsel nios2 or1k ppc ppc64 riscv32 riscv64 rx s390x sh4 sh4eb sparc sparc64 tricore x86_64 xtensa xtensaeb" QEMU_USER_TARGETS="aarch64 aarch64_be alpha arm armeb cris hexagon hppa i386 loongarch64 m68k microblaze microblazeel mips mips64 mips64el mipsel mipsn32 mipsn32el nios2 or1k ppc ppc64 ppc64le riscv32 riscv64 s390x sh4 sh4eb sparc sparc64 sparc32plus x86_64 xtensa xtensaeb"}
Installed versions: 7.1.0-r2^t(09:41:03 PM 01/01/2023)(aio bzip2 curl fdt filecaps gnutls jpeg ncurses nls oss pam pin-upstream-blobs png slirp static-user vhost-net vnc xattr -accessibility -alsa -bpf -capstone -debug -doc -fuse -glusterfs -gtk -infiniband -io-uring -iscsi -jack -jemalloc -lzo -multipath -nfs -numa -opengl -plugins -pulseaudio -python -rbd -sasl -sdl -sdl-image -selinux -smartcard -snappy -spice -ssh -static -systemtap -test -udev -usb -usbredir -vde -virgl -virtfs -vte -xen -zstd PYTHON_TARGETS="python3_10 -python3_8 -python3_9 -python3_11" QEMU_SOFTMMU_TARGETS="aarch64 x86_64 -alpha -arm -avr -cris -hppa -i386 -loongarch64 -m68k -microblaze -microblazeel -mips -mips64 -mips64el -mipsel -nios2 -or1k -ppc -ppc64 -riscv32 -riscv64 -rx -s390x -sh4 -sh4eb -sparc -sparc64 -tricore -xtensa -xtensaeb" QEMU_USER_TARGETS="aarch64 x86_64 -aarch64_be -alpha -arm -armeb -cris -hexagon -hppa -i386 -loongarch64 -m68k -microblaze -microblazeel -mips -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -nios2 -or1k -ppc -ppc64 -ppc64le -riscv32 -riscv64 -s390x -sh4 -sh4eb -sparc -sparc64 -sparc32plus -xtensa -xtensaeb")
Homepage: https://www.qemu.org https://www.linux-kvm.org
Description: QEMU + Kernel-based Virtual Machine userland tools
jenk /home/jlpoole/local/Build.Dist #
I'll proceed with a reboot now.
Rebooted and got the same result:
jenk /home/jlpoole/local/Build.Dist # date;./build.sh
Wed Jan 4 10:14:55 PM PST 2023
/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory
loading manifest
...
checking set_root_password deps ['news']
running jobs [(<Popen: returncode: None args: ['/home/jlpoole/local/Build.Dist/parsers/rawc...>, 'locale')]
+ source /home/jlpoole/local/Build.Dist/scripts/functions.sh
+ echo 'Running eselect locale set en_US.utf8'
Running eselect locale set en_US.utf8
+ trap finish EXIT
+ cat
+ chmod +x /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
+ /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-2122
Traceback (most recent call last):
File "/usr/lib/python3.10/site-packages/pychroot/scripts/pychroot.py", line 130, in main
File "/usr/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.10/subprocess.py", line 1847, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/em-2122'
+ finish
+ ret=1
+ cp /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122 /tmp/rawcommand_em-2122
+ echo copied em-2122 to /tmp/rawcommand_em-2122
copied em-2122 to /tmp/rawcommand_em-2122
+ rm -f /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
+ exit 1
FATAL: JOBFAILED locale
FATAL: JOBFAILED gentoo-base
run complete.
jenk /home/jlpoole/local/Build.Dist # git status
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: parsers/rawcommand/rawcommand
Untracked files:
(use "git add <file>..." to include in what will be committed)
parsers/rawcommand/rawcommand~
no changes added to commit (use "git add" and/or "git commit -a")
jenk /home/jlpoole/local/Build.Dist # cp /tmp/rawcommand_em-2122 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot
jenk /home/jlpoole/local/Build.Dist # /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-2122
Traceback (most recent call last):
File "/home/jlpoole/local/Build.Dist/scripts/chroot.py", line 7, in <module>
if not os.path.exists(os.environ['CCACHE_DIR']):
File "/usr/lib/python-exec/python3.10/../../../lib/python3.10/os.py", line 680, in __getitem__
raise KeyError(key) from None
KeyError: 'CCACHE_DIR'
jenk /home/jlpoole/local/Build.Dist # ls /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot
bin em-config home media proc run tmp
boot em-config.json lib mnt rawcommand_em-2122 sbin usr
dev etc lib64 opt root sys var
jenk /home/jlpoole/local/Build.Dist # mv /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/rawcommand_em-2122 /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
jenk /home/jlpoole/local/Build.Dist # /home/jlpoole/local/Build.Dist/scripts/chroot.py /em-2122
Traceback (most recent call last):
File "/home/jlpoole/local/Build.Dist/scripts/chroot.py", line 7, in <module>
if not os.path.exists(os.environ['CCACHE_DIR']):
File "/usr/lib/python-exec/python3.10/../../../lib/python3.10/os.py", line 680, in __getitem__
raise KeyError(key) from None
KeyError: 'CCACHE_DIR'
jenk /home/jlpoole/local/Build.Dist # rm /home/jlpoole/local/Build.Dist/build/GenPi64OpenRC/chroot/em-2122
jenk /home/jlpoole/local/Build.Dist #
Please see this line at the start of your output:
/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory
So your qemu setup is part of the issue.
Please try to rerun only that script manually.
What https://github.com/GenPi64/Build.Dist/issues/180#issuecomment-1372113199 is referring to is at the outset of my run of build.sh, specifically line 3 below [I added line numbers]:
1 jenk /home/jlpoole/local/Build.Dist # date;./build.sh
2 Wed Jan 4 10:14:55 PM PST 2023
3 /home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory
4 loading manifest
5 load complete
6 Loading status
7 status not loadable
8 load complete
...
The "write error" message is generated by line 35 of binfmt.sh, which currently has [I added line numbers]:
33 if [ "${cpu}" != "arm" ] ; then
34 if [[ ! -e /proc/sys/fs/binfmt_misc/arm ]]; then
35 echo ":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:F" > /proc/sys/fs/binfmt_misc/register
36 fi
37 fi
The variable $cpu is set earlier in binfmt.sh from a system call to "uname -m". So, I'm guessing that Qemu causes "uname -m" to produce something that matches "armv[4-9]*" thereby triggering the "if" clause at binfmt.sh@33. So the failure at line 35 is the echo command populating /proc/sys/fs/binfmt_misc/register:
echo ":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:F" > /proc/sys/fs/binfmt_misc/register
The last line, #40, of env.sh, invokes binfmt.sh:
[40] ${SCRIPTS}/binfmt.sh"
env.sh, in turn, is called by build.sh at line 14:
[14] source ${BASEDIR}/env.sh
Note: shortly after line 14, build.sh has the following:
23 if [[ ! -d "$PROJECT_DIR" ]]; then
24 if [[ ! -e "$PROJECT_DIR" ]]; then
25 if [ ! -z $BTRFS_SNAPSHOTS ]; then
26 btrfs subvolume create "$PROJECT_DIR"
27 fi
28 fi
29 fi
I recall from my previous successful build in January 2022, performed in a VM, that I had to manually load a module for btrfs since it was not built into my kernel; there was a requirement that btrfs be working.
The lynchpin here looks to be the binfmt.sh@35 failure, which is reported but does not halt the process. I will try experimenting with "btrfs subvolume create" and determine what the current system is on my ...chroot. It seems that if the echo line fails, everything should halt. I'll update further as I learn more. Thank you, @samip5, for catching this.
The if clause is looking for "arm" under /proc/sys/fs/binfmt_misc/
jenk /home/jlpoole # ls /proc/sys/fs/binfmt_misc/
aarch64 register status
jenk /home/jlpoole #
Since "arm" does not exist, I tried duplicating the command:
jenk /home/jlpoole # echo ":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:F" > /proc/sys/fs/binfmt_misc/register
bash: echo: write error: No such file or directory
jenk /home/jlpoole #
"register" is present, so the "No such file..." message may be referring to /usr/bin/qemu-arm? Note: the string above has "/usr/bin/qemu-arm:F", yet /usr/bin has no "qemu-arm":
jenk /home/jlpoole # ls /usr/bin/qemu*
/usr/bin/qemu-aarch64 /usr/bin/qemu-pr-helper
/usr/bin/qemu-edid /usr/bin/qemu-storage-daemon
/usr/bin/qemu-img /usr/bin/qemu-system-aarch64
/usr/bin/qemu-io /usr/bin/qemu-system-x86_64
/usr/bin/qemu-nbd /usr/bin/qemu-x86_64
jenk /home/jlpoole #
Has there been a name change from qemu-arm to qemu-aarch64?
https://codepyre.com/2019/12/arming-yourself/ under "Configuration" notes a difference:
# Create a configuration for arm32v7
...
# Create a configuration for arm64v8
I'm wondering if the reference to qemu-arm is something that needs to be replaced with qemu-aarch64?
I'm in an area (qemu) where I know practically nothing, other than that arm has been developing. The fact that my qemu installation has no qemu-arm, yet has a qemu-aarch64, has me wondering whether the problem I face is a result of progress and deprecation.
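The register string has the form :name:type:offset:magic:mask:interpreter:flags. As I understand the kernel's binfmt_misc documentation, the F flag makes the kernel open the interpreter at registration time, so a missing /usr/bin/qemu-arm would plausibly explain the "No such file or directory" on write. A small sketch that splits the string from binfmt.sh line 35 and reports a missing interpreter (parse_binfmt_rule and missing_interpreter are hypothetical helpers, not part of Build.Dist):

```python
import os

# Register string copied from binfmt.sh line 35. It is a raw string because
# the kernel, not Python, is meant to decode the \x escapes.
ARM_RULE = (
    r":arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00"
    r":\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff"
    r":/usr/bin/qemu-arm:F"
)

FIELDS = ("name", "type", "offset", "magic", "mask", "interpreter", "flags")


def parse_binfmt_rule(rule):
    """Split a binfmt_misc register string into its seven named fields."""
    delimiter = rule[0]  # the first character defines the field separator
    parts = rule[1:].split(delimiter)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def missing_interpreter(rule):
    """Return the rule's interpreter path if it is absent on disk, else None."""
    interpreter = parse_binfmt_rule(rule)["interpreter"]
    return None if os.path.exists(interpreter) else interpreter
```

On a host whose qemu was built without the arm user target, missing_interpreter(ARM_RULE) would return "/usr/bin/qemu-arm", which matches the ls output above.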
You don't want "arm". That's the 32-bit version of the arm architecture. You want aarch64. The "arm" check is only there for completeness' sake, in case someone decides they really want to bother with 32-bit.
So the execution of binfmt.sh@35 is for completeness' sake, and if it errors out, then the ensuing error message "/home/jlpoole/local/Build.Dist/scripts/binfmt.sh: line 35: echo: write error: No such file or directory" is an expected and/or tolerated error message?
That means that writing to the binfmt registry (used to register the qemu instance for arm64 binaries) failed, and you should look into why it failed, as that's why qemu is not working for you.
I'm confused. @jonesmz says I should not want the arm32 architecture (I agree) and I'm presuming the binfmt.sh@35 line is for that purpose, yet @samip5 says that the failure at line 35 to write a stream of characters which includes a reference to "arm" to the registry is the essence of my problem. I was thinking the stream needs to be modified for aarch64. What's the point of writing to a registry with "arm" when you are building for "aarch64"? Or am I missing something here?
The issue with the script is that both registrations should succeed, because the script tries to enable both when the CPU is neither aarch64 nor arm but something else. The point of the arm registration is to enable support for arm, as our builder has support for that too, although I cannot remember if it has ever actually been used.
Does that make sense?
@samip5, could you be more specific? Is "the script" a reference to binfmt.sh? What does "both" refer to?
Also, just for my case, running on an x86_64 platform trying to build for aarch64, would there be any impact on my build if binfmt.sh line 35 were rem'd out?
Yes, "the script" refers to binfmt.sh, and "both" refers to arm and aarch64, so line 35 needs to succeed for the script to succeed.
You should be able to just comment it out, yes.
Enhancement Request
During a failed build (not running under Jenkins, but in a console as root), a clean-up is performed which removes a temporary file of interest. For example, from Issue #179:
The file of interest is em-5481, but a higher-level handler removes the file.
The "+ ret=1" suggests that the handler which then executes the "rm -f..." would know the script is in error mode and could likewise do something that makes the file, em-5481, available. I'm thinking of maybe copying the file to /tmp and printing a status report of doing so. I'll take a look at the code and see whether what I'm thinking of is possible. I thought filing an enhancement request might highlight this type of approach. If it can be done, I'll do it and see if I can preserve the temporary file for forensic analysis on a future failure.
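The diff to parsers/rawcommand/rawcommand earlier in this thread implements this in shell via the EXIT trap. For comparison, the same idea can be sketched in Python as a context manager. This is an illustrative analog rather than a patch for Build.Dist; temp_script and the keep_dir parameter are hypothetical, and the rawcommand_ prefix is modeled on that diff:

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temp_script(path, keep_dir=None):
    """Always remove `path` on exit, but keep a copy if the body failed.

    On failure the file is copied to `keep_dir` (the system temp
    directory by default) before it is removed, and the original
    exception is re-raised.
    """
    keep_dir = keep_dir or tempfile.gettempdir()
    try:
        yield path
    except BaseException:
        if os.path.exists(path):
            saved = os.path.join(keep_dir, "rawcommand_" + os.path.basename(path))
            shutil.copy2(path, saved)
            print(f"copied {path} to {saved}")
        raise
    finally:
        # Mirror the unconditional "rm -f" of the shell handler.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

Unlike the shell diff, this copies the file only when the body raises, so successful runs leave nothing behind in the temp directory.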