Rudd-O / zfs-fedora-installer

Fedora on ZFS root installer
35 stars 6 forks source link

deploy_zfs: rpmbuild command corrupts rpm DB #26

Open vista- opened 6 years ago

vista- commented 6 years ago

I am trying to use the deploy_zfs on a freshly-installed minimal netinst Fedora 28 install, but even after fixing the dependency errors, I have ran into this problem: rpmbuild always seems to segfault, and after the segfault, the entire RPM database is corrupted.

See log below:

Installing zfs-dkms-0.7.0-1.fc28.src.rpm
/bin/sh: line 13: 30292 Bus error               (core dumped) LANG=C rpmbuild --define "_tmppath $rpmbuild/TMP" --define "_topdir $rpmbuild" --define "_without_debug 1" --define "_without_debug_kmem 1" --define "_without_debug_kmem_tracking 1" --define "_without_debuginfo 1" --define "_without_asan 1" --rebuild $rpmpkg
make[1]: *** [Makefile:1272: rpm-common] Error 1
make[1]: Leaving directory '/usr/src/zfs'
make: *** [Makefile:1227: rpm-dkms] Error 2
   ERROR:          root:          deploy_zfs@1227   Unexpected error
Traceback (most recent call last):
  File "./src/installfedoraonzfs/__init__.py", line 1225, in deploy_zfs
    to_unmount=to_unmount,)
  File "./src/installfedoraonzfs/__init__.py", line 1182, in deploy_zfs_in_machine
    check_call(cmd)
  File "./src/installfedoraonzfs/cmd.py", line 28, in check_call
    return subprocess.check_call(*args,**kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['bash', '-c', 'cd /usr/src/zfs && ./autogen.sh && ./configure --with-config=user && make -j4 rpm-utils && make -j4 rpm-dkms']' returned non-zero exit status 2
    INFO:          root:          deploy_zfs@1229   Cleaning up now
Traceback (most recent call last):
  File "./deploy-zfs", line 10, in <module>
    sys.exit(installfedoraonzfs.deploy_zfs())
  File "./src/installfedoraonzfs/__init__.py", line 1225, in deploy_zfs
    to_unmount=to_unmount,)
  File "./src/installfedoraonzfs/__init__.py", line 1182, in deploy_zfs_in_machine
    check_call(cmd)
  File "./src/installfedoraonzfs/cmd.py", line 28, in check_call
    return subprocess.check_call(*args,**kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bash', '-c', 'cd /usr/src/zfs && ./autogen.sh && ./configure --with-config=user && make -j4 rpm-utils && make -j4 rpm-dkms']' returned non-zero exit status 2

The RPM DB is hosed after this:

[root@localhost]~# dnf
error: rpmdb: BDB0113 Thread/process 30292/140512981670080 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
Error: Error: rpmdb open failed

I am able to reproduce this issue.

Rudd-O commented 6 years ago

Can you reproduce on different hardware?

Rudd-O commented 6 years ago

Also please list host Fedora version and RPM packages version, vs version of Fedora you are attempting to install.

vista- commented 6 years ago

I don't think this problem is hardware-related - I could reproduce this on both the host, running Fedora 27 Workstation (HP laptop) and a fresh install of Fedora 28 Minimal (KVM VM). Both are running RPM version 4.14.1 with 4.17.6 kernel.

vista- commented 6 years ago

Even the ZOL's ZFS repository is exhibiting this problem - I'll bring up the issue there. What patches are present in your repository that are not in the mainline repo?

vista- commented 6 years ago

See issue https://github.com/zfsonlinux/zfs/issues/7727 for updates.

vista- commented 6 years ago

I see that you have worked on this script meanwhile - have you perhaps managed to insert a workaround for this issue yet?

Rudd-O commented 6 years ago

No workaround, regrettably. It still happens on my machine.

vista- commented 6 years ago

Might I suggest adding a knob for installing ZFS / ZFS tools from the ZoL Fedora repos? I was able to work around this issue by installing the packages from there.

Rudd-O commented 6 years ago

Would you mind sending a PR with that code?  I would like this feature too, but I'm swamped with work.  In the meantime, you can use --with-prebuilt-rpms=path/to/downloaded/RPMs to continue using this tool with the upstream RPMs.

Rudd-O commented 5 years ago

My workaround has been to retry under certain circumstances. Latest code does this.

Rudd-O commented 5 years ago

This keeps happening, even on Fedora 29, and even on non-parallel, single builds. While the binary RPM rpmbuild is valid, the process still coredumps (when it doesn't hang). This is running in a chroot on top of a physical-device ZFS pool:

...
...
...
mangling shebang in /usr/src/zfs-0.8.0/cmd/zed/zed.d/data-notify.sh from /bin/sh to #!/usr/bin/sh
mangling shebang in /usr/src/zfs-0.8.0/cmd/zed/zed.d/scrub_finish-notify.sh from /bin/sh to #!/usr/bin/sh
mangling shebang in /usr/src/zfs-0.8.0/cmd/zed/zed.d/pool_import-led.sh from /bin/sh to #!/usr/bin/sh
mangling shebang in /usr/src/zfs-0.8.0/cmd/zed/zed.d/all-syslog.sh from /bin/sh to #!/usr/bin/sh
mangling shebang in /usr/src/zfs-0.8.0/cmd/zgenhostid/zgenhostid from /bin/bash to #!/usr/bin/bash
mangling shebang in /usr/src/zfs-0.8.0/cmd/vdev_id/vdev_id from /bin/sh to #!/usr/bin/sh
mangling shebang in /usr/src/zfs-0.8.0/cmd/fsck_zfs/fsck.zfs from /bin/sh to #!/usr/bin/sh
Processing files: zfs-dkms-0.8.0-rc3.fc29.noarch
Provides: zfs-dkms = 0.8.0-rc3.fc29 zfs-kmod = 0.8.0
Requires(interp): /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Obsoletes: spl-dkms
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/zfs-dkms-0.8.0-rc3.fc29.x86_64
Wrote: /root/rpmbuild/RPMS/noarch/zfs-dkms-0.8.0-rc3.fc29.noarch.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.0na9Ub
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd zfs-0.8.0
+ '[' /root/rpmbuild/BUILDROOT/zfs-dkms-0.8.0-rc3.fc29.x86_64 '!=' / ']'
+ rm -rf /root/rpmbuild/BUILDROOT/zfs-dkms-0.8.0-rc3.fc29.x86_64
+ exit 0
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.6UxGii
+ umask 022
+ cd /root/rpmbuild/BUILD
+ rm -rf zfs-0.8.0
+ exit 0
Segmentation fault (core dumped)
# command used
# [root@roxanne zfs]# rpmbuild --rebuild zfs-dkms-0.8.0-rc3.fc29.src.rpm 

This is impacting all my desktop machines and my Amazon EC2 build servers.

I have a coredump too:

Mar 21 07:25:41 roxanne.dragonfear systemd-coredump[27522]: Process 13082 (rpmbuild) of user 0 dumped core.

                                                            Stack trace of thread 13082:
                                                            #0  0x00007fe8eb03e6ed __lock_freelocker_int.isra.0 (libdb-5.3.so)
                                                            #1  0x00007fe8eb03e82a __lock_id_free (libdb-5.3.so)
                                                            #2  0x00007fe8eb04caa1 __db_refresh (libdb-5.3.so)
                                                            #3  0x00007fe8eb04cfda __db_close (libdb-5.3.so)
                                                            #4  0x00007fe8eb05ec6d __db_close_pp (libdb-5.3.so)
                                                            #5  0x00007fe8eb20054e n/a (librpm.so.8)
                                                            #6  0x00007fe8eb20a349 n/a (librpm.so.8)
                                                            #7  0x00007fe8eb2365fd rpmtsCloseDB (librpm.so.8)
                                                            #8  0x00007fe8eb2374fa rpmtsFree (librpm.so.8)
                                                            #9  0x000055958c1abac1 n/a (rpmbuild)
                                                            #10 0x00007fe8ea7d0413 __libc_start_main (libc.so.6)
                                                            #11 0x000055958c1abe0e n/a (rpmbuild)
[root@roxanne ~]# coredumpctl dump 
           PID: 13082 (rpmbuild)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Thu 2019-03-21 07:25:40 UTC (3min 56s ago)
  Command Line: rpmbuild --rebuild zfs-dkms-0.8.0-rc3.fc29.src.rpm
    Executable: /var/lib/zfs-fedora-installer/rescuebox/usr/bin/rpmbuild
 Control Group: /user.slice/user-0.slice/session-56.scope
          Unit: session-56.scope
         Slice: user-0.slice
       Session: 56
     Owner UID: 0 (root)
       Boot ID: 429cc7736c8243288e8875540dbefa64
    Machine ID: 7a503053459b45018c32faabdc8f8624
      Hostname: roxanne.dragonfear
       Storage: /var/lib/systemd/coredump/core.rpmbuild.0.429cc7736c8243288e8875540dbefa64.13082.1553153140000000.lz4
       Message: Process 13082 (rpmbuild) of user 0 dumped core.

                Stack trace of thread 13082:
                #0  0x00007fe8eb03e6ed __lock_freelocker_int.isra.0 (libdb-5.3.so)
                #1  0x00007fe8eb03e82a __lock_id_free (libdb-5.3.so)
                #2  0x00007fe8eb04caa1 __db_refresh (libdb-5.3.so)
                #3  0x00007fe8eb04cfda __db_close (libdb-5.3.so)
                #4  0x00007fe8eb05ec6d __db_close_pp (libdb-5.3.so)
                #5  0x00007fe8eb20054e n/a (librpm.so.8)
                #6  0x00007fe8eb20a349 n/a (librpm.so.8)
                #7  0x00007fe8eb2365fd rpmtsCloseDB (librpm.so.8)
                #8  0x00007fe8eb2374fa rpmtsFree (librpm.so.8)
                #9  0x000055958c1abac1 n/a (rpmbuild)
                #10 0x00007fe8ea7d0413 __libc_start_main (libc.so.6)
                #11 0x000055958c1abe0e n/a (rpmbuild)
Refusing to dump core to tty (use shell redirection or specify --output).

I can confirm that the RPM database is hopelessly corrupted after the crash Only rpmdb --rebuilddb fixes it. Why would rpmbuild need to open the database for writes?

[root@roxanne zfs]# rpm -Uvh libnvpair1-0.8.0-rc3.fc29.x86_64.rpm libuutil1-0.8.0-rc3.fc29.x86_64.rpm libzfs2-0.8.0-rc3.fc29.x86_64.rpm libzpool2-0.8.0-rc3.fc29.x86_64.rpm zfs-0.8.0-rc3.fc29.x86_64.rpm zfs-dkms-0.8.0-rc3.fc29.noarch.rpm zfs-dracut-0.8.0-rc3.fc29.x86_64.rpm 
error: rpmdb: BDB0113 Thread/process 13082/140638343002304 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 13082/140638343002304 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm

More complete backtrace:

GNU gdb (GDB) Fedora 8.2-6.fc29
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /var/lib/zfs-fedora-installer/rescuebox/usr/bin/rpmbuild...Reading symbols from /usr/lib/debug/usr/bin/rpmbuild-4.14.2.1-2.fc29.x86_64.debug...done.
done.
[New LWP 13082]

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments
Core was generated by `rpmbuild --rebuild zfs-dkms-0.8.0-rc3.fc29.src.rpm'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe8eb03e6ed in __lock_freelocker_int (region=region@entry=0x7fe8dd6db6b0, sh_locker=sh_locker@entry=0x7fe8dd70d998, reallyfree=reallyfree@entry=1, lt=<optimized out>, lt=<optimized out>)
    at ../../src/lock/lock_id.c:507
507         SH_TAILQ_REMOVE(&region->lockers, sh_locker,
Missing separate debuginfos, use: dnf debuginfo-install openssl-libs-1.1.1b-2.fc29.x86_64
(gdb) bt
#0  0x00007fe8eb03e6ed in __lock_freelocker_int (region=region@entry=0x7fe8dd6db6b0, sh_locker=sh_locker@entry=0x7fe8dd70d998, reallyfree=reallyfree@entry=1, lt=<optimized out>, lt=<optimized out>)
    at ../../src/lock/lock_id.c:507
#1  0x00007fe8eb03e82a in __lock_id_free (env=env@entry=0x55958d7fe210, sh_locker=0x7fe8dd70d998) at ../../src/lock/lock_id.c:213
#2  0x00007fe8eb04caa1 in __db_refresh (dbp=dbp@entry=0x55958d80f1c0, txn=txn@entry=0x0, flags=flags@entry=0, deferred_closep=deferred_closep@entry=0x7ffce14e2bc4, reuse=reuse@entry=0) at ../../src/db/db.c:1097
#3  0x00007fe8eb04cfda in __db_close (dbp=dbp@entry=0x55958d80f1c0, txn=txn@entry=0x0, flags=flags@entry=0) at ../../src/db/db.c:747
#4  0x00007fe8eb05ec6d in __db_close_pp (dbp=0x55958d80f1c0, flags=0) at ../../src/db/db_iface.c:261
#5  0x00007fe8eb20054e in db3_dbiClose (dbi=0x55958d825470, flags=0) at backend/db3.c:746
#6  0x00007fe8eb20a349 in dbiForeach (del=1, func=<optimized out>, ndbi=<optimized out>, dbis=0x55958d7fe170) at rpmdb.c:378
#7  rpmdbClose (db=0x55958d7fdfb0) at rpmdb.c:411
#8  0x00007fe8eb20a78e in rpmdbClose (db=<optimized out>) at rpmdb.c:391
#9  0x00007fe8eb2365fd in rpmtsCloseDB (ts=ts@entry=0x55958d7fc9d0) at rpmts.c:79
#10 0x00007fe8eb2374fa in rpmtsFree (ts=0x55958d7fc9d0) at rpmts.c:733
#11 0x000055958c1abac1 in main (argc=<optimized out>, argv=<optimized out>) at rpmbuild.c:705
Rudd-O commented 5 years ago

I can confirm this happens even on a F27 system whose root file system is ext4, and can be reproduced simply by running ./deploy-zfs -- it isn't even necessary to run ./install-fedora-on-zfs to experience this bug.

Rudd-O commented 5 years ago

I don't see the following as a fix for deploy_zfs having trouble, but at least the image creator can use this.

Here's what I've seen mock do right before a build -- it nukes part of the RPM DB before attempting package installs. Here is a log:

[29] root.log: DEBUG buildroot.py:369:  removing 3 rpm db files
[29] root.log: DEBUG buildroot.py:374:  nuke_rpm_db: removing /home/user/.mock/fedora-29-x86_64-generic/root/var/lib/rpm/__db.003
[29] root.log: DEBUG buildroot.py:374:  nuke_rpm_db: removing /home/user/.mock/fedora-29-x86_64-generic/root/var/lib/rpm/__db.001
[29] root.log: DEBUG buildroot.py:374:  nuke_rpm_db: removing /home/user/.mock/fedora-29-x86_64-generic/root/var/lib/rpm/__db.002

I can try and trace the code for this junk, and replicate the behavior prior to every package install inside of the chroot. With enough luck, this should completely resolve the problem.