Closed YaLTeR closed 1 year ago
Do you have a coredump from this? Try using coredumpctl
and coredumpctl info
to try to extract a stack trace.
I think there's some issue with how rpm-ostree is built because there's nothing in the stack trace:
#0 0x00007f87f7c6cb94 __pthread_kill_implementation (libc.so.6 + 0x8eb94)
#1 0x00007f87f7c1baee raise (libc.so.6 + 0x3daee)
#2 0x00007f87f7c0487f abort (libc.so.6 + 0x2687f)
#3 0x0000558a579cc927 _ZN11panic_abort18__rust_start_panic5abort17h6ac263e9d2a6eb7eE (rpm-ostree + 0x985927)
#4 0x00007f87e4021b30 n/a (n/a + 0x0)
I also tried reproducing with RUST_BACKTRACE=1
and that's empty too. Looks like the debug symbols are stripped but not available in any other package.
┌ ~
└─ env RUST_BACKTRACE=full rpm-ostree update
⠤ Receiving metadata objects: 1/(estimating) 49 bytes/s 196 bytes 2 metadata, 0 content objects fetched; 788 B transferred in 5 seconds; 0 bytes content written
Receiving metadata objects: 1/(estimating) 49 bytes/s 196 bytes... done
⠒ Checking out tree 0e96fdf... Enabled rpm-md repositories: fedora-cisco-openh264 rawhide-modular rawhide rawhide-debuginfo rawhide-source fedora-rawhide-nodebug
⠐ Checking out tree 0e96fdf... thread 'tokio-runtime-worker' panicked at 'assertion failed: !(self.ptype == ProgressType::Task)', rust/src/console_progress.rs:124:9
stack backtrace:
0: 0x55adf3155b4a - <unknown>
1: 0x55adf317ef8e - <unknown>
2: 0x55adf314e485 - <unknown>
3: 0x55adf3155915 - <unknown>
4: 0x55adf315705f - <unknown>
5: 0x55adf3156d9b - <unknown>
6: 0x55adf315776c - <unknown>
7: 0x55adf31574c2 - <unknown>
8: 0x55adf3155ffc - <unknown>
9: 0x55adf3157212 - <unknown>
10: 0x55adf2952b83 - <unknown>
11: 0x55adf2952c5d - <unknown>
12: 0x55adf2be0665 - <unknown>
13: 0x55adf2a7875f - <unknown>
⠐ Checking out tree 0e96fdf... 14: 0x7fe1084673a2 - g_closure_invoke
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../gobject/gclosure.c:832:7
15: 0x7fe108495f58 - signal_emit_unlocked_R.isra.0
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../gobject/gsignal.c:3796:8
16: 0x7fe1084855ea - g_signal_emit_valist
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../gobject/gsignal.c:3549:5
17: 0x7fe108485813 - g_signal_emit
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../gobject/gsignal.c:3606:3
⠒ Checking out tree 0e96fdf... 18: 0x7fe107f999eb - on_signal_received
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../gio/gdbusproxy.c:897:3
19: 0x7fe107f88070 - emit_signal_instance_in_idle_cb
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../gio/gdbusconnection.c:3791:5
⠓ Checking out tree 0e96fdf... 20: 0x7fe107d9b49d - g_idle_dispatch
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../glib/gmain.c:6124:15
21: 0x7fe107d9c540 - g_main_dispatch
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../glib/gmain.c:3444:28
22: 0x7fe107d9c540 - g_main_context_dispatch
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../glib/gmain.c:4162:7
23: 0x7fe107df44e8 - g_main_context_iterate.isra.0
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../glib/gmain.c:4238:5
24: 0x7fe107d9baef - g_main_loop_run
at /usr/src/debug/glib2-2.74.1-3.fc38.x86_64/redhat-linux-build/../glib/gmain.c:4438:5
25: 0x55adf2a76868 - <unknown>
26: 0x55adf2a76a64 - <unknown>
27: 0x55adf2a69cf5 - <unknown>
28: 0x55adf2a6dae6 - <unknown>
29: 0x55adf2a675a4 - <unknown>
30: 0x55adf2bf554c - <unknown>
31: 0x55adf2a1d216 - <unknown>
32: 0x55adf2a63892 - <unknown>
33: 0x55adf30b1189 - <unknown>
34: 0x55adf30a16eb - <unknown>
35: 0x55adf30af09a - <unknown>
36: 0x55adf315daf3 - <unknown>
37: 0x7fe10713bc57 - start_thread
38: 0x7fe1071c1a70 - __clone3
39: 0x0 - <unknown>
fish: Job 1, 'env RUST_BACKTRACE=full rpm-ost…' terminated by signal SIGABRT (Abort)
Random (maybe helpful) report: Happened to me too. Happened for the first time today when removing the layered gcc package (could that do something?) Will now keep happening every time I do something with the ostree (install/uninstall/rebase), but only the first time. if I run the command a second time it works.
Error happens on fedora silverblue 38 but not on 37
It's been happening to me randomly like ⅓ of the time, I think this is just a race condition of some kind. Still happens on this version btw:
rpm-ostree:
Version: '2023.2'
Git: 8e81b02cfc3dabcda9af7d4275200efedbe145c7
Features:
- rust
- compose
- container
- fedora-integration
Sorry about the delay on this. I put up https://github.com/coreos/rpm-ostree/pull/4348 which should help us debug. Once that PR lands a build should show up in https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/
Then env G_MESSAGES_DEBUG=rpm-ostree,rpm-ostreed RUST_LOG=rpm_ostree=debug,rpmostree_rust=debug rpm-ostree rebase fedora:fedora/x86_64/coreos/next
should get us a bit more debugging information. Also this is a client-server setup, so we may also need to do e.g.
$ systemctl edit --runtime rpm-ostreed
Then paste in there:
[Service]
Environment=G_MESSAGES_DEBUG=rpm-ostreed
Environment=RUST_LOG=rpm_ostree=debug,rpmostree_rust=debug
And systemctl restart rpm-ostreed
.
Ok @YaLTeR the debug patch is available in https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/ - let me know if you need help trying out enabling that build alongside injecting the debug configuration above!
Also cc @istvan-derda since you hit this too
Got a new one with the debug logging:
⠠ Importing rpm-md 75% [███████████████░░░░░] (0s) (rpm-ostree update:4686): rpm-ostreed-DEBUG: 08:32:52.724: txn progress PercentProgress
⠤ Importing rpm-md 80% [████████████████░░░░] (0s) (rpm-ostree update:4686): rpm-ostreed-DEBUG: 08:32:52.918: txn progress PercentProgress
⠦ Importing rpm-md 85% [█████████████████░░░] (0s) (rpm-ostree update:4686): rpm-ostreed-DEBUG: 08:32:52.964: txn progress PercentProgress
⠖ Importing rpm-md 90% [██████████████████░░] (0s) (rpm-ostree update:4686): rpm-ostreed-DEBUG: 08:32:53.041: txn progress TaskBegin
thread 'tokio-runtime-worker' panicked at 'Overwriting task: "Importing rpm-md"', rust/src/console_progress.rs:179:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fish: Job 1, 'env G_MESSAGES_DEBUG=rpm-ostree…' terminated by signal SIGABRT (Abort)
OK, sadly I did the wrong thing with the debug logging...just put up https://github.com/coreos/rpm-ostree/pull/4353 which should really help this time. What's going wrong is we're trying to output a new progress bar when an old one is still active, and we need to know exactly which are old and new.
Would it be possible to update the COPR with RPMs including that PR? Or just send the RPMs here for F38
Btw, for me the crash reproduces basically every time if I update with rpm-ostree update and at the same time do flatpak update -y.
I'm getting this with a brand-new F38 Silverblue install FWIW
Have been experiencing this intermittently with F38 also. Sometimes rerunning rpm-ostree upgrade
works, but other times the core dump happens consecutively.
For me this is happening at this point:
⠙ Running posttrans scripts... thread 'tokio-runtime-worker' panicked at 'assertion failed: !(self.ptype == ProgressType::Task)', rust/src/console_progress.rs:124:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)
I built an rpm from main, will try to reproduce again.
⠉ Receiving objects; 83% (506/605) 3,1 MB/s 146,6 MB (rpm-ostree update:26250): rpm-ostreed-DEBUG: 22:18:41.679: txn progress DownloadProgress
⠈ Receiving objects; 84% (509/605) 3,1 MB/s 148,9 MB (rpm-ostree update:26250): rpm-ostreed-DEBUG: 22:18:41.909: txn progress Message
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates updates-debuginfo updates-source fedora fedora-debuginfo fedora-source updates-archive
⠈ Receiving objects; 84% (509/605) 3,1 MB/s 148,9 MB (rpm-ostree update:26250): rpm-ostreed-DEBUG: 22:18:42.219: txn progress PercentProgress
thread 'tokio-runtime-worker' panicked at 'expected non-task progress type; current message is Receiving objects; 84% (509/605) 3,1 MB/s 148,9 MB', rust/src/console_progress.rs:128:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fish: Job 1, 'env G_MESSAGES_DEBUG=rpm-ostree…' terminated by signal SIGABRT (Abort)
And a second one for the bulk (flatpak update -y
took a while, so I managed to get two crashes):
DEBUG Creating progress "Running post scripts" with type Task
⠖ Running post scripts... (rpm-ostree update:27127): rpm-ostreed-DEBUG: 22:21:11.695: txn progress TaskEnd
DEBUG Ending progress Task; suffix=Some("done")
Running post scripts... done
(rpm-ostree update:27127): rpm-ostreed-DEBUG: 22:21:11.696: txn progress TaskBegin
DEBUG Creating progress "Running posttrans scripts" with type Task
⠦ Running posttrans scripts... (rpm-ostree update:27127): rpm-ostreed-DEBUG: 22:21:15.202: txn progress Message
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates updates-debuginfo updates-source fedora fedora-debuginfo fedora-source updates-archive
⠒ Running posttrans scripts... (rpm-ostree update:27127): rpm-ostreed-DEBUG: 22:21:15.573: txn progress PercentProgress
thread 'tokio-runtime-worker' panicked at 'expected non-task progress type; current message is Running posttrans scripts', rust/src/console_progress.rs:128:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fish: Job 1, 'env G_MESSAGES_DEBUG=rpm-ostree…' terminated by signal SIGABRT (Abort)
Was having the same error
[someone@somewhere ~]$ rpm-ostree update
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates fedora rpmfusion-nonfree-nvidia-driver
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2023-03-14T10:56:46Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2023-04-14T09:25:02Z solvables: 1082
rpm-md repo 'updates-modular' (cached); generated: 2018-02-20T19:18:14Z solvables: 0
rpm-md repo 'updates' (cached); generated: 2023-04-20T04:37:27Z solvables: 3831
rpm-md repo 'fedora' (cached); generated: 2023-04-14T09:32:40Z solvables: 69222
rpm-md repo 'rpmfusion-nonfree-nvidia-driver' (cached); generated: 2023-04-08T18:30:02Z solvables: 31
⠈ Receiving metadata objects: 0/(estimating) -/s 0 bytes...
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates fedora rpmfusion-nonfree-nvidia-driver
⠁ Receiving metadata objects: 0/(estimating) -/s 0 bytes...
thread 'tokio-runtime-worker' panicked at 'assertion failed: !(self.ptype == ProgressType::Task)', rust/src/console_progress.rs:124:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Just did a rpm-ostree cleanup -b -p -r -m
and everything ran fine after that.
I don't think all the options are needed, but it was the command I had in history and I just ran it on a whim...
I guess the problem is related to some cached content...?
Piling on here, randomly on overlay of RPM fusion mesa-vdpau-drivers-freeworld: rpm-ostree install mesa-vdpau-drivers-freeworld
. Just updated to F38 (pretty smooth apart from RPM fusion stuff, thanks!). Deployment completed on repeating the command.
coredumpctl info:
stijn@tangaloor: ~
$ coredumpctl info
PID: 3879 (rpm-ostree)
UID: 1000 (stijn)
GID: 1000 (stijn)
Signal: 6 (ABRT)
Timestamp: Fri 2023-04-21 10:42:34 CEST (1min 38s ago)
Command Line: rpm-ostree install mesa-vdpau-drivers-freeworld
Executable: /usr/bin/rpm-ostree
Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-3cf7a228-1d18-4007-aa7e-6363284335bb.scope
Unit: user@1000.service
User Unit: vte-spawn-3cf7a228-1d18-4007-aa7e-6363284335bb.scope
Slice: user-1000.slice
Owner UID: 1000 (stijn)
Boot ID: 302b227720394945a6a38ccd187a901b
Machine ID: 49cf6ab9e0d14993ba7805d3a9dd578b
Hostname: tangaloor
Storage: /var/lib/systemd/coredump/core.rpm-ostree.1000.302b227720394945a6a38ccd187a901b.3879.1682066554000000.zst (present)
Size on Disk: 764.2K
Message: Process 3879 (rpm-ostree) of user 1000 dumped core.
Module libcrypt.so.2 from rpm libxcrypt-4.4.33-7.fc38.x86_64
Module libbrotlicommon.so.1 from rpm brotli-1.0.9-11.fc38.x86_64
Module libsasl2.so.3 from rpm cyrus-sasl-2.1.28-9.fc38.x86_64
Module libevent-2.1.so.7 from rpm libevent-2.1.12-8.fc38.x86_64
Module libkeyutils.so.1 from rpm keyutils-1.6.1-6.fc38.x86_64
Module libkrb5support.so.0 from rpm krb5-1.20.1-8.fc38.x86_64
Module libcom_err.so.2 from rpm e2fsprogs-1.46.5-4.fc38.x86_64
Module libk5crypto.so.3 from rpm krb5-1.20.1-8.fc38.x86_64
Module libkrb5.so.3 from rpm krb5-1.20.1-8.fc38.x86_64
Module libbrotlidec.so.1 from rpm brotli-1.0.9-11.fc38.x86_64
Module libgssapi_krb5.so.2 from rpm krb5-1.20.1-8.fc38.x86_64
Module libpsl.so.5 from rpm libpsl-0.21.2-2.fc38.x86_64
Module libssh.so.4 from rpm libssh-0.10.4-4.fc38.x86_64
Module libidn2.so.0 from rpm libidn2-2.3.4-2.fc38.x86_64
Module libnghttp2.so.14 from rpm nghttp2-1.52.0-1.fc38.x86_64
Module libblkid.so.1 from rpm util-linux-2.38.1-4.fc38.x86_64
Module libattr.so.1 from rpm attr-2.5.1-6.fc38.x86_64
Module libyaml-0.so.2 from rpm libyaml-0.2.5-9.fc38.x86_64
Module libmagic.so.1 from rpm file-5.44-3.fc38.x86_64
Module libzck.so.1 from rpm zchunk-1.3.1-1.fc38.x86_64
Module libcurl.so.4 from rpm curl-7.87.0-7.fc38.x86_64
Module libaudit.so.1 from rpm audit-3.1-2.fc38.x86_64
Module liblua-5.4.so from rpm lua-5.4.4-9.fc38.x86_64
Module libpopt.so.0 from rpm popt-1.19-2.fc38.x86_64
Module librpm_sequoia.so.1 from rpm rust-rpm-sequoia-1.3.0-1.fc38.x86_64
Module libpcre2-8.so.0 from rpm pcre2-10.42-1.fc38.1.x86_64
Module libffi.so.8 from rpm libffi-3.4.4-2.fc38.x86_64
Module libselinux.so.1 from rpm libselinux-3.5-1.fc38.x86_64
Module libmount.so.1 from rpm util-linux-2.38.1-4.fc38.x86_64
Module libgmodule-2.0.so.0 from rpm glib2-2.76.1-1.fc38.x86_64
Module libxml2.so.2 from rpm libxml2-2.10.4-1.fc38.x86_64
Module libbz2.so.1 from rpm bzip2-1.0.8-13.fc38.x86_64
Module liblz4.so.1 from rpm lz4-1.9.4-2.fc38.x86_64
Module libzstd.so.1 from rpm zstd-1.5.5-1.fc38.x86_64
Module liblzma.so.5 from rpm xz-5.4.1-1.fc38.x86_64
Module libacl.so.1 from rpm acl-2.3.1-6.fc38.x86_64
Module libostree-1.so.1 from rpm ostree-2023.1-2.fc38.x86_64
Module libz.so.1 from rpm zlib-1.2.13-3.fc38.x86_64
Module libcrypto.so.3 from rpm openssl-3.0.8-2.fc38.x86_64
Module libssl.so.3 from rpm openssl-3.0.8-2.fc38.x86_64
Module libmodulemd.so.2 from rpm libmodulemd-2.14.0-5.fc38.x86_64
Module libsolv.so.1 from rpm libsolv-0.7.22-4.fc38.x86_64
Module libsolvext.so.1 from rpm libsolv-0.7.22-4.fc38.x86_64
Module librepo.so.0 from rpm librepo-1.15.1-2.fc38.x86_64
Module libjson-c.so.5 from rpm json-c-0.16-4.fc38.x86_64
Module libsystemd.so.0 from rpm systemd-253.2-1.fc38.x86_64
Module librpmio.so.9 from rpm rpm-4.18.1-1.fc38.x86_64
Module librpm.so.9 from rpm rpm-4.18.1-1.fc38.x86_64
Module libglib-2.0.so.0 from rpm glib2-2.76.1-1.fc38.x86_64
Module libgobject-2.0.so.0 from rpm glib2-2.76.1-1.fc38.x86_64
Module libgio-2.0.so.0 from rpm glib2-2.76.1-1.fc38.x86_64
Module libpolkit-gobject-1.so.0 from rpm polkit-122-3.fc38.1.x86_64
Module libarchive.so.13 from rpm libarchive-3.6.1-4.fc38.x86_64
Module libjson-glib-1.0.so.0 from rpm json-glib-1.6.6-4.fc38.x86_64
Module libcap.so.2 from rpm libcap-2.48-6.fc38.x86_64
Stack trace of thread 3880:
#0 0x00007fc118468b94 __pthread_kill_implementation (libc.so.6 + 0x8eb94)
#1 0x00007fc118417aee raise (libc.so.6 + 0x3daee)
#2 0x00007fc11840087f abort (libc.so.6 + 0x2687f)
#3 0x0000565195440917 _ZN11panic_abort18__rust_start_panic5abort17h3c900776fa44b50cE (rpm-ostree + 0x99e917)
#4 0x00007fc104029720 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
I also have this time to time no idea where it come, at first i've thinked it would come from some source blocked, but i never have checked more since i've seen some other had also the problem.
I say i have it to see any potential fix, and let know that one more user have the problem.
FWIW, this happens on every one of my systems for the past few months; I thought it would be fixed quickly (as these things usually are), and I've tried all the workarounds listed here (clearing cache and such), but it still happens. I have to try to upgrade 2 to 5 times before it finishes with "no changes" (even though there are changes according to rpm-ostree status
and reboots); it's likely because the last time, when it's successful, has no changes to the previous time, when it failed with the error.
I do have overrides on my machines, and I do have third party repos enabled (OBS for darktable on my personal desktop and a copr for a patched mutter on both personal and work) — but this was even happening on my work machine without any additional repos (before I added the patched triple-buffered mutter). However, I do have a local package on my work laptop (for VPN certificates) and layered Fedora packages on both.
I've asked my coworkers who also use Silveblue and they haven't seen the issue. They don't layer packages, however.
So my guess is that it's really related to layering packages.
I've asked my coworkers who also use Silveblue and they haven't seen the issue. They don't layer packages, however.
So my guess is that it's really related to layering packages.
The worst it's this bug wasn't there on 37
@garrett Can you try grabbing some of the debug info using env G_MESSAGES_DEBUG=rpm-ostree,rpm-ostreed RUST_LOG=rpm_ostree=debug,rpmostree_rust=debug rpm-ostree upgrade
or so from https://github.com/coreos/rpm-ostree/issues/4284#issuecomment-1483911450 ?
It'd be particularly helpful to use rpm-ostree-2023.3 as a base for updating.
I've tried briefly again today to reproduce this, and took another tour through the code trying to retrofit these symptoms to code paths; but no luck so far. If we don't succeed soon I'll try just spamming debug log prints around the relevant code paths I can think of and have someone who is hitting this update to that.
@cgwalters: Sure!
rpm-ostree:
Version: '2023.3'
Git: 8ab6f143a0ecad8b125b47dee8bbeb2e99f1b215
Features:
- rust
- compose
- container
- fedora-integration
On my work laptop, before I realized I should modify the service too:
On my personal workstation, where I did modify the service before:
rpm-ostree--debug--personal.txt
The second of these is probably more useful, as it should have more debugging info:
In
Writing rpmdb...done
(rpm-ostree upgrade:37328): rpm-ostreed-DEBUG: 11:37:59.031: txn progress TaskBegin
DEBUG Creating progress "Writing OSTree commit" with type Task
⠒ (rpm-ostree upgrade:37328): rpm-ostreed-DEBUG: 11:38:09.273: txn progress Message
Enabled rpm-md repositories: fedora-modular updates-modular updates fedora graphics_darktable graphics_darktable_master copr:copr.fedorainfracloud.org:calcastor:gnome-patched updates-archive
(rpm-ostree upgrade:37328): rpm-ostreed-DEBUG: 11:38:09.377: txn progress PercentProgress
thread 'tokio-runtime-worker' panicked at 'expected non-task progress type; current message is Writing OSTree commit', rust/src/console_progress.rs:128:13
It's really strange to me that we're seeing the "Enabled rpm-md repositories" message at that point...that seems like a leftover bit that should have been seen earlier.
I'm starting to build up a theory that this is somehow related to us iterating a new main context in the commit path, but we have leftover queued idle work on the mainloop or so? But not yet clear to me what that is...
@garrett Can you also attach the daemon-side logs from that run? i.e. journalctl -u rpm-ostreed
around that time.
@cgwalters: Apologies for the delay! Here are the logs from ostreed on May 09 from my personal machine:
I think the relevant entries should be around 11:37 - 11:38.
All of the repeated "loading fedora" messages looked very odd to me, and digging in this is coming from the "sysroot changed' signal. We don't have even debug logging for invocation of these messages (will add) but...that gave me a clue.
I think I'm starting to understand this bug. Basically what I think is happening right now is that there are two processes talking to the daemon (gnome-software and /usr/bin/rpm-ostree).
While the human has invoked e.g. /usr/bin/rpm-ostree upgrade
, at the same time gnome-software is calling one of the other os methods that are not transactions (and hence not serialized). I'm not sure which yet; maybe e.g. get_deployments_rpm_diff
?
But basically at some point, one of these other methods ends up calling something in the core that tries to refresh rpm-md, and that in turn will try to invoke the progress APIs, and that will cause this crash.
First I'm going to add a lot more useful debug logging to this path to verify.
I think the best fix here is to change our output infrastructure to only route through the dbus txn progress for the transaction thread.
Ahhh yes I think the core bug here is everything using os_create_dnf_context_simple
(which is the "not transactional inspection" stuff) calling rpmostree_context_download_metadata
which will indeed enable the progress infrastructure. That'd completely explain the race here.
Yep, this is really easy to reproduce now that I understand the bug; having layered packages makes it much much more likely (but I think it may be reproducible without).
Basically in one shell: while rpm-ostree upgrade >/dev/null; do :; done
And in another shell, use this handy helper to do rpmostree_busctl_call_os WhatProvides as 1 provided-testing-daemon
. That really quickly causes the first client to crash.
So yes, will look at reworking the transaction progress stuff to be thread-local.
OK, got a PR up in https://github.com/coreos/rpm-ostree/pull/4405
I also hit this several times today in the office, but when I came home 'update' worked first time.
Can one or more people try out https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/build/5919148/ which has the fix merged?
Well, it's just a single sample, but I just did rpm-ostree update together with flatpak update -y, which usually triggered the issue, and it went fine.
This is now up as https://bodhi.fedoraproject.org/updates/FEDORA-2023-ce6c9dcd69
Host system details
Expected vs actual behavior
Expected:
No panic.
Steps to reproduce it
Not reproducible.
Would you like to work on the issue?
Should be assigned to someone else.