mscherer opened this issue 6 years ago
Hmm. This isn't (likely) related to the HTTP2 issues. It sounds like the commit was written, but an object was missing.
ostree fsck should have a bit more information.
You should be able to work around this by doing:
# touch /ostree/repo/state/b5845ebd002b2ec829c937d68645400aa163e7265936b3e91734c6f33a510473.commitpartial
or so. See also: https://github.com/ostreedev/ostree/pull/345#issuecomment-262263824
(It'd all work a lot more nicely if we had fsck automatically drop commitpartial for missing objects; I'll do that at some point.)
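The general shape of the workaround, as a sketch (the ref name below is only an example of what a deployment might track, and /ostree/repo is assumed to be the system repo):

# assumption: the deployment tracks a ref like the one below; substitute your own
commit=$(ostree --repo=/ostree/repo rev-parse fedora-atomic:fedora/27/x86_64/atomic-host)
# marking the commit as partial makes the next pull re-fetch any missing objects
touch /ostree/repo/state/${commit}.commitpartial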
So the problem seems to have fixed itself: while I was trying to diagnose things, it just started to work as I was running strace on it. At least it now seems to download; I will report whether that's truly fixed or not.
Now, I get:
[root@osnode-3 ~]# rpm-ostree upgrade
Receiving delta parts: 0/4 1,2 Mo/s 11,2 Mo/216,8 Mo
Receiving delta parts: 3/4 126,8 ko/s 96,7 Mo/216,8 Mo
error: opcode set-read-source: Couldn't find file object '2c9d171e5c0ff46027953f045b8fb9f5cec383ef95c95bc9400fd9d8b9f2c81e'
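For reference, the missing object can also be checked for directly on disk; a sketch, assuming the standard layout of the system repo (the .file suffix is based on the "file object" wording in the error; other object types use .commit/.dirtree/.dirmeta):

# objects live under objects/<first two hex chars of checksum>/<rest of checksum>.<type>
ls -l /ostree/repo/objects/2c/9d171e5c0ff46027953f045b8fb9f5cec383ef95c95bc9400fd9d8b9f2c81e.file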
What does ostree fsck say? Are there a lot of missing objects?
Yes, and after a reboot, rpm-ostree doesn't even work anymore. I suspect my attempt at cleanup didn't help.
So, after some manual fiddling, I now have this:
janv. 04 18:58:23 osnode-3.example.org systemd-coredump[1300]: Process 1296 (rpm-ostree) of user 0 dumped core.
Stack trace of thread 1296:
#0 0x00007f396d3942a0 g_str_hash (libglib-2.0.so.0)
#1 0x00007f396d392f5b g_hash_table_insert_internal (libglib-2.0.so.0)
#2 0x00007f396d3df871 g_variant_dict_init (libglib-2.0.so.0)
#3 0x00007f396d3df8d1 g_variant_dict_new (libglib-2.0.so.0)
#4 0x00005584cbe151ac rpmostree_deployment_get_layered_info (rpm-ostree)
#5 0x00005584cbe34e9f rpmostreed_deployment_generate_variant (rpm-ostree)
#6 0x00005584cbe332c6 sysroot_populate_deployments_unlocked (rpm-ostree)
#7 0x00005584cbe3437d rpmostreed_sysroot_populate (rpm-ostree)
#8 0x00005584cbe31613 rpmostreed_daemon_initable_init (rpm-ostree)
#9 0x00007f396d922367 g_initable_new_valist (libgio-2.0.so.0)
#10 0x00007f396d922419 g_initable_new (libgio-2.0.so.0)
#11 0x00005584cbe052b3 start_daemon (rpm-ostree)
#12 0x00005584cbe05747 rpmostree_builtin_start_daemon (rpm-ostree)
#13 0x00005584cbdf2b59 main (rpm-ostree)
#14 0x00007f396c95b03a __libc_start_main (libc.so.6)
#15 0x00005584cbdf2d8a _start (rpm-ostree)
Stack trace of thread 1297:
#0 0x00007f396ca478bb __poll (libc.so.6)
#1 0x00007f396d3a4ed9 g_main_context_iterate.isra.25 (libglib-2.0.so.0)
#2 0x00007f396d3a4fec g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007f396d3a5031 glib_worker_main (libglib-2.0.so.0)
#4 0x00007f396d3cc4c6 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007f396cd26609 start_thread (libpthread.so.0)
#6 0x00007f396ca53e6f __clone (libc.so.6)
Stack trace of thread 1298:
#0 0x00007f396ca478bb __poll (libc.so.6)
#1 0x00007f396d3a4ed9 g_main_context_iterate.isra.25 (libglib-2.0.so.0)
#2 0x00007f396d3a5272 g_main_loop_run (libglib-2.0.so.0)
#3 0x00007f396d98db36 gdbus_shared_thread_func (libgio-2.0.so.0)
#4 0x00007f396d3cc4c6 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007f396cd26609 start_thread (libpthread.so.0)
#6 0x00007f396ca53e6f __clone (libc.so.6)
By "manual fiddling" I mean that I resurrected objects, and then marked the commits as "partial" with touch:
# ls /ostree/repo/state/
a2b80278eea897eb1fec7d008b18ef74941ff5a54f86b447a2f4da0451c4291a.commitpartial b5845ebd002b2ec829c937d68645400aa163e7265936b3e91734c6f33a510473.commitpartial
And fsck is good:
# ostree fsck
Validating refs...
Enumerating objects...
Verifying content integrity of 1 commit objects...
1/29227 objects
2923/29227 objects
5845/29227 objects
8767/29227 objects
11689/29227 objects
14611/29227 objects
17533/29227 objects
20455/29227 objects
23377/29227 objects
26299/29227 objects
29221/29227 objects
2 partial commits not verified
fsck won't error on missing objects if the commits are marked as partial.
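If you do want fsck to flag the missing objects, you could temporarily drop the markers; a sketch (re-create them afterwards with touch as above):

# with the markers gone, ostree treats the commits as complete again,
# so fsck will traverse them and report the missing objects as errors
rm /ostree/repo/state/*.commitpartial
ostree fsck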
Weird, not sure what's going on with that stack trace...the code definitely handles missing commits (though doing so does cause the daemon to error out here):
Jan 05 02:54:21 localhost.localdomain rpm-ostree[4318]: error: Couldn't start daemon: Error setting up sysroot: Reading deployment 1: No such metadata object b5845e...
Hmm. At this point, if you have the ability to capture the state of this system and post it, that'd be useful. You probably don't need the whole disk image, just e.g. tar czf ostree-repo.tar.gz /ostree/repo/ or so. Plus an ls -al /ostree/deploy/fedora-atomic/deploy/ and the contents of the .origin files there.
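Something along these lines should gather it all (a sketch; the output file names are just examples):

# capture the repo, the deployment listing, and every deployment origin file
tar czf ostree-repo.tar.gz /ostree/repo/
ls -al /ostree/deploy/fedora-atomic/deploy/ > deployments.txt
cat /ostree/deploy/fedora-atomic/deploy/*.origin > origins.txt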
I think I butchered the repo further by trying to fix it, and I now have different errors. And Meltdown/Spectre happened in the meantime, so I kinda forgot what I did :/
But there is no a2b80278eea897eb1fec7d008b18ef74941ff5a54f86b447a2f4da0451c4291a.0.origin in /ostree/deploy/fedora-atomic/deploy/, so I guess that's the issue.
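For context, those origin files are small keyfiles, roughly of this shape (the refspec value here is a guess for this system, not something authoritative):

# guessed minimal deployment origin; remote and ref names are assumptions
[origin]
refspec=fedora-atomic:fedora/27/x86_64/atomic-host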
Ok so no, the lack of a .origin file is not the cause. I fixed that by adding one, and it changes nothing about my new error message. I think I should maybe close this issue, because the more I try to diagnose, the more I break, and that's kinda not useful.
Once that's working, I can try running ostree upgrade again and killing it until I get a repo corrupted enough for debugging pleasure.
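Roughly what I have in mind for that, as a sketch (the durations and iteration count are arbitrary):

# repeatedly start an upgrade and SIGKILL it partway through, to try to
# reproduce a half-written repo; stop once an upgrade actually completes
for i in $(seq 1 10); do
    timeout -s KILL $((RANDOM % 60 + 5)) rpm-ostree upgrade && break
done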
Host system details
Expected vs actual behavior
Expected:
Steps to reproduce it
So I was trying to upgrade my system, since there is a new release.
Then I got this error:
I retried the upgrade, and the error message appeared again.
Then I figured "there is a cache issue" and tried the various cleanup options (-b, -r, -m, etc.), but none of them changed anything. I didn't reboot yet.
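Concretely, the variants I tried were along these lines (from memory, so approximate):

# flags of rpm-ostree cleanup, as I remember trying them
rpm-ostree cleanup -m    # cached rpm repo metadata
rpm-ostree cleanup -b    # temporary files, leaves deployments alone
rpm-ostree cleanup -r    # rollback deployment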
In the log, I can see
So while HTTP2 errors are well known, I suspect this resulted in a crappy state on disk.
Would you like to work on the issue?
I do not know where to start, but will diagnose as much as I can.