
dom0 upgrades through sys-whonix broken - `Cannot retrieve repository metadata (repomd.xml) status code 404` #8389

Closed: adrelanos closed this issue 1 year ago

adrelanos commented 1 year ago

Qubes OS release

Qubes R4.1
Qubes R4.2

Brief summary

dom0 upgrades through sys-whonix are broken.

Steps to reproduce

Use sys-whonix as UpdateVM for dom0.

sudo qubes-dom0-update
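
For reference, a minimal sketch of setting this up in dom0 (qubes-prefs and the updatevm property are standard; sys-whonix is assumed to already exist):

# in dom0: point dom0 updates at sys-whonix, then trigger an update
qubes-prefs updatevm sys-whonix
sudo qubes-dom0-update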

Expected behavior

Functional upgrades.

Actual behavior

Error message.

Cannot retrieve repository metadata (repomd.xml) status code 404

Additional information

Was reported in the forums: https://forums.whonix.org/t/sys-whonix-onionized-dom0-update-fail-on-qubes-r4-1-with-qubes-whonix-17/16988

Might or might not be related to general Tor unreliability issues.

I didn't experience the bug myself.

Related

adrelanos commented 1 year ago

Can you upgrade Whonix Templates?

Can you upgrade other Templates over sys-whonix?

adrelanos commented 1 year ago

For debugging purposes, please run:

systemcheck --ip-test

and post the output here if there are any warnings or errors. Otherwise, please confirm that no warnings or errors are shown.

marmarek commented 1 year ago

@adrelanos: an example failure of systemcheck from just a moment ago (R4.2, Whonix 17): https://openqa.qubes-os.org/tests/79142. Here, systemcheck failed in anon-whonix, so it might be a different issue than the updates one, but it could still be related. In the log I see:

[INFO] [systemcheck] Tor Connection Result:
- Connecting for 74 seconds.
- Tor Circuit: established.
- Timesync status: not done.
- sdwdate reports: Initial time fetching in progress...
- onion-time-pre-script reports: 
__ ### START: ### /usr/libexec/helper-scripts/onion-time-pre-script
__ Status: Subsequent run after boot.
__ Static Time Sanity Check: Within minimum time 'Mon Jun 12 00:00:00 UTC 2023' and expiration timestamp 'Tue May 17 10:00:00 UTC 2033', ok.
__ Tor circuit: established
__ Tor Consensus Time Sanity Check: Clock within consensus parameters consensus/valid-after 2023-08-04 12:00:00 and consensus/valid-until 2023-08-04 15:00:00.
__ Conclusion: Tor already reports circuit established.
__ ### END: ### Exiting with exit_code '0' indicating 'success'.
[ERROR] [systemcheck] Time Synchronization Result:
systemcheck gave up waiting.

Time synchronization status: fail
sdwdate reports: General Timeout Error. Internet connection might be down.
whonix_firewall status: first run after boot
onion-time-pre-script reports: 
__ ### START: ### /usr/libexec/helper-scripts/onion-time-pre-script
__ Status: Subsequent run after boot.
__ Static Time Sanity Check: Within minimum time 'Mon Jun 12 00:00:00 UTC 2023' and expiration timestamp 'Tue May 17 10:00:00 UTC 2033', ok.
__ Tor circuit: established
__ Tor Consensus Time Sanity Check: Clock within consensus parameters consensus/valid-after 2023-08-04 12:00:00 and consensus/valid-until 2023-08-04 15:00:00.
__ Conclusion: Tor already reports circuit established.
__ ### END: ### Exiting with exit_code '0' indicating 'success'.

Possible issues:

- sdwdate time sources might be dysfunctional.

Recommendations:

A) Rerun systemcheck:
dom0 -> Start Menu -> ServiceVM: anon-whonix -> systemcheck
or in Terminal: systemcheck
or more verbose in Terminal: systemcheck --verbose --gui --cli

B) Restart sdwdate.
dom0 -> Start Menu -> ServiceVM: anon-whonix -> sdwdate-gui -> right click on sdwdate-gui systray -> Restart sdwdate
or in Terminal: sudo service sdwdate restart

C) Manually set the time.

As last resort...

1. Open a terminal. (dom0 -> Start Menu -> ServiceVM: anon-whonix -> Terminal)
2. Use the manual clock setting tool. sudo --set-home clock-random-manual-gui
3. Simulate sdwdate success. sudo -u sdwdate touch /run/sdwdate/first_success
4. Rerun systemcheck.

This is a failure mode I see in Whonix tests semi-regularly. The very same test, run in parallel on a different system, succeeded, so it isn't some persistent failure. I see in there that Tor reported establishing a connection correctly, so it's unclear to me why sdwdate failed in the end... systemcheck called in sys-whonix a moment later reported success.

adrelanos commented 1 year ago

Quote @marmarek

This is a failure mode I see in Whonix tests semi-regularly. The very same test, run in parallel on a different system, succeeded, so it isn't some persistent failure. I see in there that Tor reported establishing a connection correctly, so it's unclear to me why sdwdate failed in the end... systemcheck called in sys-whonix a moment later reported success.

From that log...

Aug 04 12:39:38 host root[2465]: /etc/xen/scripts/vif-route-qubes: ifdown vif5.0 failed
Aug 04 12:39:38 host root[2473]: /etc/xen/scripts/vif-route-qubes: ip route del 10.137.0.11 dev vif5.0 metric 32747 failed
Aug 04 12:39:38 host root[2478]: /etc/xen/scripts/vif-route-qubes: ip addr del 10.137.0.9/32 dev vif5.0 failed

Known Qubes issue?

Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 14
Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 15
Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 16
Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 17

Known Qubes issue?

[INFO] [systemcheck] Time Synchronization Result: Ok.
[INFO] [systemcheck] Connected to Tor.

So at the time when systemcheck finished running, both Tor and sdwdate were OK.

Aug 04 12:35:16 host vanguards[851]: NOTICE[Fri Aug 04 12:35:16 2023]: Tor has been failing all circuits for 30 seconds!

Seems like Tor network issues. These might have happened before or after the failure.

I see in there that Tor reported establishing a connection correctly, so it's unclear to me why sdwdate failed in the end...

It's possible that Tor reports that it's fully bootstrapped, but then attempts to reach onions still fail. The sdwdate log would show that. The sdwdate log can be conveniently viewed using sudo sdwdate-log-viewer. sdwdate-log-viewer is just a simple wrapper around journalctl that aggregates all time-related log entries. It is easier to read than the full journal log.
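
For illustration, a rough journalctl equivalent of that wrapper (the unit names are assumptions; sdwdate-log-viewer may aggregate more units than these):

# show time-related units since boot, roughly what sdwdate-log-viewer aggregates
sudo journalctl --boot --no-pager -u sdwdate -u tor@default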

marmarek commented 1 year ago

The failed systemcheck log (the one above, from sys-whonix, was the successful one): https://openqa.qubes-os.org/tests/79142/file/whonixcheck-whonixcheck-anon-whonix.log

Aug 04 12:39:38 host root[2465]: /etc/xen/scripts/vif-route-qubes: ifdown vif5.0 failed
Aug 04 12:39:38 host root[2473]: /etc/xen/scripts/vif-route-qubes: ip route del 10.137.0.11 dev vif5.0 metric 32747 failed
Aug 04 12:39:38 host root[2478]: /etc/xen/scripts/vif-route-qubes: ip addr del 10.137.0.9/32 dev vif5.0 failed

Known Qubes issue?

This is sys-whonix, after starting and stopping anon-whonix, so I think this happened on anon-whonix shutdown. Looks harmless, but might be worth fixing anyway.

Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 14
Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 15
Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 16
Aug 04 12:31:12 host tinyproxy-wrapper[866]: WARNING: obsolete config item on line 17

I haven't seen this one before. I guess it's related to the newer tinyproxy in bookworm (not really related to Whonix, rather to the Debian bookworm template).
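
For reference, one could print the directives those warnings refer to; the config path below is an assumption based on the Qubes updates-proxy package:

# show lines 14-17 of the updates-proxy tinyproxy config (path assumed)
sed -n '14,17p' /etc/tinyproxy/tinyproxy-updates.conf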

It's possible that Tor reports that it's fully bootstrapped, but then attempts to reach onions still fail

I guess that's likely what happened here.

The sdwdate log can be conveniently viewed using sudo sdwdate-log-viewer

I can add saving that to the test. But the full journals of all qubes are saved anyway, so there is probably no need. It's in var_log.tar.gz, in the var/log/xen/console dir (during tests, the journal is redirected to the qube's console, so everything ends up in the console logs). Part of it:

[   31.285640] sdwdate[911]: __ Tor circuit: established
...
[  271.484538] sdwdate[911]: remote_times.py: i: 0 | timeout_network
[  271.629537] sdwdate[911]: remote_times.py: i: 1 | timeout_network
[  271.672957] sdwdate[911]: remote_times.py: i: 2 | timeout_network
[  271.673829] sdwdate[911]: remote 0: http://wasabiukrxmkdgve5kynjztuovbg43uxcbcxn6y2okcrsg7gb6jdmbad.onion
[  271.674146] sdwdate[911]: * comment: https://web.archive.org/web/20210604175753/https://wasabiwallet.io/
[  271.674322] sdwdate[911]: * took_time     : 120.01 second(s)
[  271.674699] sdwdate[911]: * half_took_time: 60.01 second(s)
[  271.674875] sdwdate[911]: * exit_code: -9
[  271.675202] sdwdate[911]: * stdout: empty
[  271.675768] sdwdate[911]: * stderr: empty
[  271.675939] sdwdate[911]: * remote_status: timeout
[  271.676373] sdwdate[911]: remote 1: http://searxspbitokayvkhzhsnljde7rqmn7rvoga6e4waeub3h7ug3nghoad.onion
[  271.676622] sdwdate[911]: * comment: https://web.archive.org/web/20210525165705/https://searx.space/ https://searx.space
[  271.677308] sdwdate[911]: * took_time     : 120.14 second(s)
[  271.677473] sdwdate[911]: * half_took_time: 60.07 second(s)
[  271.677632] sdwdate[911]: * exit_code: -9
[  271.677786] sdwdate[911]: * stdout: empty
[  271.677995] sdwdate[911]: * stderr: empty
[  271.678292] sdwdate[911]: * remote_status: timeout
[  271.678443] sdwdate[911]: remote 2: http://5kcyaqagvnrvyan7y5ntzreqsn2msowqlmtoo46qju2pctlbkzzztxqd.onion
[  271.680387] sdwdate[911]: * comment: https://web.archive.org/web/20210604163509/https://securedrop.org/directory/institute-quantitative-social-science-harvard-university/
[  271.680563] sdwdate[911]: * took_time     : 120.05 second(s)
[  271.680724] sdwdate[911]: * half_took_time: 60.02 second(s)
[  271.680887] sdwdate[911]: * exit_code: -9
[  271.681158] sdwdate[911]: * stdout: empty
[  271.681309] sdwdate[911]: * stderr: empty
[  271.681458] sdwdate[911]: * remote_status: timeout
[  271.681606] sdwdate[911]: remote_times.py: urls_list:
[  271.681773] sdwdate[911]: ['http://wasabiukrxmkdgve5kynjztuovbg43uxcbcxn6y2okcrsg7gb6jdmbad.onion', 'http://searxspbitokayvkhzhsnljde7rqmn7rvoga6e4waeub3h7ug3nghoad.onion', 'http://5kcyaqagvnrvyan7y5ntzreqsn2msowqlmtoo46qju2pctlbkzzztxqd.onion']
[  271.681973] sdwdate[911]: remote_times.py: status_list:
[  271.682237] sdwdate[911]: ['timeout', 'timeout', 'timeout']
[  271.682388] sdwdate[911]: remote_times.py: took_time_list:
[  271.682538] sdwdate[911]: [120.01, 120.14, 120.05]
[  271.682706] sdwdate[911]: remote_times.py: half_took_time_list:
[  271.682858] sdwdate[911]: [60.01, 60.07, 60.02] 
[  271.683135] sdwdate[911]: remote_times.py: remote_unixtime_list:
[  271.683921] sdwdate[911]: [0, 0, 0]
[  271.684212] sdwdate[911]: remote_times.py: time_diff_raw_int_list:
[  271.684362] sdwdate[911]: [0, 0, 0]
[  271.684680] sdwdate[911]: remote_times.py: time_diff_lag_cleaned_float_list:
[  271.686506] sdwdate[911]: [0.0, 0.0, 0.0]
[  271.686740] sdwdate[911]: 2023-08-04 12:38:46 - sdwdate - INFO - returned urls "['http://wasabiukrxmkdgve5kynjztuovbg43uxcbcxn6y2okcrsg7gb6jdmbad.onion', 'http://searxspbitokayvkhzhsnljde7rqmn7rvoga6e4waeub3h7ug3nghoad.onion', 'http://5kcyaqagvnrvyan7y5ntzreqsn2msowqlmtoo46qju2pctlbkzzztxqd.onion']"
[  271.688684] sdwdate[911]: 2023-08-04 12:38:46 - sdwdate - INFO -
[  271.688901] sdwdate[911]: 2023-08-04 12:38:46 - sdwdate - ERROR - General Timeout Error. Internet connection might be down.
[  271.689289] sdwdate[911]: 2023-08-04 12:38:46 - sdwdate - INFO - Sleeping for 103 minutes, ok.
[  271.689483] sdwdate[911]: 2023-08-04 12:38:46 - sdwdate - INFO - running command: sleep 6205.405491673

It failed this way for 2 iterations, so 6 URLs in total. So, I guess this is the case you describe. Or maybe sdwdate considers the circuit established prematurely? Is there some better Tor connectivity signal, besides trying to reach onion sites?

adrelanos commented 1 year ago

I can add saving that to the test. But the full journals of all qubes are saved anyway, so there is probably no need. It's in var_log.tar.gz, in the var/log/xen/console dir (during tests, the journal is redirected to the qube's console, so everything ends up in the console logs). Part of it:

I guess not needed for now.

It failed this way for 2 iterations, so 6 URLs in total. So, I guess this is the case you describe.

Yes.

Or maybe sdwdate considers the circuit established prematurely?

It relies on the Tor control protocol fields status/circuit-established and status/bootstrap-phase. If status/circuit-established is 1, that alone should be sufficient. But it's not: in my experience, onion connections can still fail. I guess that might depend on the path through the Tor network on the side of the onion service, which might be overloaded.
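
For illustration, these fields can also be queried by hand over the control protocol. A minimal sketch, assuming a ControlPort on 127.0.0.1:9051 with no authentication (in Whonix, control port access is normally mediated by onion-grater, so the address and authentication here are assumptions):

# ask Tor directly whether it believes a circuit can be built
printf 'AUTHENTICATE ""\r\nGETINFO status/circuit-established\r\nGETINFO status/bootstrap-phase\r\nQUIT\r\n' | nc 127.0.0.1 9051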

Is there some better Tor connectivity signal, besides trying to reach onion sites?

That would be very desirable indeed. During development, I've looked extensively for it. But such a thing doesn't seem to exist.

DemiMarie commented 1 year ago

Is there some better Tor connectivity signal, besides trying to reach onion sites?

Another problem is that onion sites are under deliberate DoS attack IIUC.

adrelanos commented 1 year ago

@bluesteal

Ok there is an error in whonix-gw and whonix-ws:

systemcheck --ip-test

[systemcheck] UpdatesProxy IP Leak Test: Qubes’ UpdatesProxy was not reachable. (curl exit code: [22] - [HTTP page not retrieved. The requested url was not found or returned another error with the HTTP error code being 404 or above.

Are you doing dom0 upgrades through sys-whonix?
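
For anyone debugging this by hand: curl exit code 22 means --fail saw an HTTP status of 400 or above. A minimal reachability check against the updates proxy, assuming the standard Qubes proxy address 127.0.0.1:8082 (the target URL is only an example):

# run inside the qube doing the updates: does the proxy answer at all?
curl --silent --show-error --fail --proxy http://127.0.0.1:8082/ --head https://yum.qubes-os.org/
echo "curl exit code: $?"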

adrelanos commented 1 year ago

Is this issue still reproducible?

adrelanos commented 1 year ago

For debugging purposes, I messed up dom0's /etc/yum.repos.d/fedora.repo on purpose to learn which error message means what.

[fedora]
name=Fedora 37 - x86_64
#baseurl=http://download.fedoraproject.org/pub/fedora/linux/releases/37/Everything/x86_64/os/
metalink=https://mirrors.fedoraproject.org/metalink_mess_up_something_here?repo=fedora-37&arch=x86_64

Then I ran sudo qubes-dom0-update in dom0.

Error message:

Errors during downloading metadata for repository 'fedora':

As expected. So that obviously would not be a Whonix issue, but an issue with a wrong URL. In that case, Whonix is properly fetching the file; it's just that the server tells us 404.

It's quite an artificial example but useful nonetheless.

Quote original bug report:

Cannot retrieve repository metadata (repomd.xml) status code 404

That also seems to indicate that Whonix was fetching properly. The Whonix source code obviously does not do any URL rewrites.

Maybe the issue is as "simple" as the server really having replied 404? In that case, I fail to see what Whonix could do about this.
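
One way to check that hypothesis is to fetch repomd.xml directly from the update qube and look at the status line. A sketch; the URL is illustrative and should be taken from the actual failing repo configuration:

# does the server itself answer 404 for the metadata file?
curl --silent --head "https://yum.qubes-os.org/r4.2/current/dom0/fc37/repodata/repomd.xml" | head -n 1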

Could it be that a mirror was not properly updated?

Or could it be that some mirrors are blocking Tor? Maybe not intentionally, but if they use cloud security services such as cloud-based web application firewalls or other blocklists, those might block Tor relays. Cloudflare is also known to block Tor. Cloudflare blocks command-line tools (such as command-line downloaders) more than it blocks Tor Browser, which can add further confusion.
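
A quick way to test whether such a block is User-Agent based rather than Tor based is to compare the status codes curl gets with its default User-Agent and with a browser-like one (the URL and UA string are placeholders):

# same URL, two user agents; differing status codes point at UA-based filtering
curl --silent --output /dev/null --write-out '%{http_code}\n' "$URL"
curl --silent --output /dev/null --write-out '%{http_code}\n' -A 'Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0' "$URL"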

Could be similar to https://github.com/QubesOS/qubes-issues/issues/8388#issuecomment-1666538053.

adrelanos commented 1 year ago

Closing this unless/until some user reports having this issue.

andrewdavidwong commented 1 year ago

This issue has been closed as "cannot reproduce" (we were unable to reproduce this issue). If anyone believes this is a mistake, or if anyone can reproduce the issue, please leave a comment, and we'll be happy to reopen this. Thank you.

adrelanos commented 1 year ago

Could be similar to #8388 (comment).