QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

4.1: Whonix WS failing #6597

Closed jpds closed 2 years ago

jpds commented 3 years ago

Qubes OS version

4.1, all updates applied.

Affected component(s) or functionality

Whonix Workstation.

Brief summary

I applied updates to my 4.1 system a couple of weeks ago, and ever since then, I've not been able to use my Whonix Workstation DVMs. Running systemcheck in the DVM gives:

$ systemcheck 
[INFO] [systemcheck] disp891 | Whonix-Workstation | whonix-ws-15-dvm DispVM AppVM | Tue 11 May 2021 11:30:05 AM UTC
[INFO] [systemcheck] Tor Connection Result:
Tor's Control Port could not be reached. Attempt 1 of 5. Could be temporary due to a Tor restart. Trying again...

[ERROR] [systemcheck] Tor Connection Result:
Tor's Control Port could not be reached!
...

The gateway itself is fine:

$ systemcheck 
[INFO] [systemcheck] sys-whonix | Whonix-Gateway | whonix-gw-15 TemplateBased ProxyVM | Tue 11 May 2021 11:31:20 AM UTC
[INFO] [systemcheck] Connected to Tor.
...

I've also tried reinstalling both templates; same result.

How Reproducible

Always

andrewdavidwong commented 3 years ago

CC: @adrelanos

adrelanos commented 3 years ago

Qubes-Whonix for Qubes 4.1 is unsupported.

https://www.whonix.org/wiki/Unsupported

Not yet supported.

https://www.whonix.org/wiki/About#Support_Schedule

andrewdavidwong commented 3 years ago

Qubes-Whonix for Qubes 4.1 is unsupported.

https://www.whonix.org/wiki/Unsupported

Not yet supported.

https://www.whonix.org/wiki/About#Support_Schedule

I think the idea is to fix any bugs so that Qubes-Whonix will work correctly when 4.1 is released. Your link does not say when Qubes-Whonix support for 4.1 will begin. It just quotes the Qubes docs, which say that users who wish to use Qubes-Whonix have to upgrade to 4.1 within a month of 4.1's release to stay current, but what if a user upgrades on day one? Is that too soon? If so, how long should they wait?

adrelanos commented 3 years ago

A lot of presumptions, hopefully all reasonable:

I am presuming that none of Whonix's changes introduced any Qubes 4.1-specific breakage, and that this is a temporary issue introduced by Qubes R4.1 changes.

Presuming things stay as they are now, there's no way this issue goes unnoticed by Marek, thanks to Qubes openQA. And before Qubes 4.1 is released for testers or even as stable, Marek would do a basic functionality test of all pre-installed functionality. This issue would be obvious to catch and is covered by openQA. Marek has started Qubes-Whonix template rebuilds in the past during development work on future Qubes versions. Therefore I don't think Qubes 4.1 would be released (as stable) with broken Qubes-Whonix. That dilemma might be theoretically possible, but in practice I hope not.

Should any changes be required on the Qubes-Whonix side, I guess Marek would open a ticket.

Keeping up with Qubes R4.1's presumably still rapid, developers-only state of development / porting Qubes-Whonix to R4.1 (if that is required) isn't practical for me at this stage for various reasons.

marmarek commented 3 years ago

For the record: openQA has found a (possibly the same) issue: https://github.com/Whonix/updates-status/issues/83 The test was green a week earlier. I have not checked yet if it's caused by some Qubes change, a Whonix change, or maybe something else (like a Debian change).

blatchard commented 3 years ago

Hi there, I have had the exact same problem and have been trying to resolve it for the last 4 days or so. It also occurred for me after a dom0 update; I had also updated to kernel-latest and kernel-latest-qubes-vm (5.12.4-1).

I tried booting off the other kernels, and I tried setting as default every version I had of the qubes-vm kernel, which made no difference at all. I also tried restoring all Whonix-related templates and AppVMs from a backup in which they all previously worked, and spent countless hours looking up what the problem might be.

I finally found what fixed the issue for me with the information on this page: https://github.com/QubesOS/qubes-issues/issues/5110 . In dom0, as root, I issued the commands: qvm-features sys-whonix ipv6 '' and qvm-features anon-whonix ipv6 ''

I do not know if the second command was needed, as I issued both at the same time.

I shut down anon-whonix, restarted sys-whonix, and ran the anon-whonix Tor Browser, and the issue was gone (kernel-qubes-vm 5.10.37-1). I started a whonix-ws-15-dvm and was able to connect without issue. To check kernel-latest, I set kernel-latest-qubes-vm 5.12.4-1 as default, rebooted Qubes, booted kernel 5.12.4-1 at GRUB, and the issue was still gone. I also tried running a fedora-33-dvm through sys-whonix, and it also had Tor connectivity.

I am guessing that one of the updates somehow enabled ipv6 in the Whonix templates. I am very new to GitHub, so I don't claim to have technically diagnosed this in any way, but I hope this information will be of use to @jpds and maybe to @adrelanos and @marmarek in fixing it.

jpds commented 3 years ago

@blatchard is indeed correct: I did have my sys-whonix ipv6 feature set to 1, and setting this back made Tor in my DVM work again.
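
For anyone hitting the same symptom, the workaround reported above can be condensed as follows. This is a sketch based only on the commands quoted in this thread; the VM names are the default Qubes-Whonix ones and may differ on your system:

```shell
# Run in dom0. Calling qvm-features with a feature name but no value
# prints the feature's current value (empty output means unset).
qvm-features sys-whonix ipv6

# If it prints "1", clear it by setting it to the empty string,
# as reported to fix this issue on R4.1:
qvm-features sys-whonix ipv6 ''
qvm-features anon-whonix ipv6 ''

# Then restart the affected VMs so the change takes effect:
qvm-shutdown --wait anon-whonix sys-whonix
qvm-start sys-whonix
```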

adrelanos commented 3 years ago

On Qubes R4, the following works for me. Don't use it on Qubes R4.1 until this issue is resolved:

qvm-features sys-whonix ipv6 1
qvm-features anon-whonix ipv6 1
qvm-features whonix-ws-15-dvm ipv6 1
qvm-features whonix-ws-15 ipv6 1
qvm-features whonix-gw-15 ipv6 1

marmarek commented 3 years ago

Another failure on openQA: https://openqa.qubes-os.org/tests/18129

[INFO] [systemcheck] Tor Connection Result:
- Connecting for 148 seconds.
- Tor Circuit: established.
- Timesync status: not done.
- sdwdate reports: Initial time fetching in progress...
[ERROR] [systemcheck] Time Synchronization Result:
systemcheck gave up waiting.

Time synchronization status: pending
sdwdate reports: Initial time fetching in progress...
whonix_firewall status: first run after boot

And logs fragment from within anon-whonix:

[2021-06-02 04:55:35] [   36.268754] sudo[2522]: systemcheck : TTY=unknown ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/fuser /var/lib/dpkg/lock /var/cache/apt/archives/lock
[2021-06-02 04:55:35] [   36.270170] sudo[2522]: pam_unix(sudo:session): session opened for user root by (uid=0)
[2021-06-02 04:55:35] [   36.382454] sudo[2522]: pam_unix(sudo:session): session closed for user root
[2021-06-02 04:55:36] [   37.763717] systemd[1]: Started redirect /run/tor/socks to Whonix-Gateway port 9050.
[2021-06-02 04:55:41] 

[   42.026048] systemd[1]: systemd-fsckd.service: Succeeded.
[2021-06-02 04:55:48] [   49.893101] systemd[1]: canary.service: Start operation timed out. Terminating.
[2021-06-02 04:55:48] [   49.896980] systemd[1]: canary.service: Main process exited, code=killed, status=15/TERM
[2021-06-02 04:55:48] [   49.897960] systemd[1]: canary.service: Failed with result 'timeout'.
[2021-06-02 04:55:48] [   49.899142] systemd[1]: Failed to start canary.
[2021-06-02 04:55:48] [   49.904221] systemd[1]: Reached target Multi-User System.
[2021-06-02 04:55:48] [   49.909306] systemd[1]: Starting Update UTMP about System Runlevel Changes...
[2021-06-02 04:55:49] [   49.931349] systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
[2021-06-02 04:55:49] [   49.932763] systemd[1]: Started Update UTMP about System Runlevel Changes.
[2021-06-02 04:55:49] [   49.934219] systemd[1]: Startup finished in 9.317s (kernel) + 40.617s (userspace) = 49.934s.
[2021-06-02 04:55:49] [   50.142033] systemd[1]: canary.service: Service RestartSec=100ms expired, scheduling restart.
[2021-06-02 04:55:49] [   50.143273] systemd[1]: canary.service: Scheduled restart job, restart counter is at 1.
[2021-06-02 04:55:49] [   50.145377] systemd[1]: Stopped canary.
[2021-06-02 04:55:49] [   50.149910] systemd[1]: Starting canary...
[2021-06-02 04:55:49] [   50.224023] systemd[1]: Started canary.
[2021-06-02 04:55:49] [   50.226886] canary-daemon[3194]: /usr/lib/systemcheck/canary-daemon: INFO: start.
[2021-06-02 04:57:24] [  145.419882] sdwdate[687]: remote_times.py: i: 0 | timeout_network
[2021-06-02 04:57:24] [  145.430166] sdwdate[687]: remote 0: http://qn4qfeeslglmwxgb.onion
[2021-06-02 04:57:24] [  145.430998] sdwdate[687]: * comment: Lucy Parsons Labs https://lucyparsonslabs.com/securedrop  https://web.archive.org/web/20170322113502/https://lucyparsonslabs.com/securedrop/
[2021-06-02 04:57:24] [  145.431741] sdwdate[687]: * took_time     : 120.02 second(s)
[2021-06-02 04:57:24] [  145.432725] sdwdate[687]: * half_took_time: 60.01 second(s)
[2021-06-02 04:57:24] [  145.433315] sdwdate[687]: * exit_code: -9
[2021-06-02 04:57:24] [  145.433881] sdwdate[687]: * stdout: empty
[2021-06-02 04:57:24] [  145.434401] sdwdate[687]: * stderr: empty
[2021-06-02 04:57:24] [  145.434997] sdwdate[687]: * remote_status: timeout
[2021-06-02 04:57:24] [  145.435740] sdwdate[687]: remote 1: http://3kyl4i7bfdgwelmf.onion
[2021-06-02 04:57:24] [  145.436108] sdwdate[687]: * comment: http://www.wefightcensorship.org  https://archive.fo/GhgMU
[2021-06-02 04:57:24] [  145.436612] sdwdate[687]: * took_time     : 7.82 second(s)
[2021-06-02 04:57:24] [  145.436880] sdwdate[687]: * half_took_time: 3.91 second(s)
[2021-06-02 04:57:24] [  145.620392] sdwdate[687]: * replay_protection_unixtime: 1610841500
[2021-06-02 04:57:24] [  145.621789] sdwdate[687]: * remote_unixtime           : 1622624131
[2021-06-02 04:57:24] [  145.622341] sdwdate[687]: * consensus/valid-after           : 2021-06-02 08:00:00
[2021-06-02 04:57:24] [  145.622955] sdwdate[687]: * replay_protection_time          : 2021-01-17 00:00:00
[2021-06-02 04:57:24] [  145.623332] sdwdate[687]: * remote_time                     : 2021-06-02 08:55:31
[2021-06-02 04:57:24] [  145.623807] sdwdate[687]: * consensus/valid-until           : 2021-06-02 11:00:00
[2021-06-02 04:57:24] [  145.624762] sdwdate[687]: * time_diff_raw        : 94 second(s)
[2021-06-02 04:57:24] [  145.625040] sdwdate[687]: * time_diff_lag_cleaned: 90.09 second(s)
[2021-06-02 04:57:24] [  145.625345] sdwdate[687]: * Time Replay Protection         : sane
[2021-06-02 04:57:24] [  145.625735] sdwdate[687]: * Tor Consensus Time Sanity Check: sane
[2021-06-02 04:57:24] [  145.626028] sdwdate[687]: * remote_status: True
[2021-06-02 04:57:24] [  145.626926] sdwdate[687]: remote 2: http://tinhat233xymse34.onion
[2021-06-02 04:57:24] [  145.628661] sdwdate[687]: * comment: https://thetinhat.com https://web.archive.org/web/20170421233308/https://thetinhat.com
[2021-06-02 04:57:24] [  145.629140] sdwdate[687]: * took_time     : 9.41 second(s)
[2021-06-02 04:57:24] [  145.630626] sdwdate[687]: * half_took_time: 4.71 second(s)
[2021-06-02 04:57:24] [  145.759985] sdwdate[687]: * replay_protection_unixtime: 1610841500
[2021-06-02 04:57:24] [  145.760828] sdwdate[687]: * remote_unixtime           : 1622624133
[2021-06-02 04:57:24] [  145.761211] sdwdate[687]: * consensus/valid-after           : 2021-06-02 08:00:00
[2021-06-02 04:57:24] [  145.761711] sdwdate[687]: * replay_protection_time          : 2021-01-17 00:00:00
[2021-06-02 04:57:24] [  145.762275] sdwdate[687]: * remote_time                     : 2021-06-02 08:55:33
[2021-06-02 04:57:24] [  145.762927] sdwdate[687]: * consensus/valid-until           : 2021-06-02 11:00:00
[2021-06-02 04:57:24] [  145.763781] sdwdate[687]: * time_diff_raw        : 95 second(s)
[2021-06-02 04:57:24] [  145.764484] sdwdate[687]: * time_diff_lag_cleaned: 90.29 second(s)
[2021-06-02 04:57:24] [  145.765029] sdwdate[687]: * Time Replay Protection         : sane
[2021-06-02 04:57:24] [  145.765733] sdwdate[687]: * Tor Consensus Time Sanity Check: sane
[2021-06-02 04:57:24] [  145.766080] sdwdate[687]: * remote_status: True
[2021-06-02 04:57:24] [  145.766637] sdwdate[687]: remote_times.py: urls_list:
[2021-06-02 04:57:24] [  145.768777] sdwdate[687]: ['http://qn4qfeeslglmwxgb.onion', 'http://3kyl4i7bfdgwelmf.onion', 'http://tinhat233xymse34.onion']
[2021-06-02 04:57:24] [  145.769065] sdwdate[687]: remote_times.py: status_list:
[2021-06-02 04:57:24] [  145.769313] sdwdate[687]: ['timeout', 'ok', 'ok']
[2021-06-02 04:57:24] [  145.769673] sdwdate[687]: remote_times.py: took_time_list:
[2021-06-02 04:57:24] [  145.769978] sdwdate[687]: [120.02, 7.82, 9.41]
[2021-06-02 04:57:24] [  145.770599] sdwdate[687]: remote_times.py: half_took_time_list:
[2021-06-02 04:57:24] [  145.770896] sdwdate[687]: [60.01, 3.91, 4.71]
[2021-06-02 04:57:24] [  145.772305] sdwdate[687]: remote_times.py: remote_unixtime_list:
[2021-06-02 04:57:24] [  145.773777] sdwdate[687]: [0, 1622624131, 1622624133]
[2021-06-02 04:57:24] [  145.774637] sdwdate[687]: remote_times.py: time_diff_raw_int_list:
[2021-06-02 04:57:24] [  145.774942] sdwdate[687]: [0, 94, 95]
[2021-06-02 04:57:24] [  145.775243] sdwdate[687]: remote_times.py: time_diff_lag_cleaned_float_list:
[2021-06-02 04:57:24] [  145.775885] sdwdate[687]: [0.0, 90.09, 90.29]
[2021-06-02 04:57:24] [  145.777619] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - returned urls "['http://qn4qfeeslglmwxgb.onion', 'http://3kyl4i7bfdgwelmf.onion', 'http://tinhat233xymse34.onion']"
[2021-06-02 04:57:24] [  145.777960] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO -
[2021-06-02 04:57:24] [  145.778868] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - failed_urls: 1 allowed_failures: 5
[2021-06-02 04:57:24] [  145.779166] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - pool 1: http://3kyl4i7bfdgwelmf.onion, web_time: 2021-06-02 08:55:31, took_time: 7.82 seconds, time_diff_raw: 94 seconds, time_diff_lag_cleaned: 90 seconds
[2021-06-02 04:57:24] [  145.780323] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - pool 2: http://tinhat233xymse34.onion, web_time: 2021-06-02 08:55:33, took_time: 9.41 seconds, time_diff_raw: 95 seconds, time_diff_lag_cleaned: 90 seconds
[2021-06-02 04:57:24] [  145.781625] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - Running sdwdate fetch loop. iteration: 2
[2021-06-02 04:57:24] [  145.781936] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - pool 0: pool_size: 15 url_index: 8 already_picked_number: 2 already_picked_index: [4, 8]
[2021-06-02 04:57:24] [  145.782665] sdwdate[687]: 2021-06-02 08:55:49 - sdwdate - INFO - requested urls ['http://nrkvarslekidu2uz.onion']
[2021-06-02 04:57:24] [  145.910102] sdwdate[687]: remote_times.py: url_to_unixtime_command (s):
[2021-06-02 04:57:24] [  145.910482] sdwdate[687]: url_to_unixtime 10.137.0.13 9108 http://nrkvarslekidu2uz.onion 80 true

whonixcheck executed in sys-whonix later passes. Is it just about a slow Tor connection?
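
As an aside, the sdwdate figures in the log above are internally consistent. A minimal sketch of the arithmetic, with values copied from "remote 1" in the log; the local clock value is an assumption implied by time_diff_raw, not something the log prints directly:

```python
# Values from the sdwdate log for remote 1 (http://3kyl4i7bfdgwelmf.onion).
remote_unixtime = 1622624131           # * remote_unixtime
took_time = 7.82                       # * took_time (seconds)
local_unixtime = remote_unixtime - 94  # implied local clock (assumption)

# half_took_time is simply half the fetch round-trip, used as a lag estimate.
half_took_time = round(took_time / 2, 2)

# time_diff_raw is the raw offset between remote and local clocks;
# time_diff_lag_cleaned subtracts the lag estimate from it.
time_diff_raw = remote_unixtime - local_unixtime
time_diff_lag_cleaned = round(time_diff_raw - half_took_time, 2)

print(half_took_time)         # 3.91
print(time_diff_raw)          # 94
print(time_diff_lag_cleaned)  # 90.09
```

The ~90-second offset is the clock skew sdwdate is trying to correct, which is separate from the question of whether the fetches themselves are merely slow.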

adrelanos commented 3 years ago

Previous post: likely unrelated. In that case, Tor's ControlPort was reachable; otherwise sdwdate would wait. For example, consensus/valid-after originates from the Tor control protocol.

Likely slow/broken sdwdate onion v2 sources. sdwdate was recently ported to onion v3 sources, currently in the Whonix testers repository. This probably deserves a Whonix template rebuild, since onion v2 is to be deprecated soon by Tor upstream.

andrewdavidwong commented 3 years ago

There appear to be quite a few people on the Qubes Forum who are unable to get Whonix to work in Qubes 4.1:

https://forum.qubes-os.org/t/cant-get-sys-whonix-to-work-in-4-1/2444

adrelanos commented 3 years ago

Created ticket Qubes-Whonix 16 for Qubes R4.1.

adrelanos commented 2 years ago

I cannot reproduce this at this time; it seems to have been fixed in the meantime. I suggest closing this ticket and re-opening it should anyone still experience this issue.