adrelanos opened 11 months ago
For Arch updates, I worked around the problem with a shell script that ran `curl -C -` in an infinite loop, so that downloads succeeded eventually. This is not possible with other package managers, as they do not allow providing an arbitrary script to use for downloads.
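A minimal sketch of such a retry loop (the helper name is hypothetical, not the exact script used; it assumes only `curl` is available):

```shell
#!/bin/sh
# retry_download: keep resuming an interrupted transfer with `curl -C -`
# until it completes. Hypothetical helper illustrating the workaround.
retry_download() {
    url="$1"
    out="$2"
    # --continue-at - (same as -C -) makes curl inspect the partial output
    # file and resume from where the previous attempt stopped.
    until curl --fail --silent --show-error --continue-at - --output "$out" "$url"; do
        echo "download interrupted, resuming..." >&2
        sleep 1
    done
}
```

Usage would be along the lines of `retry_download "$pkg_url" "$pkg_file"` for each package the manager wants to fetch.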
I wasn't able to reproduce this on a real (non-Qubes) Debian bookworm yet.
Does that mean that Qubes Debian bookworm is any less real :laughing:?
More seriously, have you tried a Debian bookworm HVM without any Qubes packages installed? This should behave like non-Qubes Debian bookworm.
> For Arch updates, I worked around the problem with a shell script that ran `curl -C -` in an infinite loop, so that downloads succeeded eventually.
Meaning you've been able to reproduce this bug?
> > I wasn't able to reproduce this on a real (non-Qubes) Debian bookworm yet.
>
> Does that mean that Qubes Debian bookworm is any less real 😆?
Ha, indeed. Created
for it.
> > For Arch updates, I worked around the problem with a shell script that ran `curl -C -` in an infinite loop, so that downloads succeeded eventually.
>
> Meaning you've been able to reproduce this bug?
Yes, but only intermittently. Sometimes it works.
> More seriously, have you tried a Debian bookworm HVM without any Qubes packages installed?
Tried now. Not reproducible in HVM.
Reproducible in PVH (Qubes default) but not HVM.
A user in Whonix forums reported this being reproducible also in a Debian 12 (bookworm) KVM VM (without Qubes involved).
In summary:

Reproducible here:

- real Debian 12 KVM
- Qubes Debian 12 based PVH App Qube
- Whonix (Debian 12 based) in VirtualBox (Non-Qubes-Whonix)

Not reproducible here:

- real Debian 12 on hardware
Affected virtualizers are Qubes PVH, (non-Qubes) KVM, (non-Qubes) VirtualBox. Not affected is real hardware (outside of any VMs).
What is the common factor (shared code base) in the affected virtualizers?
Xen PVH and KVM share essentially no code, but they do share some behaviors:
One might blame it on Tor / vanguards, but that seems wrong: their software is functional on real hardware. If it's broken in VMs, it seems there is something wrong with the VMs. It was useful to report against Tor anyhow, because the Tor developers might have insights into how this issue is triggered and might be able to provide workarounds.
I highly doubt that this is a virtualizer problem.
Can you try in a (Podman/LXC/systemd-nspawn/etc) container? I suspect Tor is making assumptions about networking that simply do not hold in virtualized environments. For instance, expecting networking to be handled via DHCP could trigger a problem like this.
I think I got a similar issue in https://openqa.qubes-os.org/tests/86485 (after suspend, if that matters). I don't see any errors in the Tor log, but for vanguards I see:
```
Nov 27 12:30:47 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:47 2023]: Vanguards 0.3.1 connected to Tor 0.4.8.9 using stem 1.8.1
Nov 27 12:30:47 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:47 2023]: Tor needs descriptors: Cannot read /var/lib/tor/cached-microdesc-consensus: [Errno 2] No such file or directory: '/var/lib/tor/cached-microdesc-consensus'. Trying again...
Nov 27 12:30:47 host vanguards[3853]: WARNING[Mon Nov 27 12:30:47 2023]: Tor daemon connection failed: Cannot read /var/lib/tor/cached-microdesc-consensus: [Errno 2] No such file or directory: '/var/lib/tor/cached-microdesc-consensus'. Trying again...
Nov 27 12:30:48 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:48 2023]: Vanguards 0.3.1 connected to Tor 0.4.8.9 using stem 1.8.1
Nov 27 12:30:48 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:48 2023]: Tor needs descriptors: Cannot read /var/lib/tor/cached-microdesc-consensus: [Errno 2] No such file or directory: '/var/lib/tor/cached-microdesc-consensus'. Trying again...
Nov 27 12:30:49 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:49 2023]: Vanguards 0.3.1 connected to Tor 0.4.8.9 using stem 1.8.1
Nov 27 12:30:49 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:49 2023]: Tor needs descriptors: Cannot read /var/lib/tor/cached-microdesc-consensus: [Errno 2] No such file or directory: '/var/lib/tor/cached-microdesc-consensus'. Trying again...
Nov 27 12:30:50 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:50 2023]: Vanguards 0.3.1 connected to Tor 0.4.8.9 using stem 1.8.1
```
Then, after I restarted just `tor@default.service`, I got this:
```
Nov 27 12:45:57 host vanguards[3853]: WARNING[Mon Nov 27 12:45:57 2023]: Tor daemon connection closed. Trying again...
Nov 27 12:45:58 host vanguards[3853]: NOTICE[Mon Nov 27 12:45:58 2023]: Vanguards 0.3.1 connected to Tor 0.4.8.9 using stem 1.8.1
```
And then `systemcheck` is happy. I did not stop nor restart `vanguards`. Maybe it's about service start order?
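If start order is the culprit, one way to test the hypothesis is a systemd drop-in that orders vanguards strictly after tor@default (a sketch; it assumes the unit is named `vanguards.service`):

```ini
# /etc/systemd/system/vanguards.service.d/10-after-tor.conf
# Hypothetical drop-in: start vanguards only after tor@default is up,
# so the control socket and consensus files exist before vanguards connects.
[Unit]
After=tor@default.service
Wants=tor@default.service
```

After creating it, run `sudo systemctl daemon-reload` and reboot (or restart both services) to see whether the reconnect loop still appears.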
Restarting the vanguards service fixes the issue temporarily, but it reappears after some time. Maybe it only works for Tor circuits that exist when vanguards starts, but not for circuits newly created some time after vanguards was started.
Update: A user in the forums reported having reproduced this on hardware (outside of any VMs) too.
Additional reports about reproducibility on hardware would be appreciated.
> Nov 27 12:30:49 host vanguards[3853]: NOTICE[Mon Nov 27 12:30:49 2023]: Tor needs descriptors: Cannot read /var/lib/tor/cached-microdesc-consensus: [Errno 2] No such file or directory: '/var/lib/tor/cached-microdesc-consensus'. Trying again...
This has always been like this.
Qubes OS release
Qubes R4.2
Summary
Downloads over Tor (with vanguards enabled) get interrupted after a few seconds.
This bug was introduced between Tor version `0.4.7.16-1` (from the Debian `bookworm` security repository) and Tor version `0.4.8.9-1~d12.bookworm+1` (from `deb.torproject.org`). I am certain that I could pinpoint it to this version change.

The issue is only reproducible if `vanguards` is installed. The older Tor version `0.4.7.16-1` from the Debian `bookworm` security repository does not have this issue.

Steps to reproduce:

1. Use a Debian `bookworm` Template.
2. Add the `deb.torproject.org` repository.
3. `sudo apt update`
4. `sudo apt install --no-install-recommends vanguards tor`
5. Open `/etc/tor/vanguards.conf` and change `control_socket =` to `control_socket = /run/tor/control` (related ticket).
6. `sudo systemctl enable vanguards` (potential Debian bug: not being enabled by default).
7. `sudo systemctl restart tor@default`
8. `sudo systemctl restart vanguards`
9. `torsocks curl --fail --output /tmp/test.tar.xz https://dist.torproject.org/torbrowser/13.0.5/tor-browser-linux-x86_64-13.0.5.tar.xz`
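For reference, after editing `/etc/tor/vanguards.conf` the relevant excerpt should look like this (the `[Global]` section name follows the shipped default config; treat it as an assumption if your copy differs):

```ini
# /etc/tor/vanguards.conf (excerpt)
[Global]
control_socket = /run/tor/control
```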
What is the current bug behavior?
The connection drops after a few seconds of continued file download.
What is the expected behavior?
No connection drops.
Environment
- Tor `0.4.8.9-1~d12.bookworm+1` from the `deb.torproject.org` `bookworm` repository
- vanguards `0.3.1-2.3` from `packages.debian.org`

Also reproducible in Qubes-Whonix and Non-Qubes-Whonix (Whonix for VirtualBox). I wasn't able to reproduce this on a real (non-Qubes) Debian `bookworm` yet.

Additional information
`sudo systemctl stop vanguards && sudo systemctl restart tor@default` fixes this issue. This shows that the issue only happens if Tor is combined with vanguards.

Tor bug
Is this a Tor bug? Possibly, yes. Reported upstream:
Why am I reporting this against Qubes? Because in the past there was a similar Qubes bug:
Since this issue is reproducible in VMs (Qubes Debian App Qube) and in VirtualBox (Whonix) but not reproducible on real (non-Qubes) Debian, this implies that this might be a Qubes-specific issue.
For issue tracking