Open zenmonkeykstop opened 3 years ago
Still pending reproduction. At minimum during this sprint we may see another RC (and then the final at the end of the sprint period) and can do some additional testing here.
The changes to the connection wizard have been deferred to Tails 4.20, due on 2021-07-13. Re-milestoning this issue to the next release.
The Tor Connection Assistant is part of Tails 4.20 (I've re-titled the issue accordingly). I've not been able to reproduce the original issue, but given how timing-dependent it is, it's quite possible that it may still arise.
What I am seeing is a consistent brief flash of a connection error, the first time I connect to Tor (using defaults, without bridges):
This is likely triggered by our Tor restart here: https://github.com/freedomofpress/securedrop/blob/develop/install_files/ansible-base/roles/tails-config/files/securedrop_init.py#L91-L95
The assistant is smart enough to pick up that the connection was temporarily lost. It's also smart enough to recognize the recovery, and the error goes away. Still, this has significant potential for confusion.
Here's what I've learned after another 90 minutes or so of investigation:
DisableNetwork
to 1
until the network is configured, and its TCA network hook waits until that's flipped to 0
before it completes. For this reason, the behavior on initial run is different than after subsequent bounces.pgrep
to determine if it is running, and we could do the same in our hook prior to the hook firing. As a result, the user would have to proactively close the window before SecureDrop-specific changes are applied, which is not ideal, but would sidestep concurrency issues.For now, I'll open a docs PR with a screenshot of the TCA, and a temporary note about the error behavior above that users may observe.
Thanks for following up! That does seem like a related behaviour to my original report - it could just come down to timing/network speed. The fix of waiting for the TCA to finish before firing the network hook makes sense if we can do it, though I'd worry about users never closing the TCA window and then getting confused as to why their shortcuts aren't working. Will set up a 4.20 stick and attempt to reproduce both.
I'm seeing the error flash now as described above, no sign of the earlier more serious error. A fix would still be good but it may not be as urgent.
I'm still seeing the flash error on Tails 4.21, but only on initial connect, not on reconnect. I'm also sometimes seeing a more serious error which is likely related: the Tails built-in update check complains that it cannot connect (it does not retry). In short, restarting Tor exactly at the same time that Tails is doing its best to connect to Tor and run its own update checks is probably not a sustainable way to go.
In this commit I am switching it to reload
instead of restart
(along with addressing #6042) which works fine -- in testing so far, it seems to fix both errors. However, this may not be a good long term solution either, as reloading an auth configuration is not supported if sandbox mode is enabled:
https://manpages.debian.org/testing/tor/torrc.5.en.html When the Sandbox is 1, the following options can not be changed when tor is running: Address, ConnLimit, CookieAuthFile, DirPortFrontPage, ExtORPortCookieAuthFile, Logs, ServerDNSResolvConfFile, ClientOnionAuthDir (and any files in it won’t reload on HUP signal).
Meanwhile, it looks like Tails is preparing to re-enable said sandbox mode: https://gitlab.tails.boum.org/tails/tails/-/issues/18237
If I understand correctly, the reload
fix will basically stop working the moment that's done.
One other strategy that may be worth exploring:
securedrop_init.py
script into two, one for updating the Tor config, one for checking for updates./securedrop-admin tailsconfig
; run the update checker via network hookIn prelim testing, it seems to work fine to make the torrc
additions before a connection is established; the additions appear to be preserved. It's only the update check portion of securedrop_init.py
that requires a connection to run successfully.
Run the Tor config script at startup
As far as I can tell, this would require running a script as root on startup, for which Tails does not appear to provide any facilities (.config/autostart
can be persisted for running .desktop
files as amnesia
) - please let me know if I'm missing a way to do this.
Otherwise I think it would be best to ask upstream for implementation advice here, to make sure we handle our setup logic in a way that does not interfere with Tails' setup and update checks.
I've tested the behavior in Tails 4.22 and still observe the following issues:
The first is the more serious issue and IMO a good reason to try to resolve this at least in the long run.
As expected, Tails 4.22 re-enabled Sandbox
mode for Tor, which is incompatible with dynamically reloading our desired configuration changes without restarting the Tor service. So that easy option seems unavailable.
Per brief discussion with Kev today, I've investigated using the pre-up.d
phase of the NetworkManager dispatcher to run the hook. This requires changes to our persistence configuration to create a symlink in a new location.
Because the network is not fully up at this stage, it would also require splitting the hook into two: 1) the Tor restart for the configuration change, 2) the update check, which can run as before while other stuff is going on.
Unfortunately, with Tails 4.22, the TCA enters a persistent error state if our hook runs before it tries to do its thing (I believe I was able to run it without issues prior to running TCA in previous Tails versions). That may be resolvable with a closer read of the TCA code, so I'm still cautiously optimistic that the pre-up.d
approach can be made to work.
I've also investigated if Tails' shell functions may be of service in monitoring the state of the system before we do things, but so far I don't see how -- AFAICT, to safely restart Tor, we have to do so either before TCA runs or after the TCA window is closed by the user (because it actively polls for the Tor status while it is open) and the Tails update check has run.
I think if the pre-up script works, we don't need to do a tor restart - the pre-up would pop the auth-private files into place, Tails' own network hook would bounce Tor, and we'd do the upgrade check once it was complete. This would probably also mean a couple of changes in the tailsconfig Ansible run as well tho.
After spending some time with this issue I'm inclined to defer it:
Splitting the securedrop_init.py
script and executing the config part before the TCA comes up and the GUI updater part afterwards also works fine on initial connects - but once tca.service
is started, any subsequent network bumps fail leaving the system torrc
locked with DisableNetwork 1
set. In order to get the split to work as expected we'd probably need to disable the TCA service while we updated the config, which seems dodgy, and in light of the difficulty in reproducing the Tails updater error, probably not worth the effort.
A good fix for this overall would involve:
The scope of said good fix is probably too much to make it into the 2.1.0 milestone.
Description
With SD set up on Tails, a network hook script (or rather the
securedrop_init.py
script run by the hook) runs every time a network interface (that's not the loopback) comes up. This script copies Tor config changes into place and bounces Tor, to allow access to authenticated hidden services (JI, SSH).In Tails 4.19rc1, Tor is now started manually by the user via the new Tor Connection app (TCA). Depending on timing, the network hook script may restart Tor while TCA is also starting it. Subsequent network calls in the script fails, meaning that the hook fails overall. TCA interprets this as Tor failing - this can result in a failed connection or the TCA reconnecting using bridges. This doesn't seem to happen for everybody - if TCA connects quickly enough then it isn't an issue.
Steps to Reproduce
Expected Behavior
Tor connects without issue.
Actual Behavior
TCA assumes vanilla Tor connections don't work, falls back to default bridges. Network hook script not run completely.
Comments
Adding a time.sleep(30) after the tor restart in the hook seems to alleviate the issue. It might make sense have the script watch for Tor coming back up before proceeding.