RIIR attempt
We have an 80-line Rust implementation of proxy v2 that is missing error handling and streaming of responses. Some key features:

- `Url.join()` ensures that our hostname is not clobbered by some path-injection tricks (caveat: this uses the WHATWG URL spec).
- For error handling, the tricky part is that when we're streaming we can't just output an error. My current idea is that, if qrexec supports it, we can emit the error over stderr and exit, so the client/SDK can retrieve the error independently of the streamed output and we can later try to resume the download.
- One thing we did decide is that HTTP errors (4xx/5xx) should be transmitted as a successful request, e.g. `status_code: 400` instead of an error (sketched below).
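As an illustration of that decision, a response envelope along these lines would carry HTTP errors as plain data rather than as proxy failures (field names are hypothetical, not the project's actual schema):

```rust
use serde::Serialize;

// Hypothetical response envelope: a 4xx/5xx from the upstream server is
// serialized as a normal, successful proxy response, not a proxy error.
#[derive(Serialize)]
struct ProxyResponse {
    status_code: u16, // e.g. 400 -- reported as data, not as a failure
    body: String,
}
```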
Some other considered ideas:

- Arti will require a running daemon to maintain a connection to the Tor network. Qubes supports socket-based services (https://www.qubes-os.org/doc/qrexec-socket-services/#how-it-works), except they don't support stderr.
- Arti has a `TorAddr` struct in which you can pass in the onion hostname to explicitly connect to, but that's in `arti_client`, which is one or two levels below the `arti_hyper` we probably want to use. My current gut feeling is that we should leave Arti for a v3.

Things not yet considered:

- What URL spec do C-Tor and Arti use? Today's Daniel Stenberg blog post reminded me there are conflicting options:

> The WHATWG standard keeps changing and the world that tries to stick to RFC 3986 still needs to sometimes get and adapt to WHATWG influences in order to interoperate with the browser-centric part of the web. It leaves URL parsing and "URL syntax" everywhere in a sorry state.
I implemented streaming and error handling in the Rust version, and as I was porting the response content-type sniffing code, I realized that was weird, because the client/SDK already knows whether we're expecting a JSON or binary response. So we could have it tell the proxy how to reply by setting a `stream: true/false` attribute in the incoming request.
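If we went that way, the incoming request might deserialize into something like this (schema and field names are my guesses for illustration, not the project's actual protocol):

```rust
use serde::Deserialize;

// Hypothetical incoming-request shape: the client/SDK says up front whether
// it wants the response streamed as raw bytes or wrapped in a JSON envelope.
#[derive(Deserialize)]
struct IncomingRequest {
    method: String,
    path_query: String,
    #[serde(default)]
    stream: bool,
}
```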
re connectTCP, yes you can limit it to specific ports using RPC policy rules, by making the port a service argument, similar to the example they give in `90-default-policy` (where it's marked as dangerous). So we would add a custom policy file in the proxy VM called `/etc/qubes-rpc/qubes.ConnectTCP+ourportnumber` allowing the connection, and we'd add a line in our dom0 policy file basically like so:

```
rpc/qubes.ConnectTCP +ourportnumber oursourcevm ourdestvm allow
```

and we'd leave the default deny rule in place.
Sorry, I have one more question. If you're talking about combining sd-whonix and sd-proxy I can understand why qubes firewall rules don't work due to the onion address. But if you're not talking about combining them, and you still have an outbound tor-capable vm on the other side of the proxy vm, can't you still make use of qvm-firewall (still in conjunction with rpc policies) between proxy and torvm as long as you don't specify a hostname? It's just another way of restricting tcp traffic to a certain port at that point.
> re connectTCP, yes you can limit it to specific ports using RPC policy rules

Oh this is great. Thanks!
> But if you're not talking about combining them, and you still have an outbound tor-capable vm on the other side of the proxy vm, can't you still make use of qvm-firewall (still in conjunction with rpc policies) between proxy and torvm as long as you don't specify a hostname?

Yep; I wasn't very clear. I was trying to say that qvm-firewall can't do the "limit outbound requests to this one onion hostname" part of what we want the proxy VM to do.
@legoktm in https://github.com/freedomofpress/securedrop-client/issues/1678:
> Some other considered ideas:
>
> - Arti will require a running daemon to maintain a connection to the Tor network. Qubes supports socket-based services (https://www.qubes-os.org/doc/qrexec-socket-services/#how-it-works), except they don't support stderr.
> - Arti has a `TorAddr` struct in which you can pass in the onion hostname to explicitly connect to, but that's in `arti_client`, which is one or two levels below the `arti_hyper` we probably want to use. My current gut feeling is that we should leave Arti for a v3.
>
> What URL spec do C-Tor and Arti use? Today's Daniel Stenberg blog post reminded me there are conflicting options:
>
> > The WHATWG standard keeps changing and the world that tries to stick to RFC 3986 still needs to sometimes get and adapt to WHATWG influences in order to interoperate with the browser-centric part of the web. It leaves URL parsing and "URL syntax" everywhere in a sorry state.
I got a bee in my bonnet about this question and spent some time with the URL standard and its implications. 0dbd91dfa01b0c2f3a26cf7e89768c5a4f28d3cb is a by-product we may choose to discard for the reasons I've summarized in the commit message. Here, more important, are a few conclusions:
- We care about the preservation of the entire configured origin tuple of `(scheme, host, port)`, not just the hostname. That is, maintaining a correct origin on every outbound request, no matter what's been `join()`'d onto it, is the fundamental security property of the proxy.
- We can think of this as a kind of end-to-end property, at least `Client → ... → Tor`.

If `...` is `securedrop-proxy → Whonix`, then this is the only URL-parsing we have to worry about:
- `reqwest` uses `url`, so it will parse our URL exactly as we construct it. (The URL standard has this idempotence as an explicit goal.)
- The SOCKS handshake uses the `ATYP`, `DST.ADDR`, and `DST.PORT` specified by the client—in this case, `reqwest`.
- Since they're correct when they leave `reqwest`, they will be correct when they reach `torsocks`, which copies them into the Tor connection.

If `...` is `securedrop-proxy with Arti`, then the layering changes subtly:
- `arti-hyper` would have us use `hyper`, which uses `http::Uri`—not `url::Url`.
- `arti-client` lets us set the `TorAddr` explicitly, without higher-level abstraction (as you note).
- …`.onion` (among others).

So "use one URL parser" seems to boil down here to "Can we find a way to use `reqwest` (for `url`) with either `arti-client` or `arti-hyper`?"
@legoktm in https://github.com/freedomofpress/securedrop-client/issues/1678:
> I implemented streaming and error handling in the Rust version, and as I was porting the response content-type sniffing code, I realized that was weird because the client/sdk already knows whether we're expecting a JSON or binary response, so we could have it tell the proxy how to reply, by setting a `stream: true/false` attribute in the incoming request.
Does this mean you'd like us to commit to JSON-out HTTP-back, rather than round-trip HTTP with a reconstructed HTTP request, as we'd speculated about in https://pad.riseup.net/p/securedrop-proxy-in-rust#L87? That is cleaner than the current JSON-out maybe-HTTP-back (e.g.). But I'm still not fully convinced that the JSON-to-HTTP conversion gets us stronger security guarantees than HTTP reconstruction, which would:
So "use one URL parser" seems to boil down here to "Can we find a way to use reqwest (for url) with either arti-client or arti-hyper?"
reqwest + arti is an open issue: https://gitlab.torproject.org/tpo/core/arti/-/issues/287. I peeked at it during my learning time with Arti, and it needed some upstream support in reqwest to plug in a different hyper backend; see https://github.com/seanmonstar/reqwest/issues/1321.
> Does this mean you'd like us to commit to JSON-out HTTP-back, rather than round-trip HTTP with a reconstructed HTTP request ... That is cleaner than the current JSON-out maybe-HTTP-back (e.g.). But I'm still not fully convinced that the JSON-to-HTTP conversion gets us stronger security guarantees than HTTP reconstruction ...
No, just trying to fit it in the current model. I agree that HTTP-to-HTTP (i.e. a real HTTP proxy) would be better, but maybe we scope that for v3 when we adopt Arti, which'll possibly be easier to integrate with?
Something like:
So we're on the same page after all, @legoktm. :-) I've written up the path through these three architectures in freedomofpress/securedrop-engineering#82. (Note to self: This should supersede what I began to document in https://github.com/freedomofpress/securedrop-proxy/compare/dca4722%5E...61305c.)
I started working on tests today and ran into two issues:

1. I tried the `rvcr` crate to add VCR recording+replay to reqwest, but it explicitly does not work with the blocking client (cf. https://github.com/TrueLayer/reqwest-middleware/issues/32). I ported everything to use async; I don't think it adds too much complexity.
2. We can use fixtures to mock some of the responses, but you can't really encode timeouts or streaming lag in these kinds of tests, which is problematic. Also, because we communicate over environment variables, concurrent tests confuse everything, so you have to pass `--test-threads=1` for them to pass.

So... I want to delete those two commits and just write tests that shell out to the `securedrop-proxy` binary and run it against an actual HTTP server, like https://github.com/mccutchen/go-httpbin, so we can really verify stuff like streaming and lag, etc. And the bonus would be that we can also use them as tests for a packaged `securedrop-proxy` binary.
I'm quite happy with the integration tests I implemented today: https://github.com/freedomofpress/securedrop-proxy/commit/10c80ac1489310203bd004f62cb6ab450c9e1560 - they shell out to a binary and run against an HTTP server that is basically the same as https://httpbin.org/ but running locally. So there's no mocking or VCRs, it's all integration tests \o/ - and it can be run against a packaged/pre-built securedrop-proxy, so we could use these same test cases for in-VM testing too.
I also pushed a CI manifest, but we're still missing cargo-audit and cargo-vet. And then I need to poke at the packaging, since this is no longer a Python project.
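For flavor, a test in that style might look roughly like the following; the env-var name `SD_PROXY_ORIGIN` and the request JSON are assumptions for illustration, not the actual interface:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Shell out to the built binary and run it against a local httpbin-style
// server, rather than mocking reqwest internals.
#[test]
fn proxies_get_to_the_configured_origin() {
    let mut child = Command::new(env!("CARGO_BIN_EXE_securedrop-proxy"))
        .env("SD_PROXY_ORIGIN", "http://127.0.0.1:8080") // local go-httpbin
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn proxy");

    child
        .stdin
        .take()
        .unwrap()
        .write_all(br#"{"method": "GET", "path_query": "/json", "stream": false}"#)
        .unwrap();

    let output = child.wait_with_output().expect("proxy did not exit");
    assert!(output.status.success());
    assert!(!output.stdout.is_empty());
}
```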
We missed something when implementing streaming mode, specifically that we want to look at the response headers to verify download integrity via the `etag` header; I think this is reasonable. This is the only header I see being used, so I think we may want to focus on the use case of verifying download integrity instead of arbitrary headers. But we still need a way to expose that info; off the top of my head, I can only think of adding it to stderr.
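If we went the stderr route, the emitting side could be as small as this (hypothetical sketch, not implemented):

```rust
// Write the integrity-relevant header as one JSON line on stderr, so the
// client/SDK can read it independently of the streamed bytes on stdout.
fn report_etag(etag: &str) {
    eprintln!("{}", serde_json::json!({ "etag": etag }));
}
```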
See current WIP in the `rusting` branch.

During the Nov. 2023 SecureDrop retreat, we (primarily @cfm and myself) attempted a rewrite-it-in-Rust version of the proxy. We started by taking the current proxy code and distilling it into a not-super-formal specification, which revealed a number of architectural issues we'd like to address in a new system we're calling "v2".
As a summary, we determined that:

- The `reqwest` library will error if header values contain newlines, so header injection isn't possible. (But we also don't restrict headers at all, yet.)
- The proxy will be configured with a full origin (e.g. `http://foo.onion`), which will be injected via an environment variable (sketched below) instead of parsing and assembling protocol, hostname, port, etc. It is likely this will happen via qubesdb, but that's out of scope for this repo and is instead tracked in https://github.com/freedomofpress/securedrop-workstation/issues/853.
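A minimal sketch of that injection, assuming a hypothetical env-var name `SD_PROXY_ORIGIN`: the proxy receives one full origin string and parses it once at startup, with no per-request URL assembly.

```rust
use url::Url;

// Read the full origin (e.g. "http://foo.onion") injected by the environment
// and parse it once; protocol, hostname, and port are never assembled by hand.
fn configured_origin() -> Result<Url, String> {
    let raw = std::env::var("SD_PROXY_ORIGIN").map_err(|e| e.to_string())?;
    Url::parse(&raw).map_err(|e| e.to_string())
}
```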