curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/

Adds Linux Landlock based sandboxing #278

Closed daniel-j-h closed 1 week ago

daniel-j-h commented 4 months ago

Hey folks, the xz dilemma made me think about how we can strengthen curl and its ecosystem and what role sandboxing plays.

Reading @bagder's post https://daniel.haxx.se/blog/2024/04/10/verified-curl/ I was missing thoughts on sandboxing curl. That's why I wanted to start a conversation about sandboxing curl; but because curl is quite the monster for a first-time contributor I gave it a shot with the small and approachable trurl utility here first.

Motivation. The motivation is as follows: trurl is a binary parsing and handling urls on the command line.

But because it's linked against libcurl we get quite a few dependencies pulled in

$ ldd trurl|wc -l
47

so even though all we care about is e.g. parsing a sub-domain we get code e.g. for making http requests and so on.

Can we sandbox trurl and lock it down so that it doesn't get access to the user's filesystem or isn't allowed to make network requests for potential data exfiltration or both?

Sandboxing. There are various platform-dependent ways to sandbox a program; here I gave it a go using the Linux kernel's landlock (similar to OpenBSD's unveil) simply because it's easy to get started with and to integrate. There are other ways that might be reasonable, too, e.g. seccomp.

Example. The changeset below adds landlock to trurl for sandboxing to prevent any filesystem access. The trurl program offers an option to read urls from a file and we could allow reads from that file; but for now I simply prevent all filesystem access so that we can use this option as a test case.

Building. To build the example you'll need a kernel from the last 2-3 years and the landlock kernel headers. Then compile it with the definition -DHAVE_LINUX_LANDLOCK=1.
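For readers unfamiliar with Landlock, the "deny all filesystem access" idea described above can be sketched roughly as follows. This is not the changeset from this pull request, just a minimal illustration under assumptions: the helper name `sketch_deny_filesystem` and the macro `SKETCH_FS_ACCESS_ALL` are made up for this example, only the access rights of the first Landlock ABI version are handled, and the code compiles to a no-op unless `HAVE_LINUX_LANDLOCK` is defined.

```c
/* Sketch only: deny all filesystem access with Landlock (ABI v1 rights).
 * Build with -DHAVE_LINUX_LANDLOCK=1 on Linux; otherwise this is a no-op. */
#define _GNU_SOURCE
#include <stdio.h>

#ifdef HAVE_LINUX_LANDLOCK
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/landlock.h>

/* Every filesystem access right defined by the first Landlock ABI version. */
#define SKETCH_FS_ACCESS_ALL ( \
  LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_WRITE_FILE | \
  LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR | \
  LANDLOCK_ACCESS_FS_REMOVE_DIR | LANDLOCK_ACCESS_FS_REMOVE_FILE | \
  LANDLOCK_ACCESS_FS_MAKE_CHAR | LANDLOCK_ACCESS_FS_MAKE_DIR | \
  LANDLOCK_ACCESS_FS_MAKE_REG | LANDLOCK_ACCESS_FS_MAKE_SOCK | \
  LANDLOCK_ACCESS_FS_MAKE_FIFO | LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
  LANDLOCK_ACCESS_FS_MAKE_SYM)
#endif

/* Hypothetical helper: returns 0 when the sandbox is active, -1 otherwise. */
int sketch_deny_filesystem(void)
{
#ifdef HAVE_LINUX_LANDLOCK
  struct landlock_ruleset_attr attr;
  int ruleset_fd;
  int rc = -1;

  /* Declare which access rights the ruleset handles; handled rights are
   * denied unless a rule explicitly allows them. */
  memset(&attr, 0, sizeof(attr));
  attr.handled_access_fs = SKETCH_FS_ACCESS_ALL;

  ruleset_fd = (int)syscall(__NR_landlock_create_ruleset,
                            &attr, sizeof(attr), 0);
  if(ruleset_fd < 0) {
    perror("landlock_create_ruleset");
    return -1;
  }

  /* No rules are added, so nothing is allowed. An unprivileged process
   * has to set no_new_privs before it may restrict itself. */
  if(!prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) &&
     !syscall(__NR_landlock_restrict_self, ruleset_fd, 0))
    rc = 0;
  else
    perror("landlock sandbox");

  close(ruleset_fd);
  return rc;
#else
  fprintf(stderr, "Landlock support not compiled in\n");
  return -1;
#endif
}
```

Called early in `main()`, a helper like this would make later attempts to open files by path fail with a permission error, while already open descriptors such as stdin and stdout keep working.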


The main purpose of this is to start a conversation about sandboxing in trurl and hopefully learn enough to start the conversation for curl down the road, too. What are your thoughts here? Thank you!

dfandrich commented 4 months ago

While I'm all for sandboxing, I'm not sure a direct approach like this is the way to go. Every OS has its own way to do this, and often multiple ways. There are 35 operating systems listed on https://curl.se/download.html and if every one wants to add another 129 lines to trurl.c, it won't be long before trurl is more sandbox boilerplate than trurl code.

I don't know what the state of the art is in this area, but what I figure the world needs is a standard description format for what services an application needs from the OS that can be shipped with trurl. From that, a code generator could generate Landlock code, or unveil or AppArmor or a systemd unit file or whatever the user wants to use to enforce restrictions. For trurl something like that straightforward approach would work, but I suspect it's a naïve idea in the general case since moving beyond simple cases probably leads into OS-specific areas very quickly. Every operating system has its own way of specifying restrictions that aren't necessarily comparable. Even your example is admittedly incomplete, since some filesystem access is actually necessary in trurl but the path needed isn't known until after the program starts.

There are enforcement systems that watch an application run and determine automatically what capabilities/permissions it needs ("audit mode"). With a system like that, you should be able to get most of the way to a working sandbox without needing any extra code.

daniel-j-h commented 4 months ago

> While I'm all for sandboxing, I'm not sure a direct approach like this is the way to go. Every OS has its own way to do this, and often multiple ways. There are 35 operating systems listed on https://curl.se/download.html and if every one wants to add another 129 lines to trurl.c, it won't be long before trurl is more sandbox boilerplate than trurl code.

You are correct, the landlock mechanism I explore here is only available on Linux. There is a similar API in OpenBSD called unveil() I'm familiar with. That said sandboxing can be optional: I don't see a downside if we protect e.g. all Linux users but not the four Haiku people out there; they'll manage.

> Even your example is admittedly incomplete, since some filesystem access is actually necessary in trurl but the path needed isn't known until after the program starts.

Like I said above, at the moment it's simply blocking all filesystem access; both landlock as well as unveil() offer functionality to e.g. allow read-only access to a specific file, so that we could support the trurl --url-file command line option:

> The trurl program offers an option to read urls from a file and we could allow reads from that file; but for now I simply prevent all filesystem access so that we can use this option as a test case.
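As an illustration of the read-only exception mentioned here, a Landlock ruleset can carry a single "path beneath" rule for the one file that should stay readable. The following is a rough sketch under assumptions, not code from this pull request: the helper name `sketch_allow_read_only` and the macro `SKETCH_FS_ACCESS_ALL` are hypothetical, only ABI v1 access rights are handled, and without `HAVE_LINUX_LANDLOCK` the function does nothing and reports failure.

```c
/* Sketch only: deny filesystem access except reading one given file.
 * Build with -DHAVE_LINUX_LANDLOCK=1 on Linux; otherwise this is a no-op. */
#define _GNU_SOURCE
#include <stdio.h>

#ifdef HAVE_LINUX_LANDLOCK
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/landlock.h>

/* Every filesystem access right defined by the first Landlock ABI version. */
#define SKETCH_FS_ACCESS_ALL ( \
  LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_WRITE_FILE | \
  LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR | \
  LANDLOCK_ACCESS_FS_REMOVE_DIR | LANDLOCK_ACCESS_FS_REMOVE_FILE | \
  LANDLOCK_ACCESS_FS_MAKE_CHAR | LANDLOCK_ACCESS_FS_MAKE_DIR | \
  LANDLOCK_ACCESS_FS_MAKE_REG | LANDLOCK_ACCESS_FS_MAKE_SOCK | \
  LANDLOCK_ACCESS_FS_MAKE_FIFO | LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
  LANDLOCK_ACCESS_FS_MAKE_SYM)
#endif

/* Hypothetical helper: after it returns 0, only `path` can be opened, and
 * only for reading. Returns -1 if the sandbox could not be set up. */
int sketch_allow_read_only(const char *path)
{
#ifdef HAVE_LINUX_LANDLOCK
  struct landlock_ruleset_attr attr;
  struct landlock_path_beneath_attr rule;
  int ruleset_fd;
  int rc = -1;

  memset(&attr, 0, sizeof(attr));
  attr.handled_access_fs = SKETCH_FS_ACCESS_ALL;
  ruleset_fd = (int)syscall(__NR_landlock_create_ruleset,
                            &attr, sizeof(attr), 0);
  if(ruleset_fd < 0)
    return -1;

  /* The rule refers to the file through an O_PATH descriptor. */
  memset(&rule, 0, sizeof(rule));
  rule.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE;
  rule.parent_fd = open(path, O_PATH | O_CLOEXEC);
  if(rule.parent_fd >= 0) {
    if(!syscall(__NR_landlock_add_rule, ruleset_fd,
                LANDLOCK_RULE_PATH_BENEATH, &rule, 0) &&
       !prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) &&
       !syscall(__NR_landlock_restrict_self, ruleset_fd, 0))
      rc = 0;
    close(rule.parent_fd);
  }
  close(ruleset_fd);
  return rc;
#else
  (void)path;
  fprintf(stderr, "Landlock support not compiled in\n");
  return -1;
#endif
}
```

Because the path only becomes known once the command line has been parsed, a helper like this would have to run after option parsing. A simpler alternative, also an assumption rather than something proposed in this thread, would be to open the file first and then deny all filesystem access, since already open descriptors remain usable.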

What I wanted to explore here is whether we can restrict the potential blast radius of the vast number of dependencies trurl brings in, even though it's only parsing urls/strings, by simply not allowing filesystem access for a start. And if that is a way forward, then think about doing the same in the curl command line program (which, granted, would be a bigger undertaking).

bagder commented 4 months ago

Maybe an alternative take would be to create a stand-alone URL parsing library based on the libcurl code. For both curl and trurl to use...

daniel-j-h commented 4 months ago

Having a separate library both curl as well as trurl depend on could work. Don't you think it's a bigger lift, tho, and there's quite a high barrier to make it work? Would you ship then not just libcurl but also the parsing library? That would need a lot of support infrastructure, build support, and so on.

In addition I wanted to start looking into sandboxing trurl only as a first step and ideally we'd sandbox the curl binary, too, so that e.g. simple GET or POST requests don't get access to the full filesystem by default.

vszakats commented 3 months ago

Might not be exactly that, but most of the work done on implementing curl-for-win (libcurl) builds for trurl tests was to make libcurl as small as possible (also meaning as few dependencies as possible). The result was the config -zero-imap-osnotls-osnoidn-nohttp-nocurltool, where -imap is optional, and necessary to retrieve the default imap port. Dropping it allows disabling more options, making the binary much smaller. The imap requirement could probably be fixed with some local trurl logic.

This requires a separate curl build. Implementing it inside the mainline build logic to produce a separate trurl-optimized libcurl lib is probably possible, but non-trivial.

Notice that curl-for-win also works for Linux (and macOS), and this effort was not Windows specific.

bagder commented 1 week ago

Closing because no progress in months and still many red builds.

daniel-j-h commented 1 week ago

> Closing because no progress in months and still many red builds.

I'm happy with closing this but then let's agree that it's because restricting trurl using a sandbox mechanism is not something we want to do. I made a start in this pull request in order to start a conversation, and it looks like there is no interest in moving forward from the project's side.

bagder commented 1 week ago

I have no opinion on using sandbox mechanisms. I saw a PR that has not been touched in months and still builds red.

If someone wants to offer sandbox mechanisms for the trurl build, by all means go ahead.