golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.94k stars 17.66k forks source link

proposal: archive/tar, archive/zip: add NewReaderOptions with directory traversal defenses #57850

Open neild opened 1 year ago

neild commented 1 year ago

This is another attempt at hardening the archive/tar and archive/zip packages against file traversal attacks, fixing #25849.

Naive handling of archives containing filenames such as ../../../etc/passwd leads to security vulnerabilities. (See #55356 for a bunch of reference CVEs.) In #55356, I proposed changing archive/tar and archive/zip to return an ErrInsecurePath error when encountering an unsafe filename. Unfortunately, we subsequently discovered that Docker images frequently include filenames with absolute paths, making this change infeasible.

I propose adding new safe-by-default functions to archive/tar and archive/zip.

package tar

type ReaderOptions struct {
    AllowAbsolutePaths          bool
    AllowRelativePathComponents bool
}

// NewReaderWithOptions creates a new reader reading from r.
//
// The Reader's Next method will return ErrInsecurePath and a valid
// *Header if the next file's name is:
//
//   - absolute and opts.AllowAbsolutePaths is not set;
//   - contains a ".." path component and opts.AllowRelativePathComponents is not set; or
//   - on Windows, a reserved file name such as "NUL".
func NewReaderWithOptions(r io.Reader, opts ReaderOptions) *Reader
package zip

type ReaderOptions struct {
    AllowAbsolutePaths          bool
    AllowRelativePathComponents bool
}

// NewReaderWithOptions returns a new Reader reading from r,
// which is assumed to have the given size in bytes.
//
// 
// NewReaderWithOptions will return ErrInsecurePath and a valid
// *Reader if the archive contains file names that are:
//
//   - absolute and opts.AllowAbsolutePaths is not set;
//   - contains a ".." path component and opts.AllowRelativePathComponents is not set;
//   - contain a backslash (\) character; or
//   - on Windows, a reserved file name such as "NUL".
func NewReaderWithOptions(r io.ReaderAt, size int64, opts ReaderOptions) (*Reader, error)

The NewReaderWithOptions functions will provide the same functionality as proposed in #55356, but in a new API to avoid breaking existing users. In addition, these functions provide an easy way for users who want to accept some forms of unsafe path to do so while still defending against unexpected cases: For example, permitting absolute filenames while rejecting ones containing unexpected backslash characters.

I further propose that we update the NewReader documentation to encourage all users to migrate to NewReaderWithOptions. Once all supported Go versions include NewReaderWithOptions, we can go further and deprecate NewReader. We might also consider changing NewReader to be safe by default at some point, although that decision is out of scope for this proposal.

ianlancetaylor commented 1 year ago

Bikeshed: instead of NewReaderWithOptions, call the function Open. This corresponds to the function os.Open which opens a file for read-only access.

rsc commented 1 year ago

What happened with the errors being introduced in Go 1.20? I thought the plan was to turn them on by default in some newer release?

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

neild commented 1 year ago

We stated that we might turn on name security checks, but the pervasiveness of Docker tar files with absolute paths makes me dubious that we will ever be able to do so without causing widespread breakage.

Adding a new safe-by-default function could give us a path to making that change: First we add the safe function under a different name, then we deprecate the unsafe functions, and when a sufficient portion of the ecosystem has moved off of the unsafe functions we might be able to change them to be safe by default and even undeprecate them.

Open is an interesting choice of name. Unfortunately, archive/zip has an OpenReader function which opens a file rather than an io.ReaderAt; an Open function would be confusing there. I overlooked OpenReader when writing this proposal; we should also have a safe variant of it as well, such as OpenReaderWithOptions.

rsc commented 1 year ago

This still feels like it's getting very complicated. One way it can be hard to write secure code is when there are too many ways to do it, and it feels like we're headed down that road. Can we simplify this?

How many different programs or libraries read "Docker tar files"? If those update to recognize the new errors, that problem is fixed, right?

dsnet commented 1 year ago

I would expect NewReader and NewReaderWithOptions(ReaderOptions{}) to be the same, but that's not the case since the former is insecure, while the latter is secure.

Adding a new safe-by-default function...

Safe-by-default, but also one that's just much harder to call than NewReader. Even if we rename NewReaderWithOptions to just Open, it's still harder to call without #12854. That's quite unfortunate. I'm personally in favor of changing NewReader and directing callers depending on absolute paths to ignore ErrInsecurePath.

Also, I personally find Open a confusing name for this since it doesn't automatically open a file from the FS.

neild commented 1 year ago

This still feels like it's getting very complicated. One way it can be hard to write secure code is when there are too many ways to do it, and it feels like we're headed down that road. Can we simplify this?

How many different programs or libraries read "Docker tar files"? If those update to recognize the new errors, that problem is fixed, right?

We could simplify this by adding a safe-by-default API with no options, or with fewer options. For example:

package tar
func SafeReader(r io.Reader) *Reader

The alternative is to change the existing readers as proposed in #55356. I don't know how many programs would need to be updated to handle this change. Docker is pretty popular, though. I'll try running Google-internal tests (which include a fair bit of third-party package coverage) with GODEBUG=tarinsecurepath=0,zipinsecurepath=0 and report back on how many things break. (We did this in the runup to 1.20, but I don't have the failure list to hand any more.)

loveyandex commented 1 year ago

hi hi

rsc commented 1 year ago

We could add NewSafeReader, but I don't think we've exhausted trying #55356 (default on) yet. We should probably do that first or else understand why we can't ever do that. The Google test failures looked like they had a small number of root causes. Docker may be popular but I don't think direct reading of Docker tar files is nearly as popular.

csmith commented 1 year ago

This is a noble goal but it feels as though the proposal is trying to fix the issue in the wrong place. A path in an archive isn't inherently secure or insecure, so why is the reader concerning itself with making that decision and returning an error? It feels a little like trying to prevent SQL injection by making http.Request.PostFormValue() aware of 'dangerous' SQL characters.

Would it make more sense to add API to handle extracting an archive to a filesystem directly? That would remove a lot of the "boilerplate" you need to write at present, and provide a sensible place for the "secure or not" determination to live where it can make OS-specific decisions.

irsl commented 1 year ago

subtopic#1: Wrt docker tarballs, another option could be sanitizing the filename in the header (turning absolute into relative), when on GODEBUG env var or a ReaderOptions option indicate it. When entries are extracted to disk, integrations are supposed to call path.Join anyway, so this should be transparent. (Btw, docker calls chroot before extracting anything)

subtopic#2: Another attack vector currently not covered is via symlinks. Consider a crafted tar archive with the following two entries:

$ tar tvf crafted.tar
lrwxrwxrwx imrer/primarygroup 0 2023-02-23 08:17 something -> /
-rw-r----- imrer/primarygroup 10 2023-02-23 08:18 something/root/.bashrc

I think legitimate archives with symlink entries pointing out of the archive (but without a path traversal follow up entry) are more frequent out there than archives with absolute path entries. Doing this securely would require one of these (all of them have some drawbacks):

subtopic#3: I tried finding info about when support for variadic functions was added into golang, but didn't find anything authoritative. If it was supported from the very beginnings, the existing entry points could be expanded with something like this:


type ArchiveOptions int

const (
    RejectRelativePathComponents ArchiveOptions = 0
    AllowRelativePathComponents  ArchiveOptions = 1
        ...
)

type Reader struct{}

func NewReader(r io.ReaderAt, size int64, options ...ArchiveOptions) (*Reader, error) {
  if len(options) == 0 {
     // adjust the default (e.g. activate the Reject options by default eventually)
  }
...
}

This would keep the number of entrypoints the same and may make things a bit less complicated (while still offering full control).

neild commented 1 year ago

This is a noble goal but it feels as though the proposal is trying to fix the issue in the wrong place. A path in an archive isn't inherently secure or insecure, so why is the reader concerning itself with making that decision and returning an error?

As a practical concern, validating that archive paths don't escape the current directory would have avoided at least five distinct CVEs that I know of, and probably others that I don't. (See #55356.) Addressing the issue in the archive reader will reduce the chances of vulnerabilities in all code that reads archives; adding a new API to extract an archive to a filesystem risks satisfying only a subset of possible use cases and leaving other users vulnerable.

func NewReader(r io.ReaderAt, size int64, options ...ArchiveOptions) (*Reader, error) {

This is unfortunately not a backwards compatible change, since existing code may rely on the current function signature. For example, code might assign NewReader to a variable.

The Google test failures looked like they had a small number of root causes.

I'll report back when I manage to get a test run to go through. (The number of failures caused timeouts in the failure-deflaking tool. Might still be a small number of root causes, though.)

rsc commented 1 year ago

Waiting for more information about whether we can enable #55356 by default.

rsc commented 1 year ago

Putting on hold until we have that more information. Please move back to active when we do.

rsc commented 1 year ago

Placed on hold. — rsc for the proposal review group