bodgit / sevenzip

Golang library for dealing with 7-zip archives
https://godoc.org/github.com/bodgit/sevenzip
BSD 3-Clause "New" or "Revised" License
166 stars 14 forks source link

Handle wrong passwords #38

Open abdusco opened 2 years ago

abdusco commented 2 years ago

I'm using github.com/bodgit/sevenzip v1.3.0.

When I try to extract a password protected 7z archive with a wrong password, I get newRangeDecoder: first byte not zero error.

Here's a test to replicate it (needs stretchr/testify):

package main

import (
    "encoding/base64"
    "io"
    "io/ioutil"
    "testing"

    "github.com/bodgit/sevenzip"
    "github.com/stretchr/testify/assert"
)

func TestWrongPassword(t *testing.T) {
    arcWithPass := "N3q8ryccAARCn6bIEAAAAAAAAABiAAAAAAAAAG8Zm0GDjJRj+0NwGgTAtoiHYDl5AQQGAAEJEAAHCwEAAiQG8QcBElMPMHa6n3dxaC3cnHuv+q8n7yEhAQABAAwNCQAICgEK18HKAAAFARkBABELAHQAZQB4AHQAAAAZABQKAQCWO8QQJK/YARUGAQAggKSBAAA="
    arcWOPass := "N3q8ryccAASaR9RFDQAAAAAAAABSAAAAAAAAADFkgJoBAAhzZXZlbnppcAoAAQQGAAEJDQAHCwEAASEhAQAMCQAICgEK18HKAAAFARkMAAAAAAAAAAAAAAAAEQsAdABlAHgAdAAAABkAFAoBAJY7xBAkr9gBFQYBACCApIEAAA=="
    tests := []struct {
        name          string
        archiveBase64 string
        password      string
    }{
        {
            name:          "password protected, correct password",
            archiveBase64: arcWithPass,
            password:      "password",
        },
        {
            name:          "password protected, wrong password",
            archiveBase64: arcWithPass,
            password:      "password1", // fails `newRangeDecoder: first byte not zero`
        },
        {
            name:          "password protected, wrong password",
            archiveBase64: arcWithPass,
            password:      "not 'password'", // fails `lzma: unexpected chunk type`
        },
        {
            name:          "not protected, with password",
            archiveBase64: arcWOPass,
            password:      "password",
        },
        {
            name:          "not protected, without password",
            archiveBase64: arcWOPass,
            password:      "",
        },
        {
            name:          "password protected, wrong password, another error",
            archiveBase64: "N3q8ryccAATmiIjWEAAAAAAAAABqAAAAAAAAAPj97y5WCxqWKbLj7vDaXAe/cLh2AQQGAAEJEAAHCwEAAiQG8QcBElMPRIjTr6dLKHHK30sEtwLboyEhAQABAAwMCAAICgH8UFzbAAAFARkBABETAHQAZQB4AHQALgB0AHgAdAAAABkAFAoBADbNXd0pr9gBFQYBACCApIEAAA==",
            password:      "notpassword", // fails `lzma: unsupported chunk header byte`
        },
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            archive, size := readBase64(tt.archiveBase64)
            r, err := sevenzip.NewReaderWithPassword(archive, size, tt.password)
            assert.NoError(t, err)

            for _, af := range r.File {
                f, err := af.Open()
                assert.NoError(t, err)
                _, err = io.Copy(ioutil.Discard, f)
                assert.NoError(t, err)
            }
        })
    }
}

func readBase64(encoded string) (io.ReaderAt, int64) {
    // decode as base64
    decoded, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil {
        panic(err)
    }
    return bytesAt{decoded}, int64(len(decoded))
}

type bytesAt struct {
    b []byte
}

func (b bytesAt) ReadAt(p []byte, off int64) (n int, err error) {
    if off >= int64(len(b.b)) {
        return 0, io.EOF
    }
    n = copy(p, b.b[off:])
    return
}

Here the second case password protected, wrong password fails with an error. It seems that the error comes deep within io package when I try to read the archived file io.Copy(ioutil.Discard, f).

It would be great if it detected wrong passwords when I call NewReaderWithPassword and returned an ErrWrongPassword or something. This would let me catch it with errors.Is(err, ErrWrongPassword) and try another password.

I can submit a fix if you can give me some pointers on where to look.

abdusco commented 2 years ago

When reading another password protected 7z file, I get lzma: unexpected chunk type.

Test case to replicate it (added to the OP):

{
    name:          "password protected, wrong password, another error",
    archiveBase64: "N3q8ryccAATmiIjWEAAAAAAAAABqAAAAAAAAAPj97y5WCxqWKbLj7vDaXAe/cLh2AQQGAAEJEAAHCwEAAiQG8QcBElMPRIjTr6dLKHHK30sEtwLboyEhAQABAAwMCAAICgH8UFzbAAAFARkBABETAHQAZQB4AHQALgB0AHgAdAAAABkAFAoBADbNXd0pr9gBFQYBACCApIEAAA==",
    password:      "not 'password'",
}
abdusco commented 2 years ago

For context, in all cases, I create the 7z file with:

$ echo testing > text.txt
$ 7z a -ppassword password.7z text.txt
$ cat password.7z | base64 | pbcopy
bodgit commented 2 years ago

So the way the encryption works on 7-zip archives is that it performs the usual compression (with LZMA, or any other algorithm for that matter) in one layer, producing a compressed stream of bytes, then it's wrapped by the layer that does the encryption, producing the encrypted (and compressed) stream of bytes. The encryption has no idea what the underlying compression format was.

So conversely when it comes to the decryption/decompression, the decryption layer doesn't know what the correct password is as it still produces a stream of bytes even with the wrong password, it's only when the decompression layer tries to interpret it that it won't be a valid LZMA stream and produces the error but it can't easily distinguish between a corrupt unencrypted archive and a valid encrypted archive with the wrong password set.

As you can see from your test cases, setting the password to a variety of different bad values results in different errors and that's purely down to the decrypted stream being "more or less wrong" when interpreted by the underlying LZMA library.

Checking with the 7z utility, even it doesn't know 100% that the password is wrong, the error message is more a hint when the archive has encryption enabled, and that's probably the best I can do.

abdusco commented 2 years ago

Good point. That also matches my experience.

I have another extractor implementation using 7z utility, and I just interpret the hint (and exit code == 2) as password being wrong.

$ 7z x -pwrongpassword password.7z

7-Zip (z) 21.07 (arm64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26
 64-bit arm_v:8 locale=en_US.UTF-8 Threads:8, ASM

Scanning the drive for archives:
1 file, 154 bytes (1 KiB)

Extracting archive: password.7z
--
Path = password.7z
Type = 7z
Physical Size = 154
Headers Size = 138
Method = LZMA2:12 7zAES
Solid = -
Blocks = 1

ERROR: Data Error in encrypted file. Wrong password? : text.txt

Sub items Errors: 1
Archives with Errors: 1
Sub items Errors: 1

I can do the same thing for this package, but there seems to be a lot of noise (different types of errors) that could be misinterpreted as password being wrong. If there were a way to know that an archive can't be decrypted, I would be happy with that too.

bodgit commented 2 years ago

Please try this branch and see if that helps: https://github.com/bodgit/sevenzip/tree/wrong-password

It introduces a new PasswordError type that will be returned if the archive uses some sort of encryption method but the archive headers cannot be read. It wraps the real error that occurred, which will be whatever is emitted by the decompression libraries.

You can assert that the returned error includes this type using errors.As.

Like with the 7z utility, this error is a hint at best. There can be situations where if the archive is encrypted but corrupt, with the correct password you could still get an error saying the password is wrong if the corruption stops the archive headers from being read.

abdusco commented 2 years ago

Thanks!

I couldn't get it to work. I'm still getting the lzma: unexpected chunk type error.

// go.mod
github.com/bodgit/sevenzip v1.3.1-0.20220814220638-421188bed950
func TestSevenzipPasswordError(t *testing.T) {
    passwordDot7z := `N3q8ryccAATLw8/pEAAAAAAAAABqAAAAAAAAAL5k3wUq4RlmRrx/qDeBDV2MzerKAQQGAAEJEAAHCwEAAiQG8QcBElMPbBOTnKWOQVr5GQ8zDSlIuSEhAQABAAwMCAAICgH8UFzbAAAFARkBABETAHQAZQB4AHQALgB0AHgAdAAAABkAFAoBAMcsnkBbsNgBFQYBACCApIEAAA==`
    tests := []struct {
        name          string
        archiveBase64 string
        wantErr       bool
        password      string
    }{
        {
            name:          "correct pass",
            archiveBase64: passwordDot7z,
            password:      "password",
            wantErr:       false,
        },
        {
            name:          "wrong pass",
            archiveBase64: passwordDot7z,
            password:      "notpassword",
            wantErr:       true, // fails with lzma: unexpected chunk type
        },
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            f, size := readBase64(tt.archiveBase64)

            ar, err := sevenzip.NewReaderWithPassword(f, size, tt.password)
            assert.NoError(t, err) // should have failed here!!!!!
            for _, it := range ar.File {
                af, err := it.Open()
                assert.NoError(t, err)
                _, err = io.Copy(ioutil.Discard, af)
                if tt.wantErr {
                    var pErr *sevenzip.PasswordError
                    assert.ErrorAs(t, err, &pErr)
                } else {
                    assert.NoError(t, err)
                }
            }
        })
    }
}

func readBase64(encoded string) (io.ReaderAt, int64) {
    decoded, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil {
        panic(err)
    }
    return bytesAt{decoded}, int64(len(decoded))
}

type bytesAt struct {
    b []byte
}

func (b bytesAt) ReadAt(p []byte, off int64) (n int, err error) {
    if off >= int64(len(b.b)) {
        return 0, io.EOF
    }
    n = copy(p, b.b[off:])
    return
}

this outputs:

=== RUN   TestSevenzipPasswordError
--- FAIL: TestSevenzipPasswordError (0.05s)
=== RUN   TestSevenzipPasswordError/correct_pass
    --- PASS: TestSevenzipPasswordError/correct_pass (0.03s)
=== RUN   TestSevenzipPasswordError/wrong_pass
    s7zipgo_test.go:109: 
            Error Trace:    /path/to/s7zipgo_test.go:109
            Error:          Should be in error chain:
                            expected: %!q(**sevenzip.PasswordError=0x14000010048)
                            in chain: "lzma: unexpected chunk type"
            Test:           TestSevenzipPasswordError/wrong_pass
    --- FAIL: TestSevenzipPasswordError/wrong_pass (0.03s)
abdusco commented 2 years ago

I tried it again with fixtures t2.7z and t3.7z in the repo, they fail at the right place. https://github.com/bodgit/sevenzip/blob/421188bed95035f7aeb78e6ab1ab167dd7de4312/reader_test.go#L135

ar, err := sevenzip.NewReaderWithPassword(f, size, tt.password)
// err is a PasswordError

The files I'm using don't fail while opening the archive, but when reading a file off of them.


For reference, version of 7z utility I'm using:

> 7z

7-Zip (z) 21.07 (arm64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26
 64-bit arm_v:8 locale=UTF-8 Threads:8, ASM

> brew info sevenzip
sevenzip: stable 22.01 (bottled)
7-Zip is a file archiver with a high compression ratio
https://7-zip.org
/opt/homebrew/Cellar/sevenzip/21.07 (3 files, 1.9MB) *
  Poured from bottle on 2022-04-11 at 21:11:57
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/sevenzip.rb
License: LGPL-2.1-or-later and BSD-3-Clause
==> Analytics
install: 5,285 (30 days), 12,444 (90 days), 25,508 (365 days)
install-on-request: 5,283 (30 days), 12,436 (90 days), 25,497 (365 days)
build-error: 7 (30 days)
bodgit commented 2 years ago

So I think you've managed to create a 7-zip archive that doesn't encrypt the headers, (i.e. what files are in the archive and how big they are, etc.), but the files themselves are encrypted. This means you'll never know if the password is wrong until you try and extract a file, there's no way around that unfortunately.

If you create your archive with -mhe=on added then it encrypts the headers as well, that's how the t2.7z & t3.7z were created.

abdusco commented 2 years ago

(i.e. what files are in the archive and how big they are, etc.)

Yeah, it found it bizarre that I could read file names.

I understand the difference now. I was only adding -p$pass flag to 7z command. I can confirm that adding -mhe flag does give the error.

I guess I can chalk up any error starting with lzma: and newRangeDecoder: to decryption failure, because they all come from lzma library.

bodgit commented 2 years ago

Not necessarily, it depends what compression algorithm is used. LZMA is the default algorithm so that's the most common but you can use other algorithms such as bzip2, deflate, etc. and then there are other ones that are really filters such as delta and BCJ2 which aim to rearrange the file contents in a way that makes them compress better, they can also be stacked/wrapped. In Golang terminology, think of it as a chain of one or more io.Readers.

I've added support for most of the common algorithms where there's either support in the standard library or an external library and at the moment my code just bubbles up whatever error the underlying library emits in the case of an error.

It might be an idea to wrap these in a generic ReadError type which can optionally be wrapped with that PasswordError in the event there's some encryption involved but it will involve a lot more changes than those I added for the case of the headers being encrypted.

davidnewhall commented 1 year ago

Can you merge master into this branch? I'm interested in having better password-error identification. My app supports 'trying' multiple passwords, and right now I just try the next password on any error. If the archive is not encrypted, it'd be nice to just not try anymore passwords.