alexcrichton / tar-rs

Tar file reading/writing for Rust
https://docs.rs/tar
Apache License 2.0
627 stars 190 forks source link

error: "numeric field was not a number: t Regard when getting cksum for ...." #313

Open npajkovsky opened 1 year ago

npajkovsky commented 1 year ago

Hello,

I have a tarball that I want to read

-> % file backup.tar
backup.tar: POSIX tar archive

But I've got an error

error: "numeric field was not a number: t Regard when getting cksum for a, <=\nm>\n> wrote:\n>\n> Dear \n>\n>\n>\n> Correct. We will have more detai"

It looks like that it starts reading a header from a wrong position.

My code looks like this

use std::fs::File;

use eyre::Result;
use tar::Archive;

fn main() -> Result<()> {
    let file: File = File::open(path: "backup.tar")?;
    let mut archive: Archive<File> = Archive::new(obj: file);

    archive.entries().unwrap().for_each(|x: Result<Entry<File>, Error>| {
        let x: Entry<File> = x.unwrap();
        println!("{}", x.path().unwrap().display())
    });

    Ok(())
}

And with a couple of dbg!() I've got this.

[/home/nikola/src/tar-rs/src/archive.rs:268] self.next = 1081462272
[/home/nikola/src/tar-rs/src/archive.rs:271] self.next - self.archive.inner.pos.get() = 7680
[/home/nikola/src/tar-rs/src/archive.rs:294] &header = UstarHeader {
    entry_size: 138,
    size: 138,
    path: "././@LongLink",
    link_name: None,
    username: Some(
        "",
    ),
    groupname: Some(
        "",
    ),
    cksum: 2567,
    cksum_valid: true,
}
[/home/nikola/src/tar-rs/src/archive.rs:268] self.next = 1081463296
[/home/nikola/src/tar-rs/src/archive.rs:271] self.next - self.archive.inner.pos.get() = 374
[/home/nikola/src/tar-rs/src/archive.rs:294] &header = UstarHeader {
    entry_size: 64700,
    size: 64700,
    path: "backup-3.22.2023_17-45-34_acaegypt/homedir/mail/....t/management/cur/1673424346.M756912P292403.demosthenes.datapack.net,S=64700,W=65570:2,",
    link_name: Some(
        "backup-3.22.2023_17-45-34_acaegypt/homedir/mail/...t/acc/cur/1673424346.M756912P292403.demos",
    ),
    mode: 0o640,
    uid: 1058,
    gid: 1061,
    mtime: 1673424346,
    username: Some(
        "user",
    ),
    groupname: Some(
        "group",
    ),
    device_major: Some(
        9,
    ),
    device_minor: Some(
        2,
    ),
    cksum: 24700,
    cksum_valid: true,
}
backup-3.22.2023_17-45-34_example/homedir/mail/example.net/management/cur/1673424346.M756912P292403.....net,S=64700,W=65570:2,
[/home/nikola/src/tar-rs/src/archive.rs:268] self.next = 1081528832
[/home/nikola/src/tar-rs/src/archive.rs:271] self.next - self.archive.inner.pos.get() = 65024
[/home/nikola/src/tar-rs/src/archive.rs:294] &header = OldHeader {
    path: "a, <=\nm>\n> wrote:\n>\n> Dear Eslam,\n>\n>\n>\n> Correct. We will have more detai",
    link_name: Some(
        ",\n>\n> *\n>\n>\n>\n> Tel: +380 (48) 7 340 34",
    ),
    username: None,
    groupname: None,
    device_major: None,
    device_minor: None,
}

Any ideas what could go wrong?

npajkovsky commented 1 year ago

Found the problem, I'll submit MR tomorrow.

jonathanstrong commented 1 year ago

I am running into a similar error while trying to read tar.gz archives generated by Cargo (.crate file), however, it seems to occur randomly, and I haven't been able to pin down what conditions trigger it.

here is a simplified code example to explain:

#[test]
fn check_extract_cargo_dot_toml_from_krate_file() {
    const DATA: &[u8] = include_bytes!("path/to/raw-cargo-publish-http-request-body");
    let krate_file: &[u8] = // code to split .crate file from json metadata in request body
    let cargo_dot_toml = extract_cargo_dot_toml(krate_file).unwrap();
    // ..
}

// simplified version to demonstrate usage of tar crate api
fn extract_cargo_dot_toml(krate_file: &[u8]) -> Result<_, _> {
    let decoder = flate2::read::GzDecoder::new(krate_file);
    let mut archive = tar::Archive::new(decoder);

    for entry in archive.entries()? {
        let mut entry = entry?;
        let path = entry.path()?;

        if path.ends_with("Cargo.toml.orig") {
            // store .read_to_string(..) output in `manifest`
        } else if path.ends_with(".cargo_vcs_info.json") {
            // store .read_to_string(..) output in `vcs_info`
        }

        if manifest.is_some() && vcs_info.is_some() {
            break
        }
    }
    // ..
}

The vast majority of the time, this code works fine. Once in a while, I get

Err(Custom {
    kind: Other,
    error: "numeric field was not a number:  when getting cksum for <path/to>/Cargo.toml.orig",
})

Strangely, it seems to correlate with running the tests with a slightly different set of command line flags, i.e. cargo test vs cargo test --lib. It also seems more likely in the several invocations following a full rebuild.

I have tried using a fuzzer to produce a self-contained example, but all that produced so far is an example that always fails, including using the tar binary and the code in the PR discussed here (it does fail with a similar error).

I would love to be able to identify the cause of this. Can you think of any theory for why the error would show up sporadically this way?

Update: following the code in src/header.rs, I logged the raw byte slice which ended up (randomly) triggering the error:

Err(
    Custom {
        kind: Other,
        error: "numeric field was not a number: , (input slice = [0, 0, 49, 51, 49, 51, 51, 0]) when getting cksum for <path/to>/Cargo.toml.orig",
    },
)

relevant code in src/header.rs:

impl Header {
    pub fn cksum(&self) -> io::Result<u32> {
        octal_from(&self.as_old().cksum)
            .map(|u| u as u32)
            .map_err(|err| {
                io::Error::new(
                    err.kind(),
                    format!("{} when getting cksum for {}", err, self.path_lossy()),
                )
            })
    }

    pub fn as_old(&self) -> &OldHeader {
        unsafe { cast(self) }
    }

    // ..
}

fn octal_from(slice: &[u8]) -> io::Result<u64> {
    let trun = truncate(slice);
    let num = match str::from_utf8(trun) {
        Ok(n) => n,
        Err(_) => {
            return Err(other(&format!(
                "numeric field did not have utf-8 text: {}",
                String::from_utf8_lossy(trun)
            )));
        }
    };
    match u64::from_str_radix(num.trim(), 8) {
        Ok(n) => Ok(n),
        Err(_) => Err(other(&format!("numeric field was not a number: {}, (input slice = {:?})", num, slice))),
    }
}

I know nothing about the tar format, so not sure what to make of this...

$ evcxr
>> let input: &[u8] = &[0, 0, 49, 51, 49, 51, 51, 0][..];
>> std::str::from_utf8(input)
Ok("\0\013133\0")
>> std::str::from_utf8(input).unwrap().trim()
"\0\013133\0"
>> u64::from_str_radix(std::str::from_utf8(input).unwrap().trim(), 8)
Err(ParseIntError { kind: InvalidDigit })
>> u64::from_str_radix("13133", 8)
Ok(5723)

Update 2: I only get the error when zlib-ng (or zlib-ng-compat) feature is enabled in flate2 crate. It seems to be a flate2 with zlib-ng enabled issue rather than tar.