Open npajkovsky opened 1 year ago
Found the problem, I'll submit MR tomorrow.
I am running into a similar error while trying to read tar.gz archives generated by Cargo (.crate files). However, it seems to occur randomly, and I haven't been able to pin down what conditions trigger it.
Here is a simplified code example to explain:
#[test]
fn check_extract_cargo_dot_toml_from_krate_file() {
    const DATA: &[u8] = include_bytes!("path/to/raw-cargo-publish-http-request-body");
    let krate_file: &[u8] = // code to split .crate file from json metadata in request body
    let cargo_dot_toml = extract_cargo_dot_toml(krate_file).unwrap();
    // ..
}

// simplified version to demonstrate usage of tar crate api
fn extract_cargo_dot_toml(krate_file: &[u8]) -> Result<_, _> {
    let decoder = flate2::read::GzDecoder::new(krate_file);
    let mut archive = tar::Archive::new(decoder);
    for entry in archive.entries()? {
        let mut entry = entry?;
        let path = entry.path()?;
        if path.ends_with("Cargo.toml.orig") {
            // store .read_to_string(..) output in `manifest`
        } else if path.ends_with(".cargo_vcs_info.json") {
            // store .read_to_string(..) output in `vcs_info`
        }
        if manifest.is_some() && vcs_info.is_some() {
            break
        }
    }
    // ..
}
The vast majority of the time, this code works fine. Once in a while, I get
Err(Custom {
kind: Other,
error: "numeric field was not a number: when getting cksum for <path/to>/Cargo.toml.orig",
})
Strangely, it seems to correlate with running the tests with a slightly different set of command-line flags, i.e. cargo test vs cargo test --lib. It also seems more likely in the several invocations following a full rebuild.
I have tried using a fuzzer to produce a self-contained example, but all it has produced so far is an input that always fails, both with the tar binary and with the code in the PR discussed here (which fails with a similar error).
I would love to be able to identify the cause of this. Can you think of any theory for why the error would show up sporadically this way?
Update: following the code in src/header.rs, I logged the raw byte slice which ended up (randomly) triggering the error:
Err(
Custom {
kind: Other,
error: "numeric field was not a number: , (input slice = [0, 0, 49, 51, 49, 51, 51, 0]) when getting cksum for <path/to>/Cargo.toml.orig",
},
)
Relevant code in src/header.rs:
impl Header {
    pub fn cksum(&self) -> io::Result<u32> {
        octal_from(&self.as_old().cksum)
            .map(|u| u as u32)
            .map_err(|err| {
                io::Error::new(
                    err.kind(),
                    format!("{} when getting cksum for {}", err, self.path_lossy()),
                )
            })
    }

    pub fn as_old(&self) -> &OldHeader {
        unsafe { cast(self) }
    }
    // ..
}

fn octal_from(slice: &[u8]) -> io::Result<u64> {
    let trun = truncate(slice);
    let num = match str::from_utf8(trun) {
        Ok(n) => n,
        Err(_) => {
            return Err(other(&format!(
                "numeric field did not have utf-8 text: {}",
                String::from_utf8_lossy(trun)
            )));
        }
    };
    match u64::from_str_radix(num.trim(), 8) {
        Ok(n) => Ok(n),
        Err(_) => Err(other(&format!(
            "numeric field was not a number: {}, (input slice = {:?})",
            num, slice
        ))),
    }
}
I know nothing about the tar format, so not sure what to make of this...
$ evcxr
>> let input: &[u8] = &[0, 0, 49, 51, 49, 51, 51, 0][..];
>> std::str::from_utf8(input)
Ok("\0\013133\0")
>> std::str::from_utf8(input).unwrap().trim()
"\0\013133\0"
>> u64::from_str_radix(std::str::from_utf8(input).unwrap().trim(), 8)
Err(ParseIntError { kind: InvalidDigit })
>> u64::from_str_radix("13133", 8)
Ok(5723)
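One possible reading of the session above, sketched in plain Rust: str::trim only strips Unicode whitespace, and NUL is not whitespace, so the leading \0 bytes survive and from_str_radix rejects them. The helpers truncate_at_nul and parse_octal_field below are hypothetical stand-ins, not the tar crate's code; truncate_at_nul is written on the assumption that the crate's truncate cuts the field at the first NUL byte, which would be consistent with the empty string printed before the comma in the logged error message.

```rust
/// Hypothetical stand-in for the `truncate` helper called by `octal_from`.
/// Assumption: the field is cut at the first NUL byte.
fn truncate_at_nul(slice: &[u8]) -> &[u8] {
    match slice.iter().position(|&b| b == 0) {
        Some(i) => &slice[..i],
        None => slice,
    }
}

/// Mirrors the parsing steps of `octal_from`, returning `None` on failure.
fn parse_octal_field(slice: &[u8]) -> Option<u64> {
    let text = std::str::from_utf8(truncate_at_nul(slice)).ok()?;
    u64::from_str_radix(text.trim(), 8).ok()
}

fn main() {
    // The slice logged in the error message.
    let logged: &[u8] = &[0, 0, 49, 51, 49, 51, 51, 0];

    // NUL is not whitespace, so `trim` leaves every byte in place.
    assert!(!'\0'.is_whitespace());
    let text = std::str::from_utf8(logged).unwrap();
    assert_eq!(text.trim().len(), logged.len());

    // Under the assumption above, the leading NUL truncates the field to nothing,
    // and an empty string is not a valid octal number.
    assert!(truncate_at_nul(logged).is_empty());
    assert_eq!(parse_octal_field(logged), None);

    // The same digits without the leading NUL bytes parse as expected.
    assert_eq!(parse_octal_field(b"13133\0"), Some(5723));
}
```

If that assumption holds, the digits in the field are plausible and the problem is the two NUL bytes in front of them, which points at the bytes handed to the parser rather than at the parsing logic.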
Update 2: I only get the error when the zlib-ng (or zlib-ng-compat) feature is enabled in the flate2 crate. It seems to be an issue with flate2 when zlib-ng is enabled rather than with tar.
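A minimal sketch of one way to check that theory, assuming the corruption happens in the decompressed byte stream before tar ever sees it: drain the decoder into a fingerprint and compare the result across backends or across repeated runs. The fingerprint function and the choice of FNV-1a are illustrative, not part of flate2 or tar; anything that implements Read, such as the GzDecoder from the example above, could be passed in place of the byte slices used here.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` to the end and returns
/// (number of bytes read, 64-bit FNV-1a hash of those bytes).
fn fingerprint<R: Read>(mut reader: R) -> io::Result<(u64, u64)> {
    const FNV_OFFSET: u64 = 0xcbf29ce484222325;
    const FNV_PRIME: u64 = 0x100000001b3;

    let mut hash = FNV_OFFSET;
    let mut total = 0u64;
    let mut buf = [0u8; 8192];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total += n as u64;
        for &byte in &buf[..n] {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(FNV_PRIME);
        }
    }
    Ok((total, hash))
}

fn main() -> io::Result<()> {
    // Stand-in inputs; in the real test these would be decoders
    // reading the same compressed bytes.
    let first = fingerprint(&b"same bytes"[..])?;
    let second = fingerprint(&b"same bytes"[..])?;
    let different = fingerprint(&b"same bytez"[..])?;

    assert_eq!(first, second);
    assert_ne!(first, different);
    println!("len = {}, fnv1a = {:016x}", first.0, first.1);
    Ok(())
}
```

If the fingerprint of the decompressed .crate file differs between a passing and a failing run, the decoder output itself is changing, and the tar error is only a symptom.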
Hello,
I have a tarball that I want to read
But I've got an error
It looks like it starts reading a header from the wrong position.
My code looks like this
And with a couple of dbg!() I've got this.
Any ideas what could go wrong?