Lonami / lzxd

https://crates.io/crates/lzxd
Apache License 2.0

Corrupted output #25

Closed ikrivosheev closed 10 months ago

ikrivosheev commented 10 months ago

I have a CAB archive: python.zip

I extracted it with two tools and logged the hash of every extracted file:

gcab: gcab.txt

rust-cab: ptesc.txt

Here is the difference:

907c907
< 9eecdeb542613c96ef9d822c754677fad20cdc6b01f998438f9143981c42d6b1  _ssl.pyd
---
> c93007f787f06f4a3c187b12a03bccc9e8e27b1e5cc71b4f44ddc2ef045870c8  _ssl.pyd
2106c2106
< 5c4f7eb850cb4ebd35c039be7319e2ed05439418884d414001e015c4637585fc  python27.dll
---
> 3fdca19920531643ca7cbfb01df73b6b4245da4024b264c3737cb38d3c439571  python27.dll
2302c2302
< d92c119edcb239fc52cdb1b59eddc19f251ade3a55b519d144c494b3581fc607  tcl85.dll
---
> be3703458dbb3f4308f4cf1fcf6d3c89e6cc77d2439a16d52c11ca26fc55364f  tcl85.dll
3045c3045
< 751941b4e09898c31791efeb5f90fc7367c89831d4a98637ed505e40763e287b  wininst_6.0.exe
---
> 854f0c6807c74bbf3249be772a2ab04a3934b71466d5868e2a0ee5c18b3911e4  wininst_6.0.exe
3048c3048
< 52def964142be6891054d2f95256a3b05d66887964fcd66b34abfe32477e8965  wininst_9.0.exe
---
> e64f29cb9e193c14e6904516e2a8829d3674928c22a321ed813ca6060a596492  wininst_9.0.exe

ikrivosheev commented 10 months ago

Well, my research led me to this code:

if offset <= self.pos && length <= offset && self.pos + length < self.buffer.len() {
    // Best case: neither source or destination wrap around
    // TODO write a test for this because it used to fail
    let start = self.pos - offset;
    self.buffer.copy_within(start..start + length, self.pos);

@Lonami can you explain your TODO?
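
For context, the quoted branch is the fast path of a sliding-window match copy. The sketch below is only an illustration under stated assumptions, not the lzxd implementation: `copy_match` is a hypothetical helper, it ignores wrap-around entirely, and its bounds check is simplified. It shows why the `length <= offset` condition matters: `copy_within` behaves like `memmove`, so it is only equivalent to the byte-by-byte copy when source and destination do not overlap.

```rust
/// Hypothetical helper (not part of lzxd): copy an LZ77-style match of
/// `length` bytes that starts `offset` bytes behind `pos` in `window`.
/// Assumption: no wrap-around, i.e. `offset <= pos` and the destination
/// fits inside the window.
fn copy_match(window: &mut [u8], pos: usize, offset: usize, length: usize) {
    assert!(offset >= 1 && offset <= pos && pos + length <= window.len());
    let start = pos - offset;
    if length <= offset {
        // Source and destination ranges do not overlap: a bulk copy is safe.
        window.copy_within(start..start + length, pos);
    } else {
        // Overlapping match: bytes written earlier in this loop are read
        // again later, which repeats the pattern. A bulk copy would read
        // the old contents of the destination instead.
        for i in 0..length {
            window[pos + i] = window[start + i];
        }
    }
}
```

For example, starting from a window holding `ab` followed by zeros, `copy_match(&mut w, 2, 2, 2)` takes the bulk path, and `copy_match(&mut w, 4, 1, 3)` takes the byte-by-byte path and repeats the last byte three times.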

Lonami commented 10 months ago

I don't really remember. But if you git blame the TODO you find commit https://github.com/Lonami/lzxd/commit/b9766e0d1cc1fc5611db543d937cd382823626dc.

So I guess that had something to do with it.

ikrivosheev commented 10 months ago

@Lonami OK, I'll keep investigating.

ikrivosheev commented 10 months ago

I found the bug.

The specification says:

Uncompressed Block

R0 - Least significant to most significant byte (little-endian DWORD ([MS-DTYP]))

The key word is: DWORD

And the code in the library: https://github.com/Lonami/lzxd/blob/master/src/bitstream.rs#L147

pub fn read_u32_le(&mut self) -> Result<u32, DecodeFailed> {
    let lo = self.read_u16_le()? as u32;
    let hi = self.read_u16_le()? as u32;
    Ok((hi << 16) | lo)
}

We read it as two WORDs, but we must read it as a single DWORD. I will prepare a patch. Here is how other libraries handle it:

  1. gcab: https://github.com/GNOME/gcab/blob/master/libgcab/decomp.c#L913
  2. xnbcli: https://github.com/LeonBlade/xnbcli/blob/master/app/BufferReader.js#L254
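
As an illustration of what "read as a DWORD" means at the byte level, here is a minimal sketch, assuming the four bytes of R0 are already available as a plain byte slice. `read_dword_le` is a hypothetical helper, not the lzxd API and not the actual patch; it does not model the bitstream's internal buffering, which is where the library's real read happens.

```rust
/// Hypothetical helper (not the lzxd API): interpret the first four bytes
/// of `bytes` as a little-endian DWORD, least significant byte first.
/// Returns `None` if fewer than four bytes are available.
fn read_dword_le(bytes: &[u8]) -> Option<u32> {
    match bytes {
        [b0, b1, b2, b3, ..] => Some(u32::from_le_bytes([*b0, *b1, *b2, *b3])),
        _ => None,
    }
}
```

With this helper, the byte sequence `78 56 34 12` decodes to `0x12345678`.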
ikrivosheev commented 10 months ago

@Lonami I opened a PR.

ikrivosheev commented 10 months ago

According to my tests, this was the last bug!

ikrivosheev commented 10 months ago

Done

ikrivosheev commented 10 months ago

@Lonami, could you please publish a patch release?

Lonami commented 10 months ago

v0.2.4 should now be on crates.io.