gitcnd / IO-Uncompress-Untar

This module provides a minimal pure-Perl interface that allows the reading of tar files/buffers.
GNU General Public License v3.0

When the tar.gz file has multiple files in it, the second file will have incorrect data at the beginning. #1

Open squareplanetdesign opened 5 years ago

squareplanetdesign commented 5 years ago

I created an archive with several text files in it.

When I untarred and uncompressed it with this module, the first file written to disk had the contents I expected. But the second one began with a string of leading null bytes.

After looking at the implementation, I realized that the read() method was not clearing the raw buffer at the end of each file's data. It turns out that tar adds trailing \0 padding after each file to fill up the last 512-byte block. So when the last block of a specific file is read, you need to discard the rest of the buffer; otherwise the leftover padding is returned as the start of the next file.
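To make the padding concrete, here is a minimal, self-contained sketch of the arithmetic involved. It does not use IO::Uncompress::Untar at all; the helper names (padding_for, padded_member, strip_padding) are hypothetical and exist only for this illustration. It assumes the standard tar block size of 512 bytes.

```perl
use strict;
use warnings;

# tar stores file data in fixed 512-byte blocks.
use constant BLOCK_SIZE => 512;

# Hypothetical helper: number of \0 padding bytes that follow
# a file of $size bytes so the data ends on a block boundary.
sub padding_for {
    my ($size) = @_;
    return ( BLOCK_SIZE - $size % BLOCK_SIZE ) % BLOCK_SIZE;
}

# Hypothetical helper: simulate the data area of one archive
# member, i.e. the content followed by its \0 padding.
sub padded_member {
    my ($content) = @_;
    return $content . ( "\0" x padding_for( length($content) ) );
}

# Hypothetical helper: keep only the real content and drop the
# padding, which is what a reader must do at the end of each file.
sub strip_padding {
    my ( $raw, $size ) = @_;
    return substr( $raw, 0, $size );
}

my $content = "hello, tar\n";
my $raw     = padded_member($content);

printf "size=%d padding=%d raw=%d\n",
    length($content), padding_for( length($content) ), length($raw);
```

If a reader hands back only the first length($content) bytes but leaves the rest of $raw in its buffer, the next read starts with those \0 bytes, which matches the symptom described above.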

I have a fix for this:

sub read {
    my $this = shift;
    my $bytes = $_[1] || 512 * 1600;
    ++$this->{rec};    # debugging - block accidental recursion

    #warn "$this $bytes r=" . $this->{rec};
    my $offset = $_[2];
    my $at_end = 0;
    die "non-zero offset not implemented" if ($offset);
    my $maxleft = $this->{header}->{size} - $this->{i};
    if ( $bytes > $maxleft ) {
        $bytes = $maxleft;
        $at_end = 1;
    }
    if ( ( !defined $this->{raw} ) || ( $bytes > length( $this->{raw} ) ) ) {
        my $blks = int( ( $bytes - length( $this->{raw} ) - 1 ) / 512 ) + 1;
        $this->{raw} .= $this->{ts}->ReadBlocks($blks) if ( $this->{rec} < 2 );
        warn "Blocked recursion $this->{rec}" if ( $this->{rec} > 1 );
        $this->{i}   += $blks * 512;
        $this->{loc} += $blks * 512;
    }
    --$this->{rec};
    $_[0] = substr( $this->{raw}, $this->{readoffset}, $bytes );
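    if ($at_end) {
        # Last read for this file: drop whatever is left in the buffer,
        # which should only be tar's trailing \0 padding.
        $this->{raw} = '';
    } else {
        # More data to come: keep the unread remainder for the next call.
        $this->{raw} = substr( $this->{raw}, $bytes );
    }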

    #$this->{readoffset}+=$bytes;
    #warn "$this got=" . length($_[0]);
    return length( $_[0] );
}    # read

I will be submitting a pull request with this fix in a few minutes.