dluxhu / perl-parallel-forkmanager

Parallel::ForkManager

PerlIO::gzip and Parallel::ForkManager do not play nice together #11

Closed Pascal666 closed 8 years ago

Pascal666 commented 8 years ago

Please note that file reads are always done in the main (parent) process below. Although child processes are created, nothing is actually done in them.

I've tried the code below on a couple of different Linux boxes, and all of them show data corruption when reading the index. The code works fine if you uncomment the 'next' (disabling Parallel::ForkManager) or gunzip the index beforehand and remove the ':gzip' (disabling PerlIO::gzip). The number of concurrent processes does not appear to matter. The corruption always seems to start at about the same line number for a given index, but at different line numbers for different indexes. Running the same thing multiple times sometimes yields exactly the same corruption and sometimes not. A couple of indexes (100K each) you can test with: 2015-27 2015-48

#!/usr/bin/perl

use strict;
use warnings;

use Parallel::ForkManager;
use PerlIO::gzip;

my $pm = Parallel::ForkManager->new(12);

open(IN, '<:gzip', 'wat.paths.gz') or die "can't open index";
while (my $file = <IN>) {
    print length($file) . ":$file\n" if length($file) > 142;
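    #next;                 # uncomment to skip forking entirely
    next if $pm->start;    # parent: go on to the next line
    $pm->finish;           # child: exit immediately without doing any work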
}
close IN;
$pm->wait_all_children;

Pascal666 commented 8 years ago

Matching PerlIO::gzip bug: https://rt.cpan.org/Public/Bug/Display.html?id=114557

eserte commented 8 years ago

I could reproduce the problem with a single fork(), without Parallel::ForkManager involved at all. So it seems to be an issue in PerlIO::gzip or zlib.
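
For illustration, a single-fork test along these lines might look like the sketch below. It is not the script actually used in this thread. The generated index contents, the temporary file, and the choice to fork after line 100 are all assumptions made so the example is self-contained. Whether it triggers the corruption on a given system is not guaranteed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use PerlIO::gzip;

# Build a throwaway gzipped index (hypothetical data, not the reporter's file).
my @expected = map { sprintf("crawl-data/segment-%06d.wat.gz\n", $_) } 1 .. 20_000;
my ($tmp_fh, $tmp_name) = tempfile(SUFFIX => '.gz');
close $tmp_fh;
my $plain = join '', @expected;
gzip(\$plain => $tmp_name) or die "gzip failed: $GzipError";

open(my $in, '<:gzip', $tmp_name) or die "can't open $tmp_name: $!";

my ($line_no, $bad) = (0, 0);
while (my $line = <$in>) {
    my $want = $expected[$line_no];
    $bad++ if !defined $want || $line ne $want;
    $line_no++;

    # Fork exactly once, part-way through the file (arbitrary point).
    if ($line_no == 100) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        exit 0 if $pid == 0;    # child: exit immediately, doing no work
        waitpid($pid, 0);       # parent: reap the child, then keep reading
    }
}
close $in;
unlink $tmp_name;

print "read $line_no lines, $bad differed from the expected content\n";
```

A non-zero count of differing lines would point at the interaction between fork() and the open ':gzip' handle, since no other module is loaded.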

yanick commented 8 years ago

Not a P::FM problem per se, but I'll add a note in the module's troubleshooting section. Thanks for the heads-up!
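
One way to sidestep the interaction, offered only as a sketch and not as the note that ended up in the module's documentation, is to avoid having an open ':gzip' handle at the moment of any fork: read the whole index first, close the handle, and only then start children. The generated index and its contents below are hypothetical stand-ins for the real file. This approach assumes the index fits comfortably in memory, as the 100K indexes mentioned above would.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use Parallel::ForkManager;
use PerlIO::gzip;

# Hypothetical stand-in for the real gzipped index.
my @expected = map { sprintf("crawl-data/segment-%06d.wat.gz", $_) } 1 .. 200;
my ($tmp_fh, $index) = tempfile(SUFFIX => '.gz');
close $tmp_fh;
my $plain = join '', map { "$_\n" } @expected;
gzip(\$plain => $index) or die "gzip failed: $GzipError";

# Read the entire index and close the handle BEFORE any fork happens,
# so no ':gzip' handle is ever shared between parent and child.
open(my $in, '<:gzip', $index) or die "can't open $index: $!";
chomp(my @paths = <$in>);
close $in;
unlink $index;

my $pm = Parallel::ForkManager->new(12);
for my $path (@paths) {
    next if $pm->start;    # parent: move on to the next path
    # child: real per-path work would go here
    $pm->finish;           # child: exit
}
$pm->wait_all_children;

print scalar(@paths), " paths read before forking\n";
```

The trade-off is memory for safety: the parent holds every path at once, but the read loop and the fork loop no longer overlap.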