Open p5pRT opened 19 years ago
This is definitely platform dependent - I can reproduce this on x86 Linux, but not x86 FreeBSD or OS X.
fork and __DATA__ don't mix. I don't think that we document this anywhere. I don't know if we should, given the amount of documentation, but it is a subtle gotcha, and it had never occurred to me.
The basic problem is with modules that lazily read from the DATA file handle, particularly if they read from another package's DATA file handle lazily, and on demand. DATA is implemented by the Perl 5 compiler leaving the program's file handle open if it encounters the __DATA__ token. All is fine and dandy, until you fork. At which point both processes now have a (buffered) DATA file handle pointing to the same kernel file descriptor. When one reads from DATA, the other's DATA handle moves. Underneath it.
SelfLoader breaks.

    $ cat Demo.pm
    #!perl -w

    package Demo;
    sub import {};
    use SelfLoader;
    @ISA = 'SelfLoader';

    1;
    __DATA__
    sub hash {
      print "pig-pen: $_[0]\n";
    }
    $ cat demo.pl
    #!perl -w
    use strict;

    use Demo;

    my $pid = fork();
    sleep 2 if $pid;
    Demo::hash ($pid);
    $ perl demo.pl
    pig-pen: 0
    Undefined subroutine Demo::hash at demo.pl line 8

I'm not sure if/how we can fix SelfLoader. I'm not sure where we should document this gotcha with DATA.
Nicholas Clark
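The shared file position described in this report can be observed without SelfLoader. What follows is a minimal sketch, not code from the ticket: it assumes a POSIX-style fork(), uses a temporary file in place of DATA, and the helper name parent_offset_after_child_read is made up for illustration. It uses unbuffered sysread/sysseek so that PerlIO buffering does not mask the kernel-level offset.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper: returns the parent's file offset after a forked
# child has read $nbytes from the handle it inherited.
sub parent_offset_after_child_read {
    my ($nbytes) = @_;

    # A temporary file stands in for the script's DATA section.
    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "0123456789\n";
    close $tmp or die "close: $!";

    open my $fh, '<', $path or die "open: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: consume $nbytes from the shared descriptor, then leave
        # immediately without running END blocks.
        my $n = sysread($fh, my $buf, $nbytes);
        POSIX::_exit(defined $n ? 0 : 1);
    }

    waitpid($pid, 0);

    # Parent: it has read nothing itself; ask where its handle now points.
    return sysseek($fh, 0, SEEK_CUR) + 0;
}

print "parent offset: ", parent_offset_after_child_read(5), "\n";
```

On a system where fork() shares the open file description, the parent should report an offset of 5 even though only the child read from the handle.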
I do not think this effect is documented anywhere:

    perl -wle 'open F, q(<), shift or die; defined fork or die;
               open O, q(>x).$$; print O $_ while <F>' .article

Two files x$$ are created (one per process). One of them is empty. (Both on Solaris and OS/2; but this may be system-dependent.)
Obviously, forked processes share the same position in the file. One of the culprits is the DATA file handle (see bug [perl #37119]): SelfLoader will work in only one of the processes (unless it already read the <DATA> section).
[Sounds familiar; see http://groups.google.com/group/comp.lang.perl.modules/msg/ffafaf280e1423d7 ]
It should be considered a bug in the implementation of the <DATA> handle. It must tell() initially and after each read, and seek() back before each read if any fork() was performed (drat, there is still a race condition; maybe it should dup() first...).
Yours, Ilya
P.S. How to reproduce: e.g., start

    perl -dwe0

in an XTerm (with ReadLine::Perl), then type

    fork

Now type something on the command line in one terminal and press the BackSpace key. After this, BackSpace won't work in the other terminal.
The RT System itself - Status changed from 'new' to 'open'
On Sat, Oct 01, 2005 at 02:33:19PM -0700, Ilya Zakharevich wrote:
> I do not think this effect is documented anywhere:
>
>     perl -wle 'open F, q(<), shift or die; defined fork or die;
>                open O, q(>x).$$; print O $_ while <F>' .article
>
> Two files x$$ are created (one per process). One of them is empty. (Both on Solaris and OS/2; but this may be system-dependent.)
>
> Obviously, forked processes share the same position in the file. One of the culprits is the DATA file handle (see bug [perl #37119]): SelfLoader will work in only one of the processes (unless it already read the <DATA> section).

Below I made a sample implementation of protection against this misfeature of fork() (and *DATA). It is done on the level of SelfLoader; however, the "correct" fix should happen on the level of *DATA. Everybody: do you have any idea how to do something similar on the level of *DATA?
(The particular case of SelfLoader is simpler since data is read in one chunk, thus fork() can't happen between two read()s.)
Thanks, Ilya
Ilya Zakharevich wrote:
> Below I made a sample implementation of protection against this misfeature of fork() (and *DATA). It is done on the level of SelfLoader; however, the "correct" fix should happen on the level of *DATA. Everybody: do you have any idea how to do something similar on the level of *DATA?
>
> (The particular case of SelfLoader is simpler since data is read in one chunk, thus fork() can't happen between two read()s.)

The problem with your patch is that the DATA filehandle can't be used anymore after loading stubs, because you just closed it. And thus the 19th test of lib/SelfLoader.t fails.
But, IMHO fixing the fork() bug is more important than being able to reuse DATA from a selfloaded module. Anyway a perl-interpreter level fix would be better. But how to do this? Catch calls to fork() and dup *DATA when they happen? This won't fix the situation where a perl is embedded in another process (say, an httpd server) which forks.

    --- ./lib/SelfLoader.pm-pre	Wed Aug 13 23:37:40 2003
    +++ ./lib/SelfLoader.pm	Sat Oct  1 15:45:44 2005
    @@ -51,13 +51,15 @@ sub load_stubs { shift->_load_stubs((cal
     sub _load_stubs {
         # $endlines is used by Devel::SelfStubber to capture lines after __END__
         my($self, $callpack, $endlines) = @_;
    -    my $fh = \*{"${callpack}::DATA"};
    +    my $ofh = \*{"${callpack}::DATA"};
         my $currpack = $callpack;
         my($line,$name,@lines, @stubs, $protoype);
     
         print STDERR "SelfLoader::load_stubs($callpack)\n" if $DEBUG;
         croak("$callpack doesn't contain an __DATA__ token")
    -        unless fileno($fh);
    +        unless fileno($ofh);
    +    open my $fh, '<&', $ofh or croak "reopen: $!";
    +    close $ofh;		# Protect: fork() shares the pointer
         $Cache{"${currpack}::<DATA"} = 1;   # indicate package is cached
     
         local($/) = "\n";

On Tue, Oct 04, 2005 at 04:28:55PM +0200, Rafael Garcia-Suarez wrote:
> > (The particular case of SelfLoader is simpler since data is read in one chunk, thus fork() can't happen between two read()s.)
>
> The problem with your patch is that the DATA filehandle can't be used anymore after loading stubs, because you just closed it. And thus the 19th test of lib/SelfLoader.t fails.
>
> But, IMHO fixing the fork() bug is more important than being able to reuse DATA from a selfloaded module. Anyway a perl-interpreter level fix would be better. But how to do this? Catch calls to fork() and dup *DATA when they happen? This won't fix the situation where a perl is embedded in another process (say, an httpd server) which forks.

Make a special input layer :forkable (useful not only for DATA, but in most situations of reading from a "normal" file; I consider this part of the semantics of fork() completely broken). It saves the pid and the file position after each read. If on the next read the pid has changed, you dup the file descriptor to itself (probably, one needs 2 steps) and seek() to the preceding position.
Can somebody see problems with this?
Thanks, Ilya
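The pid-tracking idea can be sketched in plain Perl. This is an illustration under stated assumptions, not the proposed :forkable layer: ForkAwareReader, read_chunk and demo are hypothetical names, reads are unbuffered (sysread), and the sketch reopens the file by path instead of dup()-ing the descriptor, which requires the path to be known and the file to be unchanged.

```perl
use strict;
use warnings;

package ForkAwareReader;    # hypothetical name, not an existing module
use Fcntl qw(SEEK_SET);

sub new {
    my ($class, $path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    return bless { path => $path, fh => $fh, pid => $$, pos => 0 }, $class;
}

# Read up to $len bytes. If the pid differs from the one recorded at the
# last read, we are in a forked child: reopen the file to get a private
# descriptor and seek it to the last position this object recorded.
sub read_chunk {
    my ($self, $len) = @_;
    if ($self->{pid} != $$) {
        open my $fh, '<', $self->{path} or die "reopen $self->{path}: $!";
        defined(sysseek($fh, $self->{pos}, SEEK_SET)) or die "sysseek: $!";
        @{$self}{qw(fh pid)} = ($fh, $$);
    }
    my $n = sysread($self->{fh}, my $buf, $len);
    die "sysread: $!" unless defined $n;
    $self->{pos} += $n;
    return $buf;
}

package main;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical demo: parent reads 2 bytes, forks, and then both processes
# read the next 3 bytes. Returns what the parent saw and whether the child
# saw the same chunk.
sub demo {
    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "abcdefghij";
    close $tmp or die "close: $!";

    my $reader = ForkAwareReader->new($path);
    my $head   = $reader->read_chunk(2);

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        my $chunk = $reader->read_chunk(3);
        POSIX::_exit($chunk eq 'cde' ? 0 : 1);   # report via exit status
    }
    waitpid($pid, 0);
    my $child_ok = ($? >> 8) == 0;

    my $chunk = $reader->read_chunk(3);
    return ($head, $chunk, $child_ok);
}

my ($head, $chunk, $child_ok) = demo();
print "parent: $head then $chunk; child saw the same chunk: ",
      ($child_ok ? "yes" : "no"), "\n";
```

Only the child pays for the reopen here; the parent keeps the original descriptor. Doing the same with a buffered handle would also have to deal with the data already sitting in each process's copy of the buffer, which this sketch sidesteps.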
Nosing around SelfLoader I wondered whether it was safe on a fork() and/or threads. The docs could helpfully say whether it is or not.
I suspect the answer is something like not safe in general, or not until you load_stubs() -- but that you'd have to be fairly unlucky to get precisely concurrent load_stubs() reading the __DATA__ and hence making a mess.
(I saw the DATA handle dup()-ing code\, but I think it doesn't help\, since a dup() is the same file table entry so the file position is shared by parent and child\, either a fork or a thread.)
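The point about dup() can be checked in a single process. The sketch below is an illustration, not code from SelfLoader: it assumes POSIX dup() semantics, uses a temporary file, and offsets_after_read is a made-up helper name. It compares a handle duplicated with the '<&' open mode against a handle obtained by opening the same path a second time.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Hypothetical helper: read $nbytes through one handle, then report the
# offsets seen by a dup()ed handle and by an independently opened handle.
sub offsets_after_read {
    my ($nbytes) = @_;

    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "0123456789\n";
    close $tmp or die "close: $!";

    open my $orig,  '<',  $path or die "open: $!";
    open my $dup,   '<&', $orig or die "dup: $!";      # dup() of $orig
    open my $fresh, '<',  $path or die "reopen: $!";   # separate open()

    # Unbuffered read, so the kernel offset moves by exactly $nbytes.
    defined sysread($orig, my $buf, $nbytes) or die "sysread: $!";

    my $dup_pos   = sysseek($dup,   0, SEEK_CUR) + 0;
    my $fresh_pos = sysseek($fresh, 0, SEEK_CUR) + 0;
    return ($dup_pos, $fresh_pos);
}

my ($dup_pos, $fresh_pos) = offsets_after_read(4);
print "dup offset: $dup_pos, fresh open offset: $fresh_pos\n";
```

With POSIX semantics the duplicated handle follows the original (offset 4 here), while the separately opened handle stays at 0, which is why a dup() alone does not isolate a child's reads from its parent's.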
Migrated from rt.perl.org#37119 (status was 'open')
Searchable as RT37119