Perl / perl5


can't dup DATA? #369

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#1204 (status was 'resolved')

Searchable as RT1204$

p5pRT commented 24 years ago

From @vlbrown

If I do this

    #!/usr/bin/perl

    open (INPUT,"<&STDIN");
    while (<INPUT>) {
        print;
    }

it works. I can type input and INPUT is a dup of STDIN. The while loop is the moral equivalent of `while (<STDIN>) {`.

But if I try

    #!/usr/bin/perl

    open (INPUT,"<&DATA");
    while (<INPUT>) {
        print;
    }

    __END__
    apple
    banana

and run it, nothing is printed. Can I not dup DATA? Is this because it is a "magic" filehandle (not a "real" filehandle)?

Can Perl be changed to allow dup'ing of the DATA filehandle?

Perl Info

```
Site configuration information for perl 5.00502:

Configured by vlb at Tue Dec 1 13:04:02 PST 1998.

Summary of my perl5 (5.0 patchlevel 5 subversion 2) configuration:
  Platform:
    osname=solaris, osvers=2.6, archname=sun4-solaris
    uname='sunos jeeves 5.6 generic_105181-03 sun4u sparc sunw,ultra-enterprise '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='gcc', optimize='-O', gccversion=2.8.1
    cppflags='-I/usr/local/include'
    ccflags ='-I/usr/local/include'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldb -ldl -lm -lc -lcrypt
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G -L/usr/local/lib'

Locally applied patches:

@INC for perl 5.00502:
    /usr/local/lib/perl5/5.00502/sun4-solaris
    /usr/local/lib/perl5/5.00502
    /usr/local/lib/perl5/site_perl/5.005/sun4-solaris
    /usr/local/lib/perl5/site_perl/5.005
    .

Environment for perl 5.00502:
    HOME=/export/home/vlb
    LANG (unset)
    LD_LIBRARY_PATH=/usr/usr2/oracle/product/8.0.5/lib
    LOGDIR (unset)
    PATH=/usr/local/sbin:/export/home/vlb/bin:/export/home/vlb/nib:/usr/local/bin:/opt/SUNWspro/bin:/usr/ccs/bin:/usr/ucb:/usr/bin:/bin:/etc:/sbin:/usr/sbin:/usr/openwin/bin:/usr/usr2/oracle/product/8.0.5/bin:/usr/dt/bin:/usr/local/genome/bin:/usr/games:.
    PERL_BADLANG (unset)
    SHELL=/usr/bin/tcsh

-----
//=\ Vicki Brown \=// Journeyman Sourcerer: Scripts & Philtres //=\ \=// Scientific Programming <> Perl, Unix, Mac //=\ A little Web gardening on the weekends \=// //=\ Deltagen, Inc; 1031 Bing St, San Carlos, CA 94070
```

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

"VB" == Vicki Brown <vlb@deltagen.com> writes:

VB> Can Perl be changed to allow dup'ing of the DATA filehandle?

Would you really want to allow that? Since the start of DATA is not the start of the data. Yes, that could be true for other filehandles, but

Interesting open(INPUT, "<&3") doesn't work either. (The 3 is from fileno(DATA)). Something special about the fd?

-- Chaim Frenkel, Nonlinear Knowledge, Inc. chaimf@pobox.com +1-718-236-0183

p5pRT commented 24 years ago

From @mjdominus

Indeed, if the data after the __END__ tag is long enough, then Vicki's example does print out most of it, omitting only a bit at the beginning that was read into the DATA stdio buffer at program start time.

p5pRT commented 24 years ago

From @vlbrown

At 15:07 -0400 8/11/99, Chaim Frenkel wrote:

> "VB" == Vicki Brown <vlb@deltagen.com> writes:
>
> VB> Can Perl be changed to allow dup'ing of the DATA filehandle?
>
> Would you really want to allow that?

Yes. :-)

Rationale (from the MacPerl mailing list):

I enter test data after the __END__ tag at script end and use it to test. I was wondering if there was any way to map one file handle onto another for testing. For example, I have a script with 'while (<INPUT>)' where INPUT is the result of 'open INPUT, $inputfile;' I then have a couple of spots where the script returns to the file top, etc. It would be handy to change one line mapping INPUT to DATA during testing, rather than switching every instance of INPUT to DATA.

I do this too. It's often easier to fake the data after the __END__ section and get the script working on that, then move on to actually reading from STDIN, or opening a file or a pipe or whathaveyou.

I was a tad surprised that DATA didn't seem to be duplicatable [sic] this way, and didn't see any special caveats in the docs (OK, point me to the part I missed :)... is this related to the inability to refer to the DATA filehandle in a BEGIN{} block?

> Since the start of DATA is not the start of the data. Yes, that could be true for other filehandles, but

Well, I figure that if someone told Perl to allow it, they'd tell Perl how to do it "correctly" :-)

> Interesting open(INPUT, "<&3") doesn't work either. (The 3 is from fileno(DATA)). Something special about the fd?

As usual, I could live with a change to the docs explaining why this cannot be done :)

-- Vicki Brown <vlb@cfcl.com>, Journeyman Sourceror: Scripts & Philtres, P.O. Box 1269, San Bruno CA 94066, http://www.cfcl.com/~vlb http://www.macperl.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Graham Barr <gbarr@pobox.com> writes:

> This is a bug that needs to be fixed because
>
>     open(INPUT,"<&" . fileno(DATA)) or die "$!";
>     print <INPUT>;
>
>     __END__
>     1
>     2
>     3
>
> will print nothing and the following does exactly what you want.
>
>     seek(DATA,0,1);
>     open(INPUT,"<&" . fileno(DATA)) or die "$!";
>     print <INPUT>;

This is a perfectly normal "dup'ing a stdio buffered handle" issue.

    open(FOO,__FILE__);
    my $first = <FOO>;
    while (<FOO>) {
        last if /^__(DATA|END)__$/
    }
    open(INPUT,"<&FOO");

will have the same problem.

The DATA handle has been read so stdio has slurped (say) 8K of data into its buffer - which is enough to consume moderate sized scripts + data. Thus underlying fd is at EOF.

If you fix it for DATA you should/will fix it for all handles ... It is just a case of dup implying a PerlIO_seek(f,0,1) (with consequent $! pollution on ttys etc.)
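The mismatch described above can be observed from Perl itself. The following is a minimal sketch, not code from this thread: the helper name `positions_after_one_read` and the temporary file are assumptions made for illustration. It reads one line through the buffered handle, then compares the logical position reported by `tell` with the descriptor-level offset reported by `sysseek`, which bypasses the buffer.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Hypothetical helper: returns (logical position, fd-level offset, file size)
# after a single buffered line read from a small two-line file.
sub positions_after_one_read {
    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "Line 1\n", "Line 2\n";
    close $tmp or die "close: $!";

    open(my $fh, '<', $path) or die "open: $!";
    my $first   = <$fh>;                      # buffered read of one line
    my $logical = tell($fh);                  # position as the buffered handle sees it
    my $fd_pos  = sysseek($fh, 0, SEEK_CUR);  # offset of the underlying descriptor
    close $fh;
    return ($logical, $fd_pos + 0, -s $path);
}

my ($logical, $fd_pos, $size) = positions_after_one_read();
print "logical=$logical fd=$fd_pos size=$size\n";
```

On a typical Unix build the descriptor offset should already equal the file size after the first read, because the whole file fits in one buffer fill. That is the state in which a plain dup of the descriptor would start at EOF.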

-- Nick Ing-Simmons

p5pRT commented 24 years ago

From @gbarr

This is a bug that needs to be fixed because

    open(INPUT,"<&" . fileno(DATA)) or die "$!";
    print <INPUT>;

    __END__
    1
    2
    3

will print nothing and the following does exactly what you want.

    seek(DATA,0,1);
    open(INPUT,"<&" . fileno(DATA)) or die "$!";
    print <INPUT>;

    __END__
    1
    2
    3

This is probably a bug that needs fixing.

On Wed, Aug 11, 1999 at 01:07:29PM -0700, Vicki Brown wrote:

> Interesting open(INPUT, "<&3") doesn't work either. (The 3 is from fileno(DATA)). Something special about the fd?
>
> As usual, I could live with a change to the docs explaining why this cannot be done :)

-- Since you're clearly mad as a mongoose, I'll bid you good-day. -- Edmund to Captain Rum: Black Adder II "Potato"

p5pRT commented 24 years ago

From @mjdominus

Barr says:

> This is a bug that needs to be fixed

Patch enclosed. Someone with more experience should look at it and make sure I didn't commit any terrible errors. Perl 5.5.57 does pass all the tests, and it does fix Vicki's problem, as well as other related problems such as:

    #!/usr/bin/perl
    open F1, '/tmp/vb2' or die;
    print scalar <F1>;    # Prints first line from file
    open F2, "<&F1" or die;
    print scalar <F2>;    # Fails to print second line from file

Idea of patch: Call `seek` automatically to flush the buffer just before duplicating the file descriptor.

    --- doio.c 1999/06/10 23:11:05 1.1
    +++ doio.c 1999/08/11 19:38:46
    @@ -243,7 +243,10 @@
                 goto say_false;
             }
             if (IoIFP(thatio)) {
    -            fd = PerlIO_fileno(IoIFP(thatio));
    +            PerlIO *fp = IoIFP(thatio);
    +            /* Flush stdio buffer before dup */
    +            PerlIO_seek(fp, 0, 1);
    +            fd = PerlIO_fileno(fp);
                 if (IoTYPE(thatio) == 's')
                     IoTYPE(io) = 's';
             }
    --- t/io/dup.t 1999/08/11 19:43:50 1.1
    +++ t/io/dup.t 1999/08/11 19:52:47
    @@ -2,7 +2,7 @@
     
     # $RCSfile: dup.t,v $$Revision: 1.1 $$Date: 1999/08/11 19:43:50 $
     
    -print "1..6\n";
    +print "1..7\n";
     
     print "ok 1\n";
     
    @@ -37,3 +37,16 @@
     unlink 'Io.dup';
     
     print STDOUT "ok 6\n";
    +
    +# 7 # 19990811 mjd@plover.com
    +my ($out1, $out2) = ("Line 1\n", "Line 2\n");
    +open(W, "> Io.dup") || die "Can't open stdout";
    +print W $out1, $out2;
    +close W;
    +open(R1, "< Io.dup") || die "Can't read temp file";
    +$in1 = <R1>;
    +open(R2, "<&R1") || die "Can't dup";
    +$in2 = <R2>;
    +print "not " unless $in1 eq $out1 && $in2 eq $out2;
    +print "ok 7\n";
    +
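The new subtest can be restated with lexical filehandles and a temporary file. This is a sketch rather than part of the patch: `second_line_via_dup` is a hypothetical helper name, and `File::Temp` stands in for the `Io.dup` scratch file. On a perl that synchronises the source handle before duplicating it, the duplicate is expected to continue at the second line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper mirroring subtest 7: read line 1 from the original
# handle, dup it, then read the next line from the duplicate.
sub second_line_via_dup {
    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "Line 1\n", "Line 2\n";
    close $tmp or die "close: $!";

    open(my $r1, '<', $path) or die "Can't read temp file: $!";
    my $in1 = <$r1>;
    open(my $r2, '<&', $r1) or die "Can't dup: $!";
    my $in2 = <$r2>;
    close $r2;
    close $r1;
    return ($in1, $in2);
}

my ($in1, $in2) = second_line_via_dup();
my $ok = defined($in2) && $in2 eq "Line 2\n";
print $ok ? "ok\n" : "not ok\n";
```

If the duplicate starts at the descriptor's raw offset instead, `$in2` comes back undefined for a file this small, which is the failure reported in this ticket.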

p5pRT commented 24 years ago

From @vlbrown

At 15:31 -0400 8/11/99, Mark-Jason Dominus wrote:

> Indeed, if the data after the __END__ tag is long enough, then Vicki's example does print out most of it, omitting only a bit at the beginning that was read into the DATA stdio buffer at program start time.

Gaah! You're right.

I see "long enough" as being 16297 characters in MacPerl (5.004), 1147 under Solaris (5.005_02) and 639 chars on my Redhat / MkLinux PPC system (5.005_02).

I'm leaning toward bug :)

-- Vicki Brown <vlb@cfcl.com>, Journeyman Sourceror: Scripts & Philtres, P.O. Box 1269, San Bruno CA 94066, http://www.cfcl.com/~vlb http://www.macperl.com

p5pRT commented 24 years ago

From @mjdominus

Patch enclosed.

I forgot to do perldelta.

    --- pod/perldelta.pod 1999/08/11 19:58:16 1.3
    +++ pod/perldelta.pod 1999/08/11 20:02:52
    @@ -229,6 +229,14 @@
     buffering mishaps suffered by users unaware of how Perl internally
     handles I/O.
     
    +=head2 Buffered data discarded from input filehandle when dup'ed.
    +
    +C<open(NEW, "E<lt>&OLD")> now discards any data that was previously
    +read and buffered in C<OLD>.  The next read operation on C<NEW> will
    +return the same data as the corresponding operation on C<OLD>.
    +Formerly, it would have returned the data from the start of the
    +following disk block instead.
    +
     =head1 Supported Platforms
     
     =over 4

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

> Rationale (from the MacPerl mailing list):
>
> I enter test data after the __END__ tag at script end and use it to test. I was wondering if there was any way to map one file handle onto another for testing. For example, I have a script with 'while (<INPUT>)' where INPUT is the result of 'open INPUT, $inputfile;' I then have a couple of spots where the script returns to the file top, etc. It would be handy to change one line mapping INPUT to DATA during testing, rather than switching every instance of INPUT to DATA.
>
> I do this too. It's often easier to fake the data after the __END__ section and get the script working on that, then move on to actually reading from STDIN, or opening a file or a pipe or whathaveyou.

It's always been annoying that this doesn't work correctly:

    % perl whateverscript "<&DATA"

on <> handling.

--tom

p5pRT commented 24 years ago

From @mjdominus

> It's always been annoying that this doesn't work correctly:
>
>     % perl whateverscript "<&DATA"
>
> on <> handling.

My patch fixes that.

p5pRT commented 24 years ago

From @jhi

The patch doesn't seem to work in Digital UNIX\, the new io/dup subtest fails. On the other hand\, the patch seems to do no harm (no other failures).

This is what

    print "in1 = '$in1', out1 = '$out1', in2 = '$in2', out2 = '$out2'\n";

outputs after the subtest #7.

    in1 = 'Line 1
    ', out1 = 'Line 1
    ', in2 = '', out2 = 'Line 2
    '

My guess is that calling PerlIO_seek(fp, 0, SEEK_CUR) doesn't flush. Silly question of the day: why not

  PerlIO_flush(fp);

instead of the seek()?

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @mjdominus

> My guess is that calling PerlIO_seek(fp, 0, SEEK_CUR) doesn't flush.

Yeah.

> Silly question of the day: why not PerlIO_flush(fp); instead of the seek()?

No good reason; I think I had seek on the brain because of Graham's message. Can you try flush() and see if it works on your side and I will try it here too and if it works in both places I will amend and resubmit the patch.

Thanks.

p5pRT commented 24 years ago

From @jhi

Mark-Jason Dominus writes:

> > My guess is that calling PerlIO_seek(fp, 0, SEEK_CUR) doesn't flush.
>
> Yeah.
>
> > Silly question of the day: why not PerlIO_flush(fp); instead of the seek()?
>
> No good reason; I think I had seek on the brain because of Graham's message. Can you try flush() and see if it works on your side and I

I tried it already. It works.

> will try it here too and if it works in both places I will amend and resubmit the patch.

No need to resubmit the patch; just confirm whether it works for you and I'll check in my change.

> Thanks.


p5pRT commented 24 years ago

From @jhi

Nick Ing-Simmons writes:

> Jarkko Hietaniemi <jhi@iki.fi> writes:
>
> > Silly question of the day: why not
> >
> >     PerlIO_flush(fp);
> >
> > instead of the seek()?
>
> Because where PerlIO is stdio, fflush() may not do anything useful on handles open for read.

But neither does seek(), it seems. Shall we do both? And if that does not help, sacrifice a chicken and perform the Shamanistic Ritual #17b?


p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi <jhi@iki.fi> writes:

> Silly question of the day: why not
>
>     PerlIO_flush(fp);
>
> instead of the seek()?

Because where PerlIO is stdio, fflush() may not do anything useful on handles open for read.

-- Nick Ing-Simmons <nik@tiuk.ti.com> Via, but not speaking for: Texas Instruments Ltd.

p5pRT commented 24 years ago

From @jhi

Nick Ing-Simmons writes:

> My only concern is that some stdio somewhere will complain about flush() on a read handle.
>
> > And if that does not help, sacrifice a chicken and perform the Shamanistic Ritual #17b?
>
> Perhaps the correct fix is:
>
>     PerlIO_flush(src);
>     PerlIO_seek(dst,PerlIO_tell(src),0);
>
> Although that perhaps should be getpos/setpos to handle REC files and/or large files.

Okay, *now* I want an updated patch...


p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi <jhi@iki.fi> writes:

> Nick Ing-Simmons writes:
>
> > Jarkko Hietaniemi <jhi@iki.fi> writes:
> >
> > > Silly question of the day: why not
> > >
> > >     PerlIO_flush(fp);
> > >
> > > instead of the seek()?
> >
> > Because where PerlIO is stdio, fflush() may not do anything useful on handles open for read.
>
> But neither does seek(), it seems.

An understandable optimization - if stdio is not bothered about dups.

> Shall we do both?

My only concern is that some stdio somewhere will complain about flush() on a read handle.

> And if that does not help, sacrifice a chicken and perform the Shamanistic Ritual #17b?

Perhaps the correct fix is:

    PerlIO_flush(src);
    PerlIO_seek(dst,PerlIO_tell(src),0);

Although that perhaps should be getpos/setpos to handle REC files and/or large files.
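The idea of positioning the duplicate at the source's logical position can also be sketched at the Perl level, without touching `doio.c`. This is an illustration under stated assumptions, not the fix that was committed: `dup_at_logical_position` is a hypothetical helper, it uses `tell`/`seek` rather than getpos/setpos, and it only makes sense for seekable files.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: dup $src for reading and place the duplicate at the
# source's logical (buffer-aware) position, whatever the descriptor offset is.
sub dup_at_logical_position {
    my ($src) = @_;
    my $pos = tell($src);                 # logical position of the source
    die "source handle is not seekable\n" if $pos < 0;
    open(my $dst, '<&', $src) or die "Can't dup: $!";
    seek($dst, $pos, SEEK_SET) or die "Can't seek: $!";
    return $dst;
}

# Usage: read one line, dup, and continue reading from the duplicate.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "Line 1\n", "Line 2\n";
close $tmp or die "close: $!";

open(my $src, '<', $path) or die "open: $!";
my $first = <$src>;
my $dst   = dup_at_logical_position($src);
my $next  = <$dst>;
print $next;
```

The explicit `seek` on the duplicate does not rely on a flush of the source having any effect, which is the portability concern raised above. The two descriptors still share one file offset, so reading through one moves the position of the other.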

-- Nick Ing-Simmons <nik@tiuk.ti.com> Via, but not speaking for: Texas Instruments Ltd.

p5pRT commented 24 years ago

From @jhi

It seems that PerlIO_seek() doesn't flush in IRIX 6.5\, either.
