Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.88k stars 531 forks source link

Trailing space in heredoc delimiter #15427

Open p5pRT opened 8 years ago

p5pRT commented 8 years ago

Migrated from rt.perl.org#128557 (status was 'open')

Searchable as RT128557$

p5pRT commented 8 years ago

From @epa

Created by @epa

The end marker for a here-document must be the exact string\, without trailing spaces. If the heredoc begins with \<\<END but instead of 'END' you have 'END '\, Perl treats that as part of the quoted string without warning.

I appreciate that this is done to be consistent with Unix tradition. The Unix shell likewise requires an exact line with the terminator string and no trailing space. However\, I suggest that in this day and age it may be worth a small break with tradition in order to catch the mistake of a trailing space after the terminator.

Firstly\, many commonly used text editors do not show any visual difference between a line with trailing space and one without\, so two programs which look identical end up with different behaviour.

Secondly\, we can no longer assume that Perl programmers are familiar with the dark arts of Unix shell scripting\, so they will not know about this 'gotcha'.

Thirdly\, Perl has already broken with the behaviour of the shell in the case of a carriage-return character. A trailing carriage return after the heredoc terminator is silently ignored\, even on platforms where text files use just a newline character for end of line. The Unix shell does not do this (or at least GNU bash doesn't). Given that we have already broken with the classical shell heredoc behaviour for one kind of whitespace character\, why not for others?

In my personal opinion the right fix would be to add a deprecation warning when the heredoc terminator is seen followed by whitespace and end-of-line. 'In future Perl versions this will be treated as the end of the heredoc.' Then that change can indeed be made after a release cycle or two.

An alternative would be to simply add a warning when the heredoc terminator appears but has whitespace after it. Although I appreciate the comments that heredocs are intended to be unrestricted with 'anything' appearing in them\, I feel the case for a warning is even stronger here given the invisibilty of the trailing space characters.

Perl Info ``` Flags: category=core severity=low Site configuration information for perl 5.22.2: Configured by Red Hat, Inc. at Mon Jun 13 12:25:51 UTC 2016. Summary of my perl5 (revision 5 version 22 subversion 2) configuration: Platform: osname=linux, osvers=4.4.9-300.fc23.x86_64, archname=x86_64-linux-thread-multi uname='linux buildvm-14.phx2.fedoraproject.org 4.4.9-300.fc23.x86_64 #1 smp wed may 4 23:56:27 utc 2016 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Doptimize=none -Dccflags=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Dldflags=-Wl,-z,relro -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.22.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize=' -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='5.3.1 20160406 (Red Hat 5.3.1-6)', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='gcc', ldflags ='-Wl,-z,relro -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib libs=-lpthread -lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.22.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='2.22' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro ' cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro -L/usr/local/lib -fstack-protector-strong' Locally applied patches: Fedora Patch1: Removes date check, Fedora/RHEL specific Fedora Patch3: support for libdir64 Fedora Patch4: use libresolv instead of libbind Fedora Patch5: USE_MM_LD_RUN_PATH Fedora Patch6: Skip hostname tests, due to builders not being network capable Fedora Patch7: Dont run one io test due to random builder failures Fedora Patch15: Define SONAME for libperl.so Fedora Patch16: Install libperl.so to -Dshrpdir value Fedora Patch22: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015) Fedora Patch26: Make *DBM_File desctructors thread-safe (RT#61912) Fedora Patch27: Make PadlistNAMES() lvalue again (CPAN RT#101063) Fedora Patch28: Make magic vtable writable as a work-around for Coro (CPAN RT#101063) Fedora Patch29: Fix duplicating PerlIO::encoding when spawning threads (RT#31923) Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux @INC for perl 5.22.2: /home/eda/lib64/perl5/ /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . Environment for perl 5.22.2: HOME=/home/eda LANG=en_GB.UTF-8 LANGUAGE (unset) LC_COLLATE=C LC_CTYPE=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_NUMERIC=en_GB.UTF-8 LC_TIME=en_GB.UTF-8 LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/sbin:/home/eda/.local/bin:/home/eda/bin:/sbin:/usr/sbin PERL5LIB=/home/eda/lib64/perl5/ PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 6 years ago

From @jkeenan

On Wed\, 06 Jul 2016 13​:50​:48 GMT\, eda@​waniasset.com wrote​:

This is a bug report for perl from eda@​waniasset.com\, generated with the help of perlbug 1.40 running under perl 5.22.2.

----------------------------------------------------------------- [Please describe your issue here]

The end marker for a here-document must be the exact string\, without trailing spaces. If the heredoc begins with \<\<END but instead of 'END' you have 'END '\, Perl treats that as part of the quoted string without warning.

I am unsure as to the specific case you are discussing.

In the attachment\, the first heredoc is defined with '\<\<END;'. The second is defined with '\<\<END ;' In both cases there is no whitespace after the terminator 'END'. Both "work" in the sense that both compile and print -- even though the second definition looks weird.

Is that the case you are concerned about? Or something else?

Thank you very much. -- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 6 years ago

From @jkeenan

128557-heredoc-terminator.pl

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 6 years ago

From @abigail

On Fri\, Sep 29\, 2017 at 07​:27​:58PM -0700\, James E Keenan via RT wrote​:

On Wed\, 06 Jul 2016 13​:50​:48 GMT\, eda@​waniasset.com wrote​:

This is a bug report for perl from eda@​waniasset.com\, generated with the help of perlbug 1.40 running under perl 5.22.2.

----------------------------------------------------------------- [Please describe your issue here]

The end marker for a here-document must be the exact string\, without trailing spaces. If the heredoc begins with \<\<END but instead of 'END' you have 'END '\, Perl treats that as part of the quoted string without warning.

I am unsure as to the specific case you are discussing.

In the attachment\, the first heredoc is defined with '\<\<END;'. The second is defined with '\<\<END ;' In both cases there is no whitespace after the terminator 'END'. Both "work" in the sense that both compile and print -- even though the second definition looks weird.

Is that the case you are concerned about? Or something else?

I think the perceived problem is with a trailing space after the terminator *after* the here doc. As in​:

  print \<\< END;   This is printed

  END   This as well\, because the previous line   has a trailing space.

  END

I'm not sure how big a problem this is in practise. In many cases\, a trailing space will lead to a program which doesn't compile. For the remaining cases\, where both the program compiles and doesn't behave badly\, a linter (Perl​::Critic?) ought to do the trick.

Abigail

p5pRT commented 6 years ago

From @jkeenan

On Sat\, 30 Sep 2017 16​:18​:58 GMT\, abigail@​abigail.be wrote​:

On Fri\, Sep 29\, 2017 at 07​:27​:58PM -0700\, James E Keenan via RT wrote​:

On Wed\, 06 Jul 2016 13​:50​:48 GMT\, eda@​waniasset.com wrote​:

This is a bug report for perl from eda@​waniasset.com\, generated with the help of perlbug 1.40 running under perl 5.22.2.

----------------------------------------------------------------- [Please describe your issue here]

The end marker for a here-document must be the exact string\, without trailing spaces. If the heredoc begins with \<\<END but instead of 'END' you have 'END '\, Perl treats that as part of the quoted string without warning.

I am unsure as to the specific case you are discussing.

In the attachment\, the first heredoc is defined with '\<\<END;'. The second is defined with '\<\<END ;' In both cases there is no whitespace after the terminator 'END'. Both "work" in the sense that both compile and print -- even though the second definition looks weird.

Is that the case you are concerned about? Or something else?

I think the perceived problem is with a trailing space after the terminator *after* the here doc. As in​:

print \<\< END; This is printed

END This as well\, because the previous line has a trailing space.

END

I'm not sure how big a problem this is in practise. In many cases\, a trailing space will lead to a program which doesn't compile. For the remaining cases\, where both the program compiles and doesn't behave badly\, a linter (Perl​::Critic?) ought to do the trick.

I agree. I think the feature request should be rejected.

Thank you very much. -- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 6 years ago

From @xsawyerx

On 09/30/2017 07​:27 PM\, James E Keenan via RT wrote​:

On Sat\, 30 Sep 2017 16​:18​:58 GMT\, abigail@​abigail.be wrote​:

On Fri\, Sep 29\, 2017 at 07​:27​:58PM -0700\, James E Keenan via RT wrote​:

On Wed\, 06 Jul 2016 13​:50​:48 GMT\, eda@​waniasset.com wrote​:

This is a bug report for perl from eda@​waniasset.com\, generated with the help of perlbug 1.40 running under perl 5.22.2.

----------------------------------------------------------------- [Please describe your issue here]

The end marker for a here-document must be the exact string\, without trailing spaces. If the heredoc begins with \<\<END but instead of 'END' you have 'END '\, Perl treats that as part of the quoted string without warning.

I am unsure as to the specific case you are discussing.

In the attachment\, the first heredoc is defined with '\<\<END;'. The second is defined with '\<\<END ;' In both cases there is no whitespace after the terminator 'END'. Both "work" in the sense that both compile and print -- even though the second definition looks weird.

Is that the case you are concerned about? Or something else?

I think the perceived problem is with a trailing space after the terminator *after* the here doc. As in​:

print \<\< END; This is printed

END This as well\, because the previous line has a trailing space.

END

I'm not sure how big a problem this is in practise. In many cases\, a trailing space will lead to a program which doesn't compile. For the remaining cases\, where both the program compiles and doesn't behave badly\, a linter (Perl​::Critic?) ought to do the trick.

I agree. I think the feature request should be rejected.

Agreed.

p5pRT commented 6 years ago

From @epa

Thanks for your responses. This bug report is based on real-world code; a lot of code (repeated SQL calls\, or printing chunks of output) do have repeated heredoc uses\, and if those all use the same delimiter (such as END) it is quite easy to silently include a trailing space -- not shown in most editors -- and end up with code that compiles but is wrong.

I filed the bug report because it's quite awkward that a space character (moreover\, an end-of-line one\, which are the most difficult to see) has such a large effect. But forget the suggestion of adding a warning. That gets bogged down in the usual back and forth about perlcritic (which I love\, and run over the codebase regularly -- but not in every single edit-test cycle). Could I ask instead whether the whitespace-sensitivity here is really needed?

What would happen if heredoc delimiters became like every other language token and weren't affected by space characters after them? They will always be special because they do depend on start of line\, but couldn't they be more relaxed about whitespace at the end? Nothing else in the language is as fussy\, and we know that with the mixture of different editors in use\, trailing spaces tend to creep into code over time. Wouldn't it be a sensible incremental improvement to allow whitespace after these heredoc terminators?

Note that the current behaviour is not exactly consistent​: if you put the start of the heredoc as '\<\<END ' with a trailing space\, it doesn't get terminated by the 'END ' you specified. The trailing space on the starting heredoc delimiter is silently stripped. Why not strip on the closing delimiter too?

p5pRT commented 6 years ago

From @abigail

On Sun\, Oct 01\, 2017 at 12​:10​:57PM -0700\, Ed Avis via RT wrote​:

Thanks for your responses. This bug report is based on real-world code; a lot of code (repeated SQL calls\, or printing chunks of output) do have repeated heredoc uses\, and if those all use the same delimiter (such as END) it is quite easy to silently include a trailing space -- not shown in most editors -- and end up with code that compiles but is wrong.

I filed the bug report because it's quite awkward that a space character (moreover\, an end-of-line one\, which are the most difficult to see) has such a large effect. But forget the suggestion of adding a warning. That gets bogged down in the usual back and forth about perlcritic (which I love\, and run over the codebase regularly -- but not in every single edit-test cycle). Could I ask instead whether the whitespace-sensitivity here is really needed?

What would happen if heredoc delimiters became like every other language token and weren't affected by space characters after them? They will always be special because they do depend on start of line\, but couldn't they be more relaxed about whitespace at the end? Nothing else in the language is as fussy\, and we know that with the mixture of different editors in use\, trailing spaces tend to creep into code over time. Wouldn't it be a sensible incremental improvement to allow whitespace after these heredoc terminators?

Note that the current behaviour is not exactly consistent​: if you put the start of the heredoc as '\<\<END ' with a trailing space\, it doesn't get terminated by the 'END ' you specified. The trailing space on the starting heredoc delimiter is silently stripped. Why not strip on the closing delimiter too?

This does what you expect it to do​:

  print \<\< "END ";   The line below does not end with a space​:   END   The line below does end with a space​:   END

Abigail