Perl / perl5

đŸȘ The Perl programming language
https://dev.perl.org/perl5/

"\c\" is a syntax error #13154

Open p5pRT opened 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#119191 (status was 'open')

Searchable as RT119191$

p5pRT commented 11 years ago

From @mauke

Created by @mauke

```
% perl -we 'printf "%vd\n", "\c\"'
Can't find string terminator '"' anywhere before EOF at -e line 1.
% perl -we 'printf "%vd\n", "\c\\"'
28.92
% perl -we 'printf "%vd\n", "\x1c\\"'
28.92
```

There's no way to get a CTRL-\ (0x1C) with perl's \c notation.

The obvious syntax "\c\" is a syntax error.

The next try "\c\\" parses but ends up being a 2-character string consisting of CTRL-\ and \ (backslash).

I think the latter case is definitely a bug because the middle backslash ends up doing double duty: it is used by \c to form \c\ (0x1C), but then also escapes the third backslash to form \\ (0x5C). This half-reparsing shouldn't happen.

And for consistency with other control characters it would be nice if you could get a ^\ by writing "\c\".

(This bug may be a duplicate but when I search RT for "\c\" I get 693 pages of no results.)

Perl Info

```
Flags:
    category=core
    severity=low

This perlbug was built using Perl 5.12.1 - Thu Jun 3 20:09:15 CEST 2010
It is being executed now by Perl 5.18.0 - Fri May 24 23:20:48 CEST 2013.

Site configuration information for perl 5.18.0:

Configured by mauke at Fri May 24 23:20:48 CEST 2013.

Summary of my perl5 (revision 5 version 18 subversion 0) configuration:

  Platform:
    osname=linux, osvers=3.5.7-gentoo, archname=i686-linux
    uname='linux nora 3.5.7-gentoo #5 preempt sat jan 26 16:46:10 cet 2013 i686 amd athlon(tm) 64 processor 3200+ authenticamd gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -flto',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.0', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-fstack-protector -L/usr/local/lib -O2 -flto'
    libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.15.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.15'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -flto -L/usr/local/lib -fstack-protector'

Locally applied patches:
    SAVEARGV0 - disable magic open in

@INC for perl 5.18.0:
    /home/mauke/usr/local/lib/perl5/site_perl/5.18.0/i686-linux
    /home/mauke/usr/local/lib/perl5/site_perl/5.18.0
    /home/mauke/usr/local/lib/perl5/5.18.0/i686-linux
    /home/mauke/usr/local/lib/perl5/5.18.0
    .

Environment for perl 5.18.0:
    HOME=/home/mauke
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH=/home/mauke/usr/local/lib
    LOGDIR (unset)
    PATH=/home/mauke/usr/perlbrew/bin:/home/mauke/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.6.3:/opt/sun-jdk-1.4.2.13/bin:/opt/sun-jdk-1.4.2.13/jre/bin:/opt/sun-jdk-1.4.2.13/jre/javaws:/opt/dmd/bin:/usr/games/bin
    PERLBREW_BASHRC_VERSION=0.43
    PERLBREW_HOME=/home/mauke/.perlbrew
    PERLBREW_PATH=/home/mauke/usr/perlbrew/bin
    PERLBREW_ROOT=/home/mauke/usr/perlbrew
    PERLBREW_VERSION=0.27
    PERL_BADLANG (unset)
    PERL_UNICODE=SAL
    SHELL=/bin/bash
```

p5pRT commented 11 years ago

From @khwilliamson

On 08/07/2013 08:14 AM, l.mai@web.de (via RT) wrote:

# New Ticket Created by l.mai@web.de # Please include the string: [perl #119191] # in the subject line of all future correspondence about this issue. # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119191 >

This is a bug report for perl from l.mai@web.de, generated with the help of perlbug 1.39 running under perl 5.18.0.

From perlop:

"Also, "\c\X" yields chr(28) . "X" for any X, but cannot come at the end of a string, because the backslash would be parsed as escaping the end quote."

...

"Also no attention is paid to "\c\" (multichar control char syntax) during this search. Thus the second "\" in "qq/\c\/" is interpreted as a part of "\/", and the following "/" is not recognized as a delimiter. Instead, use "\034" or "\x1c" at the end of quoted constructs."

The latter quote is from the section about the "Gory details of parsing".

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @Hugmeir

On Wed, Aug 7, 2013 at 11:52 AM, Karl Williamson <public@khwilliamson.com> wrote:

From perlop:

"Also, "\c\X" yields chr(28) . "X" for any X, but cannot come at the end of a string, because the backslash would be parsed as escaping the end quote."

...

"Also no attention is paid to "\c\" (multichar control char syntax) during this search. Thus the second "\" in "qq/\c\/" is interpreted as a part of "\/", and the following "/" is not recognized as a delimiter. Instead, use "\034" or "\x1c" at the end of quoted constructs."

The latter quote is from the section about the "Gory details of parsing".

Hm. That really sounds more like it's documenting a bug. That being said, without looking at the code, I imagine that fixing it would make the parsing of all strings ever so slightly slower to get around one edge case, so perhaps it's better this way.

p5pRT commented 11 years ago

From @cpansprout

On Wed Aug 07 07:14:53 2013, mauke- wrote:

(This bug may be a duplicate but when I search RT for "\c\" I get 693 pages of no results.)

RT ignores backslashes, which is annoying.

I know I have seen this before, but I can’t find the ticket. It may simply have been discussed on p5p. It was a *long* time ago (Larry Wall was involved in the discussion). It was never fully resolved, though at the time I think everyone considered it a bug. Just nobody knew how to fix it.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Wed Aug 07 12:41:07 2013, sprout wrote:

On Wed Aug 07 07:14:53 2013, mauke- wrote:

(This bug may be a duplicate but when I search RT for "\c\" I get 693 pages of no results.)

RT ignores backslashes, which is annoying.

I know I have seen this before, but I can’t find the ticket. It may simply have been discussed on p5p. It was a *long* time ago (Larry Wall was involved in the discussion). It was never fully resolved, though at the time I think everyone considered it a bug. Just nobody knew how to fix it.

If I recall correctly, Larry’s suggestion was for the first pass to treat \cX (where X is any char) as a single entity to skip over, just as \\ is skipped over.

That way "\c\" is control-backslash, "\c\\\" is control-backslash followed by a backslash and "\c\\" is ‘Can't find string terminator...’.

Considering that \c\ anywhere inside a quoted string is completely buggy, I think it is ok to change this.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Sat Aug 24 19:19:08 2013, sprout wrote:

If I recall correctly, Larry’s suggestion was for the first pass to treat \cX (where X is any char) as a single entity to skip over, just as \\ is skipped over.

That way "\c\" is control-backslash, "\c\\\" is control-backslash followed by a backslash and "\c\\" is ‘Can't find string terminator...’.

Considering that \c\ anywhere inside a quoted string is completely buggy, I think it is ok to change this.

You can see it on the sprout/cntrl branch, and hereto attached.

Do we want this?

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

From dd6344855396df034857e5570c1e9ae1201e0c02 Mon Sep 17 00:00:00 2001
From: Father Chrysostomos <sprout@cpan.org>
Date: Sun, 25 Aug 2013 01:06:03 -0700
Subject: [PATCH] \c

\c now escapes the following character, the way \ does, so "\c\" is a single control-backslash, "\c"" is control-", etc.

Inline Patch

```diff
diff --git a/dist/B-Deparse/Deparse.pm b/dist/B-Deparse/Deparse.pm
index 6a63cf8..e337477 100644
--- a/dist/B-Deparse/Deparse.pm
+++ b/dist/B-Deparse/Deparse.pm
@@ -3970,7 +3970,7 @@ my %unctrl = # portable to EBCDIC
     "\cY" => '\cY',
     "\cZ" => '\cZ',
     "\c[" => '\c[', # unused
-    "\c\\" => '\c\\', # unused
+    eval '"\c\\"' || eval '"\c\"' => '\c\\', # unused
     "\c]" => '\c]', # unused
     "\c_" => '\c_', # unused
 );
diff --git a/embed.fnc b/embed.fnc
index 5cd5daa..02cfd8c 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2225,7 +2225,8 @@ sR |char* |scan_inputsymbol|NN char *start
 sR |char* |scan_pat |NN char *start|I32 type
 sR |char* |scan_str |NN char *start|int keep_quoted \
                 |int keep_delims|int re_reparse \
-                |bool deprecate_escaped_matching
+                |bool deprecate_escaped_matching \
+                |int skip_cntrl
 sR |char* |scan_subst |NN char *start
 sR |char* |scan_trans |NN char *start
 s |char* |scan_word |NN char *s|NN char *dest|STRLEN destlen \
diff --git a/embed.h b/embed.h
index 6cdcf82..846bb36 100644
--- a/embed.h
+++ b/embed.h
@@ -1628,7 +1628,7 @@
 #define scan_ident(a,b,c,d,e) S_scan_ident(aTHX_ a,b,c,d,e)
 #define scan_inputsymbol(a) S_scan_inputsymbol(aTHX_ a)
 #define scan_pat(a,b) S_scan_pat(aTHX_ a,b)
-#define scan_str(a,b,c,d,e) S_scan_str(aTHX_ a,b,c,d,e)
+#define scan_str(a,b,c,d,e,f) S_scan_str(aTHX_ a,b,c,d,e,f)
 #define scan_subst(a) S_scan_subst(aTHX_ a)
 #define scan_trans(a) S_scan_trans(aTHX_ a)
 #define scan_word(a,b,c,d,e) S_scan_word(aTHX_ a,b,c,d,e)
diff --git a/proto.h b/proto.h
index 48723db..21f7a2d 100644
--- a/proto.h
+++ b/proto.h
@@ -7359,7 +7359,7 @@ STATIC char* S_scan_pat(pTHX_ char *start, I32 type)
 #define PERL_ARGS_ASSERT_SCAN_PAT \
     assert(start)
 
-STATIC char* S_scan_str(pTHX_ char *start, int keep_quoted, int keep_delims, int re_reparse, bool deprecate_escaped_matching)
+STATIC char* S_scan_str(pTHX_ char *start, int keep_quoted, int keep_delims, int re_reparse, bool deprecate_escaped_matching, int skip_cntrl)
             __attribute__warn_unused_result__
             __attribute__nonnull__(pTHX_1);
 #define PERL_ARGS_ASSERT_SCAN_STR \
diff --git a/t/comp/parser.t b/t/comp/parser.t
index 28412da..763a736 100644
--- a/t/comp/parser.t
+++ b/t/comp/parser.t
@@ -108,7 +108,8 @@ like( $@, qr/error/, 'lexical block discarded by yacc' );
 
 # bug #18573, used to corrupt memory
 eval q{ "\c" };
-like( $@, qr/^Missing control char name in \\c/, q("\c" string) );
+like( $@, qr/^Can't find string terminator '"' anywhere before EOF/,
+    q("\c" string) );
 
 eval q{ qq(foo$) };
 like( $@, qr/Final \$ should be \\\$ or \$name/, q($ at end of "" string) );
diff --git a/toke.c b/toke.c
index cf09684..cd5e68a 100644
--- a/toke.c
+++ b/toke.c
@@ -3738,6 +3738,7 @@ S_scan_const(pTHX_ char *start)
                     *d++ = grok_bslash_c(*s++, has_utf8, 1);
                 }
                 else {
+                    assert(0);
                     yyerror("Missing control char name in \\c");
                 }
                 continue;
@@ -5885,7 +5886,7 @@ Perl_yylex(pTHX)
             }
             sv = newSVpvn_flags(s, len, UTF ? SVf_UTF8 : 0);
             if (*d == '(') {
-                d = scan_str(d,TRUE,TRUE,FALSE, FALSE);
+                d = scan_str(d,TRUE,TRUE,FALSE,FALSE,0);
                 if (!d) {
                     /* MUST advance bufptr here to avoid bogus
                        "at end of line" context messages from yyerror().
@@ -6788,7 +6789,7 @@ Perl_yylex(pTHX)
         TERM(THING);
 
     case '\'':
-        s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+        s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,0);
         DEBUG_T( { printbuf("### Saw string before %s\n", s); } );
         if (PL_expect == XOPERATOR) {
             if (PL_lex_formbrack && PL_lex_brackets == PL_lex_formbrack) {
@@ -6803,7 +6804,7 @@ Perl_yylex(pTHX)
         TERM(sublex_start());
 
     case '"':
-        s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+        s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,1);
         DEBUG_T( { printbuf("### Saw string before %s\n", s); } );
         if (PL_expect == XOPERATOR) {
             if (PL_lex_formbrack && PL_lex_brackets == PL_lex_formbrack) {
@@ -6826,7 +6827,7 @@ Perl_yylex(pTHX)
         TERM(sublex_start());
 
     case '`':
-        s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+        s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,1);
         DEBUG_T( { printbuf("### Saw backtick string before %s\n", s); } );
         if (PL_expect == XOPERATOR)
             no_op("Backticks",s);
@@ -8300,7 +8301,7 @@ Perl_yylex(pTHX)
             LOP(OP_PIPE_OP,XTERM);
 
         case KEY_q:
-            s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+            s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,0);
             if (!s)
                 missingterm(NULL);
             pl_yylval.ival = OP_CONST;
@@ -8311,7 +8312,7 @@ Perl_yylex(pTHX)
 
         case KEY_qw: {
             OP *words = NULL;
-            s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+            s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,0);
             if (!s)
                 missingterm(NULL);
             PL_expect = XOPERATOR;
@@ -8361,7 +8362,7 @@ Perl_yylex(pTHX)
         }
 
         case KEY_qq:
-            s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+            s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,1);
             if (!s)
                 missingterm(NULL);
             pl_yylval.ival = OP_STRINGIFY;
@@ -8374,7 +8375,7 @@ Perl_yylex(pTHX)
             TERM(sublex_start());
 
         case KEY_qx:
-            s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+            s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,2);
             if (!s)
                 missingterm(NULL);
             readpipe_override();
@@ -8691,7 +8692,7 @@ Perl_yylex(pTHX)
 
                 /* Look for a prototype */
                 if (*s == '(') {
-                    s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+                    s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,0);
                     if (!s)
                         Perl_croak(aTHX_ "Prototype not terminated");
                     (void)validate_proto(PL_subname, PL_lex_stuff, ckWARN(WARN_ILLEGALPROTO));
@@ -9594,7 +9595,7 @@ S_scan_pat(pTHX_ char *start, I32 type)
     PERL_ARGS_ASSERT_SCAN_PAT;
 
     s = scan_str(start,!!PL_madskills,FALSE, (PL_in_eval & EVAL_RE_REPARSING),
-                       TRUE /* look for escaped bracketed metas */ );
+                       TRUE /* look for escaped bracketed metas */, 1 );
 
     if (!s) {
         const char * const delimiter = skipspace(start);
@@ -9687,7 +9688,7 @@ S_scan_subst(pTHX_ char *start)
     pl_yylval.ival = OP_NULL;
 
     s = scan_str(start,!!PL_madskills,FALSE,FALSE,
-                 TRUE /* look for escaped bracketed metas */ );
+                 TRUE /* look for escaped bracketed metas */, 1 );
 
     if (!s)
         Perl_croak(aTHX_ "Substitution pattern not terminated");
@@ -9705,7 +9706,7 @@ S_scan_subst(pTHX_ char *start)
 #endif
 
     first_start = PL_multi_start;
-    s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+    s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,2);
     if (!s) {
         if (PL_lex_stuff) {
             SvREFCNT_dec(PL_lex_stuff);
@@ -9791,7 +9792,7 @@ S_scan_trans(pTHX_ char *start)
 
     pl_yylval.ival = OP_NULL;
 
-    s = scan_str(start,!!PL_madskills,FALSE,FALSE, FALSE);
+    s = scan_str(start,!!PL_madskills,FALSE,FALSE,FALSE,0);
     if (!s)
         Perl_croak(aTHX_ "Transliteration pattern not terminated");
 
@@ -9807,7 +9808,7 @@ S_scan_trans(pTHX_ char *start)
     }
 #endif
 
-    s = scan_str(s,!!PL_madskills,FALSE,FALSE, FALSE);
+    s = scan_str(s,!!PL_madskills,FALSE,FALSE,FALSE,0);
     if (!s) {
         if (PL_lex_stuff) {
             SvREFCNT_dec(PL_lex_stuff);
@@ -10259,7 +10260,7 @@ S_scan_inputsymbol(pTHX_ char *start)
 
     if (d - PL_tokenbuf != len) {
         pl_yylval.ival = OP_GLOB;
-        s = scan_str(start,!!PL_madskills,FALSE,FALSE, FALSE);
+        s = scan_str(start,!!PL_madskills,FALSE,FALSE,FALSE,1);
         if (!s)
             Perl_croak(aTHX_ "Glob not terminated");
         return s;
@@ -10365,6 +10366,7 @@ intro_sym:
         deprecate_escaped_meta  issue a deprecation warning for cer-
                                 tain paired metacharacters that appear
                                 escaped within it
+        skip_cntrl              1 = skip over \c; 2 = skip if delim ne "'"
    returns: position to continue reading from buffer
    side-effects: multi_start, multi_close, lex_repl or lex_stuff, and
         updates the read buffer.
@@ -10406,7 +10408,7 @@ intro_sym:
 
 STATIC char *
 S_scan_str(pTHX_ char *start, int keep_quoted, int keep_delims, int re_reparse,
-                 bool deprecate_escaped_meta
+                 bool deprecate_escaped_meta, int skip_cntrl
     )
 {
     dVAR;
@@ -10458,6 +10460,9 @@ S_scan_str(pTHX_ char *start, int keep_quoted, int keep_delims, int re_reparse,
         has_utf8 = TRUE;
     }
 
+    if (skip_cntrl == 2 && termcode == '\'')
+        skip_cntrl = 0;
+
     /* mark where we are */
     PL_multi_start = CopLINE(PL_curcop);
     PL_multi_open = term;
@@ -10620,8 +10625,14 @@ S_scan_str(pTHX_ char *start, int keep_quoted, int keep_delims, int re_reparse,
                 /* embedded newlines increment the current line number */
                 if (*s == '\n' && !PL_rsfp && !PL_parser->filtered)
                     COPLINE_INC_WITH_HERELINES;
+                if (skip_cntrl && *s == '\\' && s+2 < PL_bufend &&
+                    term != '\\' && term != 'c' && s[1]=='c')
+                {
+                    *to++ = *s++; /* \ */
+                    *to++ = *s++; /* c */
+                }
                 /* handle quoted delimiters */
-                if (*s == '\\' && s+1 < PL_bufend && term != '\\') {
+                else if (*s == '\\' && s+1 < PL_bufend && term != '\\') {
                     if (!keep_quoted
                         && (s[1] == term
                             || (re_reparse && s[1] == '\\'))
```
p5pRT commented 11 years ago

From @cpansprout

On Sun Aug 25 01:16:06 2013, sprout wrote:

You can see it on the sprout/cntrl branch, and hereto attached.

There are actually two branches that need changing\, and that patch only changed one.

--

Father Chrysostomos