Error producing ^\ (chr 28) with "\c\\"

p5pRT commented 24 years ago

Migrated from rt.perl.org#1806 (status was 'resolved')

Searchable as RT1806$

p5pRT commented 24 years ago

From newton@ficus.frogspace.net

It appears to be impossible to produce a ^\ character (ASCII 28) using \c notation. "\x1c" and "\034" work fine, but "\c\\" gives a two-character string (ASCII 28, 92, i.e. ^\ followed by a backslash) and "\c\" gives "Can't find string terminator '"' anywhere before EOF".
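As a quick check of the two escapes that do work, here is a minimal sketch (the variable names are only illustrative and not part of the original report):

```perl
use strict;
use warnings;

# Three spellings of ASCII 28 (ctrl-\) that avoid \c notation entirely.
my $hex   = "\x1c";    # hexadecimal escape
my $octal = "\034";    # octal escape
my $plain = chr(28);   # no escape at all

print join(", ", map ord, $hex, $octal, $plain), "\n";   # 28, 28, 28
```

All three are one-character strings, so any of them can stand in wherever a lone ctrl-\ is needed.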

Examples:

    $ perl -wle 'print join ", ", map ord, split //, "\c\"'
    Can't find string terminator '"' anywhere before EOF at -e line 1.
    $ perl -wle 'print join ", ", map ord, split //, "\c\\"'
    28, 92

"\c\\" would be the correct syntax in my opinion; however, it appears that the \c logic sees two backslashes -- that the two backslashes aren't first reduced to one (as per double-quoting usually) before \c sees them.

Perl Info

```
Site configuration information for perl 5.00503:

Configured by frogleg at Sun Aug 8 13:32:51 EDT 1999.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=5.2, archname=i686-linux
    uname='linux ficus.frogspace.net 2.2.6-ac3 #2 thu aug 5 09:35:04 edt 1999 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='gcc', optimize='-O2', gccversion=2.7.2.3
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl 5.00503:
    /usr/local/lib/perl5/5.00503/i686-linux
    /usr/local/lib/perl5/5.00503
    /usr/local/lib/perl5/site_perl/5.005/i686-linux
    /usr/local/lib/perl5/site_perl/5.005
    .

Environment for perl 5.00503:
    HOME=/home/newton
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

\,\,\, writes:

"\c\\" would be the correct syntax in my opinion; however, it appears that the \c logic sees two backslashes -- that the two backslashes aren't first reduced to one (as per double-quoting usually) before \c sees them.

And what would "\c\c\c\c\\" do, in your opinion? ;-)

perlop/"Gory details..."

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

(cc'ed to Ilya Zakharevich)

On Fri, 19 Nov 1999, Ilya Zakharevich wrote:

\,\,\, writes:

"\c\\" would be the correct syntax in my opinion; however, it appears that the \c logic sees two backslashes -- that the two backslashes aren't first reduced to one (as per double-quoting usually) before \c sees them.

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And that's what Perl does.

This still doesn't explain to me, though, how to produce a string consisting solely of the character ^\. -- Philip Newton <newton@newton.digitalspace.net>

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Fri, Nov 19, 1999 at 11:17:03AM -0500, Philip Newton wrote:

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And that's what Perl does.

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

This still doesn't explain to me, though, how to produce a string consisting solely of the character ^\.

TIMTOWTDI. chr(ord('\\')-62) (or is it 32?) is one of them.
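For the record, the offset in question is 64: ord('\\') is 92, and 92 - 64 = 28. A minimal sketch follows; `ctrl_char` is a hypothetical helper that models \cX as "uppercase X, then flip bit 6", which is an assumption about ASCII platforms rather than a quote of perl's source:

```perl
use strict;
use warnings;

# Hypothetical helper: model "\cX" as "uppercase X, then XOR with 64".
sub ctrl_char {
    my ($c) = @_;
    return chr(ord(uc $c) ^ 64);
}

my $fs = chr(ord('\\') - 64);       # 92 - 64 = 28

print ord($fs), "\n";               # 28
print ord(ctrl_char('\\')), "\n";   # 28
print ord(ctrl_char('c')), "\n";    # 3, the same code as "\cc"
```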

Ilya

p5pRT commented 24 years ago

From @ysth

In article <19991119124913.F20768@monk.mps.ohio-state.edu>, Ilya Zakharevich <ilya@math.ohio-state.edu> wrote:

On Fri, Nov 19, 1999 at 11:17:03AM -0500, Philip Newton wrote:

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And that's what Perl does.

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

He stated he expected either "\c\\" or "\c\" to produce chr(28) and found that neither of them did. He then said that in his opinion "\c\\" should do it. I infer that what he *wants* is for either of them to do it.

    [D:\susv2]perl -wlne "eval $_; print $@ if $@"
    print length "\c\\"
    2
    print length "\c\"
    Can't find string terminator '"' anywhere before EOF at (eval 2) line 1, <> chunk 2.
    exit

I lean toward considering the second of these a bug.

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Yitzchak Scott-Thoennes writes:

    [D:\susv2]perl -wlne "eval $_; print $@ if $@"
    print length "\c\\"
    2
    print length "\c\"
    Can't find string terminator '"' anywhere before EOF at (eval 2) line 1, <> chunk 2.
    exit

I lean toward considering the second of these a bug.

Then read the docs.

Ilya

p5pRT commented 24 years ago

From @ysth

In article <199911210533.AAA01763@monk.mps.ohio-state.edu>, Ilya Zakharevich <ilya@math.ohio-state.edu> wrote:

Yitzchak Scott-Thoennes writes:

    [D:\susv2]perl -wlne "eval $_; print $@ if $@"
    print length "\c\\"
    2
    print length "\c\"
    Can't find string terminator '"' anywhere before EOF at (eval 2) line 1, <> chunk 2.
    exit

I lean toward considering the second of these a bug.

Then read the docs.

Thank you, I already did when you gave the doc reference before.

I realise that this behavior is perfectly in accord with perlop/"Gory details"/"Finding the end". That doesn't mean it's not a bug. It just means that if it is a code bug it is also a doc bug.

IMO, it also contradicts the beginning of perlop/"Gory details":

  When presented with something which may have several
  different interpretations, Perl uses the principle DWIM
  (expanded to Do What I Mean - not what I wrote) to pick up
  the most probable interpretation of the source.

A: "\c\"
B: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B more important to maintain than following DWIM for "string" A?

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Yitzchak Scott-Thoennes writes:

I realise that this behavior is perfectly in accord with perlop/"Gory details"/"Finding the end". That doesn't mean it's not a bug. It just means that if it is a code bug it is also a doc bug.

IMO, it also contradicts the beginning of perlop/"Gory details":
  When presented with something which may have several
  different interpretations, Perl uses the principle DWIM
  (expanded to Do What I Mean - not what I wrote) to pick up
  the most probable interpretation of the source.

This is taken out of context.

A: "\c\"
B: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B more important to maintain than following DWIM for "string" A?

Perl uses a very simple rule to find an end of a quoted construct. All one needs to do is to learn it.
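The rule referred to here (perlop, "Finding the end") can be modelled roughly as follows. This is a simplified sketch for single-character delimiters, not perl's actual lexer, and `find_end` is a hypothetical name:

```perl
use strict;
use warnings;

# Simplified model of "finding the end": scan left to right, treat a
# backslash plus the next character as one unit, and stop at the first
# delimiter that is not part of such a unit. Returns the text between
# the delimiters, or undef if no terminator is found.
sub find_end {
    my ($src, $delim) = @_;    # $src starts just after the opening delimiter
    my $i = 0;
    while ($i < length($src)) {
        my $ch = substr($src, $i, 1);
        if ($ch eq '\\') { $i += 2; next }             # skip the escaped pair
        return substr($src, 0, $i) if $ch eq $delim;   # unescaped delimiter
        $i++;
    }
    return undef;
}

my $BS = chr(92);   # one backslash, built this way to avoid quoting confusion

my $case_a = find_end($BS . 'c' . $BS . '"', '"');         # source: \c\"
my $case_b = find_end($BS . 'c' . $BS . $BS . '"', '"');   # source: \c\\"

print defined $case_a ? "A: $case_a\n" : "A: no terminator found\n";
print "B: $case_b\n";
```

Under this model the text `\c\"` never terminates, because `\"` is consumed as one unit, while `\c\\"` ends at the final quote with the content `\c\\`.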

Hope this helps, Ilya

p5pRT commented 24 years ago

From @ysth

Cc'd to: ilya@math.ohio-state.edu

In article <199911210756.CAA02439@monk.mps.ohio-state.edu>, Ilya Zakharevich <ilya@math.ohio-state.edu> wrote:

Yitzchak Scott-Thoennes writes:
I realise that this behavior is perfectly in accord with perlop/"Gory details"/"Finding the end". That doesn't mean it's not a bug. It just means that if it is a code bug it is also a doc bug.

IMO, it also contradicts the beginning of perlop/"Gory details":
  When presented with something which may have several
  different interpretations, Perl uses the principle DWIM
  (expanded to Do What I Mean - not what I wrote) to pick up
  the most probable interpretation of the source.
This is taken out of context.

I disagree. This reads to me like a "mission statement" for Perl parsing. If one of the many detailed parsing rules that follow violates this statement, then that is an indication that a modification may be necessary.

A: "\c\"
B: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B more important to maintain than following DWIM for "string" A?

Perl uses a very simple rule to find an end of a quoted construct. All one needs to do is to learn it.

Hope this helps, Ilya

Thanks, I thought I already said I understand the docs and behavior for finding the end of a quoted construct are in agreement.

But application of these rules of finding the end, left-to-right parse, etc. leaves the \c\ construct without a clear meaning.

One of the following should be true in all cases:

1. \c\ produces a fatal exception
2. the second \ is an escape character and the following character is the 'argument' to \c
3. the \ is the 'argument' to \c and the following character is unaffected
4. the behavior is undefined and a warning is given

Currently none of these is true. The behavior is pretty close to number 3 except for a couple of odd cases:

    [D:\]perl -wlne "print map {length,':',map {;' ',ord} split //} eval"
    "\c\\"
    2: 28 92
    "\c\""
    1: 98
    exit

In the first case, the string seems to contain two characters: \c\ and \. The anomaly is that the trailing backslash doesn't escape the ".

In the second case, the string should IMO end with the 2nd ". Instead, the 2nd \ of \c\ escapes it, contrary to \c\ behavior elsewhere. Thus the string becomes \c" -> chr((ord('"')-64) & 127) -> b.
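The arithmetic in that last step can be checked directly. The sketch below only evaluates the formula as written above and compares it with an XOR-by-64 spelling; it makes no claim about how perl implements \c internally:

```perl
use strict;
use warnings;

# \c applied to a double quote, using the formula from the paragraph above.
my $from_formula = chr((ord('"') - 64) & 127);   # (34 - 64) & 127 = 98
my $from_xor     = chr(ord('"') ^ 64);           # 34 ^ 64 = 98

# For every 7-bit code the two spellings agree.
my @mismatch = grep { (($_ - 64) & 127) != ($_ ^ 64) } 0 .. 127;

print "$from_formula $from_xor ", scalar(@mismatch), "\n";   # b b 0
```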

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Sun, Nov 21, 1999 at 10:04:34AM -0800, Yitzchak Scott-Thoennes wrote:

But application of these rules of finding the end, left-to-right parse, etc. leaves the \c\ construct without a clear meaning.

On the contrary, they *give* \c\ a clear meaning: see qq[\c\c].

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Yitzchak Scott-Thoennes <sthoenna@efn.org> writes:

A: "\c\"
B: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B more important to maintain than following DWIM for "string" A?

Yes. print "this is \"important\"!\n";

-- Nick Ing-Simmons

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Fri, 19 Nov 1999, Ilya Zakharevich wrote:

On Fri, Nov 19, 1999 at 11:17:03AM -0500, Philip Newton wrote:

And what would "\c\c\c\c\\" do, in your opinion? ;-)

My guess is ^\ + 'c' + ^\ + 'c' + '\\', i.e. ASCII 28, 99, 28, 99, 92. And that's what Perl does.

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

This still doesn't explain to me, though, how to produce a string consisting solely of the character ^\.

TIMTOWTDI. chr(ord('\\')-62) (or is it 32?) is one of them.

Well, and "\x1c" and "\034" work, of course. I was just disappointed that ctrl-\ seems to be the only character that's difficult to produce in \c notation.

Cheers, Philip

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Tue, Nov 23, 1999 at 05:00:24AM -0500, Philip Newton wrote:

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

Ilya

p5pRT commented 24 years ago

From @TimToady

Philip Newton writes:
: Not quite. I want '\\' to be translated to \ in a previous pass, before
: the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

It would have to be done in the same pass, by pretending that \c is a funny kind of \. We go to great lengths to avoid doing multiple passes in Perl, and when we do do multiple passes, we go to great lengths to hide that fact. For instance, we pretend that regular expressions are interpolated and interpreted just like double-quoted strings, but in fact, the lexer must treat them entirely differently to preserve that illusion, because the regular expression parser does a separate pass after interpolation. Not only must the lexer pass backslashed sequences through to the regular expression parser, but it has to decide which dollar signs indicate something to be interpolated immediately:

    /$foo/

and which have to be passed through to the regular expression engine:

    /foo$/
    /(foo$|bar$)/
    /(?{ $foo += 1 })/

Actually, that last one doesn't need to pass $foo--it could conceivably just pass a pointer to some precompiled code, but I don't think it does. If I recall, it's more like an eval.
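The distinction between the two kinds of dollar sign can be seen in a small sketch. The test strings are made up for illustration, and the `(?{ ... })` case is left out:

```perl
use strict;
use warnings;

my $foo = 'bar';

# Here $foo is a variable: the pattern becomes /bar/.
my $interpolated = 'rebar' =~ /$foo/ ? 1 : 0;

# Here $ is the end-of-string anchor, not the start of a variable.
my $anchored_hit  = 'tofoo' =~ /foo$/ ? 1 : 0;
my $anchored_miss = 'foods' =~ /foo$/ ? 1 : 0;

# The "$|" and "$)" sequences are also passed through to the regex engine.
my $alternation = 'crowbar' =~ /(foo$|bar$)/ ? 1 : 0;

print "$interpolated $anchored_hit $anchored_miss $alternation\n";   # 1 1 0 1
```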

But anyway, we don't cavalierly add multiple passes to Perl. Multiple passes tend to make things easier for the implementer, but harder for the user. Perl's loyalties lie with the user.

Larry

p5pRT commented 24 years ago

From @ysth

Cc'd to: larry@wall.org

In article <199911231731.JAA16685@kiev.wall.org>, Larry Wall <larry@wall.org> wrote:

It would have to be done in the same pass, by pretending that \c is a funny kind of \.

At last, someone with a glimmer of sense! \c *already* is a funny kind of \. Except with respect to whichever character indicates the end of the quoted string. This inconsistency is a *bug*.

For instance:

    $foo = 'bar'; print "\c$foo";     yields 'dfoo', not an error
    $foo = 'bar'; print "\c\$foo";    yields chr(28).'bar', not 'dfoo'

From this, "\c\" should yield chr(28). And "\c"" should yield 'b', just as qq'\c"' does.

Alternatively, \c\ should always apply the \c 'operator' to the following character, so that "\c\"" works like qq'\c"' (as it currently does) but it takes \c\\ to get a chr(28).

p5pRT commented 24 years ago

From @ysth

Cc'd to: nick@ing-simmons.net

In article <199911212139.VAA03794@bactrian.ni-s.u-net.com>, Nick Ing-Simmons <nick@ing-simmons.net> wrote:

Yitzchak Scott-Thoennes <sthoenna@efn.org> writes:

A: "\c\"
B: "\c\"more string here"

The question is, is the current incomprehensible behavior of string B more important to maintain than following DWIM for "string" A?

Yes. print "this is \"important\"!\n";

    $what='this';
    print "\c$what do I print?";
    print "\c\$what do I print?";

And why? When you understand what I am talking about, feel free to comment.

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Tue, 23 Nov 1999, Ilya Zakharevich wrote:

On Tue, Nov 23, 1999 at 05:00:24AM -0500, Philip Newton wrote:

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

I don't get it. The first I would parse as three characters: backslash (the two backslashes become one), n, c. The second as one character: ctrl-C ("\x03").

Cheers, Philip

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Tue, 23 Nov 1999, Larry Wall wrote:

Philip Newton writes:
: Not quite. I want '\\' to be translated to \ in a previous pass, before
: the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

It would have to be done in the same pass, by pretending that \c is a funny kind of \.

Is this then where the current problem comes from? "\c\\" gets parsed left-to-right, and if \c is a funny escape sign, then we have the "token" \c + backslash, followed by another backslash. \c + backslash is converted to ^\, and the final backslash stays chr(92). The parser (lexer?) never sees "\\" to convert to one backslash because the first backslash is already eaten by the \c "escape".
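That reading can be tried out with a toy left-to-right unescaper. This is only a sketch of the explanation above, not perl's real lexer; `unescape_model` is a hypothetical name, and the model knows only \cX, \n and backslash+char:

```perl
use strict;
use warnings;

# Toy model of one left-to-right unescaping pass. A lone trailing
# backslash is copied through unchanged.
sub unescape_model {
    my ($text) = @_;
    my $out = '';
    while (length($text)) {
        if    ($text =~ s/\A\\c(.)//s) { $out .= chr(ord(uc $1) ^ 64) }
        elsif ($text =~ s/\A\\n//)     { $out .= "\n" }
        elsif ($text =~ s/\A\\(.)//s)  { $out .= $1 }
        else  { $out .= substr($text, 0, 1, '') }
    }
    return $out;
}

my $BS   = chr(92);      # one backslash, built this way to avoid quoting confusion
my $pair = $BS . 'c';    # the two characters \c

my @short = map ord, split //, unescape_model($pair . $BS . $BS);        # \c\\
my @long  = map ord, split //, unescape_model($pair x 4 . $BS . $BS);    # \c\c\c\c\\

print "@short\n";   # 28 92
print "@long\n";    # 28 99 28 99 92
```

In this model \c consumes the first backslash of `\c\\`, so the second one is never seen as part of a `\\` pair, which matches the 28, 92 result reported in this thread.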

Maybe some magic, then, which makes "\c\\" into one token, which gets eaten whole by the "maximal munch" strategy?

Cheers, Philip -- Philip Newton <newton@newton.digitalspace.net>

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Wed, Nov 24, 1999 at 03:10:58AM -0500, Philip Newton wrote:

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

It should have been, of course, \\cc vs \cc

I don't get it. The first I would parse as three characters: backslash (the two backslashes become one), n, c. The second as one character: ctrl-C ("\x03").

Nope. You want two backslashes converted to one *before* \c interpolation is done. Thus \\cc will behave the same as \cc.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Wed, 24 Nov 1999, Ilya Zakharevich wrote:

On Wed, Nov 24, 1999 at 03:10:58AM -0500, Philip Newton wrote:

So you want it to be parsed left-to-right. But you want \c\\ to be parsed right-to-left. Choose one.

Not quite. I want '\\' to be translated to \ in a previous pass, before the \c mechanism sees it. After the '\\' -> \ pass, I want left-to-right.

Then you are advising that "\\nc" and "\cc" should be parsed the same, right?

It should have been, of course, \\cc vs \cc

OK. I see what you mean now. No, I suppose I don't want that. I suppose what I want was expressed, more or less, by Yitzchak Scott-Thoennes elsewhere in this thread. Something along the lines of having \-conversion (\n \f etc.) done in parallel with \c-conversion, and having \c\ do roughly the same thing as \c if the following character is either a backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

Cheers, Philip -- Philip Newton <newton@newton.digitalspace.net>

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Philip Newton (lists.p5p):

OK. I see what you mean now. No, I suppose I don't want that. I suppose what I want was expressed, more or less, by Yitzchak Scott-Thoennes elsewhere in this thread. Something along the lines of having \-conversion (\n \f etc.) done in parallel with \c-conversion, and having \c\ do roughly the same thing as \c if the following character is either a backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

This now makes sense, and that's roughly how I had thunk it should go. At least this way is relatively easy to implement: special-case \c\[something] then fall back to parsing from \c if there wasn't a following backslash - that way we could get it all in one left-right pass, which seems the most intuitive, even if it isn't.

It strikes me as being the solution most mentally compatible with the rest of Perl's escaping/metacharactering. Feel free to violently disagree.

To save someone the bother of bringing up the degenerate case, what the heck should \c\cx do?

-- Q: How many IBM CPU's does it take to execute a job? A: Four; three to hold it down, and one to rip its head off.

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Thu, Nov 25, 1999 at 04:55:59AM -0500, Philip Newton wrote:

OK. I see what you mean now. No, I suppose I don't want that. I suppose what I want was expressed, more or less, by Yitzchak Scott-Thoennes elsewhere in this thread. Something along the lines of having \-conversion (\n \f etc.) done in parallel with \c-conversion, and having \c\ do roughly the same thing as \c if the following character is either a backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

You are missing the *most important* point again: closing " is found first.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Simon Cozens writes:

This now makes sense, and that's roughly how I had thunk it should go. At least this way is relatively easy to implement: special-case \c\[something] then fall back to parsing from \c if there wasn't a following backslash - that way we could get it all in one left-right pass, which seems the most intuitive, even if it isn't.

As Larry explains, it breaks many other expectations one has about quoting. Currently

$a = '\c\\'; /$a/

does what one expects. Your change will break it.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Fri, 26 Nov 1999, Ilya Zakharevich wrote:

On Thu, Nov 25, 1999 at 04:55:59AM -0500, Philip Newton wrote:

Something along the lines of having \-conversion (\n \f etc.) done in parallel with \c-conversion, and having \c\ do roughly the same thing as \c if the following character is either a backslash or the closing delimiter, e.g. "\c\\", "\c\"", qq^\c\^^.

you are missing the *most important* point again: closing " is found first.

Yes, but according to perlop/Gory Details, while searching for the closing " of "" (or closing ^ of qq^^, etc.), the combinations \\ and \" (or \^, etc.) are skipped. Hence, according to my understanding, "\c\\" (after the first "pass", finding the end) turns into >>\c\\ inside ""<<; "\c\"" into >>\c\" inside ""<<; and qq^\c\^^ into >>\c\^ inside qq^^<<.

After this, backslash+delimiter are turned into plain delimiter (while backslash+backslash is kept), and knowledge of the original delimiter is lost, so the three strings become \c\\, \c" and \c^, respectively.
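The step just described can be sketched as follows. This is a simplified model for single-character delimiters, `strip_delim_escapes` is a hypothetical name, and the real implementation may differ in detail:

```perl
use strict;
use warnings;

# Walk the text unit by unit: a backslash plus the next character is one
# unit. Backslash+delimiter becomes the bare delimiter; every other unit,
# including backslash+backslash, is kept as it is.
sub strip_delim_escapes {
    my ($text, $delim) = @_;
    my $out = '';
    while ($text =~ /\G(\\(.)|.)/gs) {
        my ($unit, $escaped) = ($1, $2);
        $out .= (defined $escaped && $escaped eq $delim) ? $delim : $unit;
    }
    return $out;
}

my $BS = chr(92);   # one backslash, built this way to avoid quoting confusion

my $first  = strip_delim_escapes($BS . 'c' . $BS . $BS, '"');   # \c\\ stays \c\\
my $second = strip_delim_escapes($BS . 'c' . $BS . '"', '"');   # \c\" becomes \c"
my $third  = strip_delim_escapes($BS . 'c' . $BS . '^', '^');   # \c\^ becomes \c^

print "$first\n$second\n$third\n";
```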

Hmmm, I think I begin to understand. No change is needed for control-(closing delimiter), since the interpolation step doesn't see backslashes there any more. However, I still believe that handling of \c\\ should be changed to produce ctrl-\ instead of ctrl-\, \.

Cheers, Philip -- Philip Newton <newton@newton.digitalspace.net>

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Fri, Nov 26, 1999 at 07:26:36AM -0500, Philip Newton wrote:

you are missing the *most important* point again: closing " is found first.

Yes, but according to perlop/Gory Details, while searching for the closing " of "" (or closing ^ of qq^^, etc.), the combinations \\ and \" (or \^, etc.) are skipped. Hence, according to my understanding, "\c\\" (after the first "pass", finding the end) turns into >>\c\\ inside ""<<; "\c\"" into >>\c\" inside ""<<; and qq^\c\^^ into >>\c\^ inside qq^^<<.

After this, backslash+delimiter are turned into plain delimiter (while backslash+backslash is kept), and knowledge of the original delimiter is lost, so the three strings become \c\\, \c" and \c^, respectively.

This is how I would expect things to work.

Hmmm, I think I begin to understand. No change is needed for control-(closing delimiter), since the interpolation step doesn't see backslashes there any more. However, I still believe that handling of \c\\ should be changed to produce ctrl-\ instead of ctrl-\, \.

This will break compatibility with

$x = 'whatever'; /$x/;

It is not a win-win situation. That being so, I do not feel any urge to have it changed.

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Sat, 27 Nov 1999, Ilya Zakharevich wrote:

On Fri, Nov 26, 1999 at 07:26:36AM -0500, Philip Newton wrote:

However, I still believe that handling of \c\\ should be changed to produce ctrl-\ instead of ctrl-\, \.

This will break compatibility with

$x = 'whatever'; /$x/;

I do not understand. Please provide a concrete example.

Cheers, Philip -- Philip Newton <newton@newton.digitalspace.net>

p5pRT commented 17 years ago

From guido@imperia.net

Hi!

My perl version is 5.8.8. In the following I use vi notation for control characters in strings, i.e. CTRL-J is ^J.

When I wanted to include CTRL-\ in a double-quoted string with \c escapes, I first tried the solution that seemed obvious to me:

$ print "a\c\\nb"; a^J b

It turns out that this yields ``a^\^Jb''. In other words: \c consumes exactly the next byte/character even if it is a backslash. This leads to the interesting question of how a string ending in CTRL-\ can be written:

print "...\c\";

This will not work, because the backslash escapes the trailing quote.

print "...\c\\";

This compiles but produces the string ``...^\\'' instead of ``...^\'', i.e. there is a gratuitous trailing backslash.

Of course, I can use other means like octal or hexadecimal escapes. But I think that this shows an inconsistency. Take the double-quoted string "...\c\\". The tokenizer takes the last backslash as is, escaped by the one before it. The unescaper takes the last-but-one backslash as is and silently tolerates the lone trailing backslash. The double-quoted string looks different from the inside than from the outside.

IMHO the correct solution would be to represent CTRL-\ as "\c\\", at the end of the double-quoted string and everywhere else. Breaking compatibility here is probably acceptable. The case is really esoteric.

Cheers, Guido -- Imperia AG, Development Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/
