Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

out of memory during regex compilation #4074

Closed p5pRT closed 20 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#7106 (status was 'resolved')

Searchable as RT7106

p5pRT commented 23 years ago

From root@iface.co.uk

The following regex compiled and ran without problems under several previous versions, but now generates an "Out of memory!" error message during compilation, even if it is the only statement in a script:

  $_[0] =~ s/^((?:[^\n]*\n){11})( {8})@@.{39}/$1$2/m;

Perl Info
```
Site configuration information for perl 5.00503:

Configured by drow at Sun Apr 30 12:07:23 EDT 2000.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.2.15pre14, archname=i386-linux
    uname='linux them 2.2.15pre14 #2 smp mon mar 13 14:29:00 est 2000 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2 ', gccversion=2.95.2 20000313 (Debian GNU/Linux)
    cppflags='-Dbool=char -DHAS_BOOL -D_REENTRANT -DDEBIAN -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -D_REENTRANT -DDEBIAN -I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl 5.00503:
    /usr/lib/perl5/5.005/i386-linux
    /usr/lib/perl5/5.005
    /usr/local/lib/site_perl/i386-linux
    /usr/local/lib/site_perl
    /usr/lib/perl5
    .

Environment for perl 5.00503:
    HOME=/u/fish
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin
    PERL_BADLANG (unset)
    SHELL=/bin/sh
```
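To see what the reported substitution actually does, it can be run against a small input. This is a sketch under assumptions, not the reporter's code: the use of `$_[0]` suggests the statement lived inside a sub, so it is wrapped here in a hypothetical sub `strip_marker`, and the input text is made up (eleven short lines followed by an eight-space-indented `@@` marker). On a perl without this bug (5.6.1 or later, per the replies below) it deletes `@@` and the 39 characters after it while keeping the indentation.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the report.
# Inside a sub, $_[0] aliases the caller's argument, so the
# caller's string is modified in place.
sub strip_marker {
    # With /m, ^ may match after any newline; for this input the match
    # starts at the beginning of the string: 11 whole lines, then
    # 8 spaces, then a literal "@@" and any 39 characters.
    # "@@" is not interpolated as an array here.
    $_[0] =~ s/^((?:[^\n]*\n){11})( {8})@@.{39}/$1$2/m;
}

# Made-up input: 11 short lines, then an indented "@@" marker line.
my $text = '';
$text .= "line $_\n" for 1 .. 11;
$text .= (' ' x 8) . '@@' . ('x' x 39) . "kept\n";

strip_marker($text);
print $text;    # the 12th line is now eight spaces followed by "kept"
```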
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

This is fixed in perl5.6.1.

You probably want to replace @@ by \@\@, as (I think) @@ will interpolate the array named @.

Robin

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

No, actually. "Punctuation" arrays aren't interpolated in double-quotish strings, a fact that doesn't seem to be documented.

In fact, how arrays interpolate isn't documented at all (except implicitly in the description of $" in perlvar).

The patch below fixes this and other omissions, and also mends some overlong lines.

Mike Guy

Inline Patch
```diff
--- ./pod/perlop.pod.orig Sun May 6 15:24:51 2001
+++ ./pod/perlop.pod Tue Jun 12 13:10:01 2001
@@ -658,13 +658,15 @@
     Customary  Generic        Meaning        Interpolates
         ''       q{}          Literal             no
         ""      qq{}          Literal             yes
-        ``      qx{}          Command             yes (unless '' is delimiter)
+        ``      qx{}          Command             yes*
                 qw{}         Word list            no
-        //       m{}       Pattern match          yes (unless '' is delimiter)
-                qr{}          Pattern             yes (unless '' is delimiter)
-                 s{}{}      Substitution          yes (unless '' is delimiter)
+        //       m{}       Pattern match          yes*
+                qr{}          Pattern             yes*
+                 s{}{}      Substitution          yes*
                 tr{}{}    Transliteration         no (but see below)
 
+        * unless the delimiter is ''.
+
 Non-bracketing delimiters use the same character fore and aft, but
 the four sorts of brackets (round, angle, square, curly) will all nest,
 which means that
@@ -733,6 +735,15 @@
 and although they often accept just C<"\012">, they seldom tolerate just
 C<"\015">.  If you get in the habit of using C<"\n"> for networking,
 you may be burned some day.
+
+Subscripted variables such as C<$a[3]> or C<$href->{key}[0]> are also
+interpolated, as are array and hash slices. But method calls
+such as C<$obj->meth> are not interpolated.
+
+Interpolating an array or slice interpolates the elements in order,
+separated by the value of C<$">, so is equivalent to interpolating
+C<join $", @array>.  "Punctuation" arrays such C<@+> are not
+interpolated.
 
 You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
 An unescaped C<$> or C<@> interpolates the corresponding variable,
```
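The array-interpolation rule this patch documents can be checked directly. The sketch below uses illustrative variable names (not from the ticket) to show that interpolating an array gives the same result as `join $", @array`, that the separator follows `$"`, and that the `@@` sequence used in the report's regex is left as two literal characters in a double-quoted string on current perls.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);

# Interpolating an array joins its elements with $" (a space by default).
my $spaced = "@words";
my $joined = join $", @words;    # the equivalent explicit form

# Changing $" changes the separator; local restores it afterwards.
my $dashed = do { local $" = '-'; "[@words]" };

# "@@" followed by a non-identifier character is not interpolated,
# just as in the pattern from the report.
my $literal = "x@@.{39}";

print "$spaced\n$joined\n$dashed\n$literal\n";
```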
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Last things first:

Sending as HTML only is perhaps a breach of etiquette: I can't read such messages simply (and tend to regard them as spam).

Your bug message ended up at perl5-porters (see Cc:) and it is usual for discussion of bugs to continue there: I think the bug database keeps track of postings to perl5-porters if they contain the [ID ....] indicator. Best to keep perl5-porters copied in; the individual respondents tend to accumulate on the To:/Cc: lines anyway.

Second things second:

My reply was (perhaps confusingly) making two distinct points.

1) You need perl5.6.1 to avoid this bug; it is present in perl5.005_03 (which you have) and perl5.6.0.

2) I thought that @@ might cause problems, but it doesn't (for very obscure reasons).

To summarise: (1) upgrade to perl5.6.1, (2) don't change the regex.

First things last:

We're not all on linux\, and some of us have only 20th century mail readers.

Robin

p5pRT commented 23 years ago

From @jhi

Thanks\, applied.

p5pRT commented 23 years ago

From @tamias

    --- ./pod/perlop.pod.orig Sun May 6 15:24:51 2001
    +++ ./pod/perlop.pod Tue Jun 12 13:10:01 2001

    @@ -733,6 +735,15 @@
     and although they often accept just C<"\012">, they seldom tolerate just
     C<"\015">.  If you get in the habit of using C<"\n"> for networking,
     you may be burned some day.
    +
    +Subscripted variables such as C<$a[3]> or C<$href->{key}[0]> are also
    +interpolated, as are array and hash slices. But method calls
    +such as C<$obj->meth> are not interpolated.
    +
    +Interpolating an array or slice interpolates the elements in order,
    +separated by the value of C<$">, so is equivalent to interpolating
    +C<join $", @array>.  "Punctuation" arrays such C<@+> are not
    +interpolated.

s/such/such as/;

However, a little earlier in the doc, around line 695, is the sentence:

  For constructs that do interpolate, variables beginning with "C<$>" or "C<@>" are interpolated, as are the following escape sequences.

And there's also this bit, just after your new text:

You cannot include a literal C<$> or C<@> within a C<\Q> sequence. An unescaped C<$> or C<@> interpolates the corresponding variable,

The section on interpolation may be a bit disorganized right now.

Ronald
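The `\Q` paragraph quoted above is about quoting inside interpolating constructs. As a rough illustration (the pattern string is made up, not from the ticket): `\Q...\E` applies `quotemeta` to the interpolated value, so regex metacharacters that come from a variable are matched literally, and the same escape works in an ordinary double-quoted string.

```perl
use strict;
use warnings;

my $pattern = 'a.b';    # illustrative input containing a regex metacharacter

# Without \Q, the '.' in $pattern matches any character.
my $loose = ("axb" =~ /^$pattern\z/) ? 1 : 0;

# With \Q...\E, $pattern is interpolated and then quotemeta'd,
# so '.' only matches a literal dot.
my $exact_x   = ("axb" =~ /^\Q$pattern\E\z/) ? 1 : 0;
my $exact_dot = ("a.b" =~ /^\Q$pattern\E\z/) ? 1 : 0;

# In a double-quoted string, "\Q$pattern\E" equals quotemeta($pattern).
my $quoted = "\Q$pattern\E";

print "$loose $exact_x $exact_dot $quoted\n";
```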

p5pRT commented 23 years ago

From @tux

Which IMHO must be considered a bug to be solved, and not to be documented, or we end up being backward incompatible when it is finally `fixed'.

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Thanks. Corrected in the patch below.

However, a little earlier in the doc, around line 695, is the sentence:

For constructs that do interpolate, variables beginning with "C<$>" or "C<@>" are interpolated, as are the following escape sequences.

I think this sentence is the cause of the disorganisation. See below.

And there's also this bit, just after your new text:

You cannot include a literal C<$> or C<@> within a C<\Q> sequence. An unescaped C<$> or C<@> interpolates the corresponding variable,

I put my new bit where it is precisely so that this paragraph would follow it naturally.

The section on interpolation may be a bit disorganized right now.

I certainly agree. I was trying to make the minimal change to include the new information. Probably a mistake.

The cause of the trouble is the sentence you quoted above. That introduces two different things - interpolation of variables *and* interpolation of escapes. Then there are several paragraphs on escapes. Then several on interpolation of variables (including my new stuff) which are a long way from the intro. So I suggest the fix of splitting that intro sentence.

Patch applies over my previous one.

Note also that I've corrected the "eleven" to (an implicit) "twelve". Am I right that \N{name} *is* interpolated in tr///? I'm not very well up on "use charnames".

Mike Guy

Inline Patch
```diff
--- ./pod/perlop.pod.orig Tue Jun 12 13:10:01 2001
+++ ./pod/perlop.pod Tue Jun 12 18:44:07 2001
@@ -694,9 +694,8 @@
     s {foo}  # Replace foo
       {bar}  # with bar.
 
-For constructs that do interpolate, variables beginning with "C<$>"
-or "C<@>" are interpolated, as are the following escape sequences.  Within
-a transliteration, the first eleven of these sequences may be used.
+The following escape sequences are available in constructs that interpolate
+and in transliterations.
 
     \t          tab             (HT, TAB)
     \n          newline         (NL)
@@ -711,6 +710,9 @@
     \c[         control char    (ESC)
     \N{name}    named char
 
+The following escape sequences are available in constructs that interpolate
+but not in transliterations.
+
     \l          lowercase next char
     \u          uppercase next char
     \L          lowercase till \E
@@ -736,13 +738,14 @@
 C<"\015">.  If you get in the habit of using C<"\n"> for networking,
 you may be burned some day.
 
-Subscripted variables such as C<$a[3]> or C<$href->{key}[0]> are also
-interpolated, as are array and hash slices. But method calls
-such as C<$obj->meth> are not interpolated.
+For constructs that do interpolate, variables beginning with "C<$>"
+or "C<@>" are interpolated.  Subscripted variables such as C<$a[3]> or
+C<$href->{key}[0]> are also interpolated, as are array and hash slices.
+But method calls such as C<$obj->meth> are not.
 
 Interpolating an array or slice interpolates the elements in order,
 separated by the value of C<$">, so is equivalent to interpolating
-C<join $", @array>.  "Punctuation" arrays such C<@+> are not
+C<join $", @array>.  "Punctuation" arrays such as C<@+> are not
 interpolated.
 
 You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
```
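The split this patch makes between the two groups of escapes can be illustrated with made-up strings (the variable names are illustrative only). In the sketch, `\t` from the first group is understood both in a double-quoted string and inside `tr///`, while the case-modification escapes `\u` and `\U`...`\E` from the second group are used only in double-quoted strings, which interpolate.

```perl
use strict;
use warnings;

my $row = "id\tname\tscore";

# \t is recognised inside tr///; here every tab becomes a comma.
(my $csv = $row) =~ tr/\t/,/;

# With an empty replacement list (and no /d), tr/// just counts
# the matching characters and leaves the string unchanged.
my $tabs = ($row =~ tr/\t//);

# \u and \U...\E apply in interpolating constructs such as "".
my $lang  = 'perl';
my $title = "\u$lang";           # first character uppercased
my $shout = "\U$lang\E rules";   # uppercased up to \E

print "$csv\n$tabs\n$title\n$shout\n";
```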
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

I don't necessarily disagree. I was just documenting what happens.

But I now find that my documentation wasn't complete. See the third patch below. And since there is an easy way of interpolating these, I don't really see any need for "fixing".

Patch applies over my previous two.

Mike Guy

Inline Patch
```diff
--- ./pod/perlop.pod.orig Tue Jun 12 18:44:07 2001
+++ ./pod/perlop.pod Tue Jun 12 19:15:45 2001
@@ -745,8 +745,8 @@
 
 Interpolating an array or slice interpolates the elements in order,
 separated by the value of C<$">, so is equivalent to interpolating
-C<join $", @array>.  "Punctuation" arrays such as C<@+> are not
-interpolated.
+C<join $", @array>.  "Punctuation" arrays such as C<@+> are only
+interpolated if the name is enclosed in braces C<@{+}>.
 
 You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
 An unescaped C<$> or C<@> interpolates the corresponding variable,
```
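The braced form documented by this third patch can be tried after a successful match, when the punctuation arrays `@-` and `@+` hold the start and end offsets of the whole match (element 0) and of each capture group. The sketch below uses a made-up string and only the braced spellings `@{-}` and `@{+}`; it is an illustration, not text from the ticket.

```perl
use strict;
use warnings;

my $text = 'hello world';
my ($starts, $ends) = ('', '');

if ($text =~ /(wor)/) {
    # $-[0]/$+[0] are the offsets of the whole match and
    # $-[1]/$+[1] those of capture group 1; the braced names
    # interpolate the whole arrays, joined with $".
    $starts = "@{-}";
    $ends   = "@{+}";
}

print "starts: $starts; ends: $ends\n";
```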
p5pRT commented 23 years ago

From @jhi

All three applied\, thanks.

p5pRT commented 23 years ago

From @abigail

On Tue, Jun 12, 2001 at 01:14:15PM +0100, Mike Guy wrote:

    --- ./pod/perlop.pod.orig Sun May 6 15:24:51 2001
    +++ ./pod/perlop.pod Tue Jun 12 13:10:01 2001
    [...]
    @@ -733,6 +735,15 @@
     and although they often accept just C<"\012">, they seldom tolerate just
     C<"\015">.  If you get in the habit of using C<"\n"> for networking,
     you may be burned some day.
    +
    +Subscripted variables such as C<$a[3]> or C<$href->{key}[0]> are also
    +interpolated, as are array and hash slices. But method calls
    +such as C<$obj->meth> are not interpolated.
    +
    +Interpolating an array or slice interpolates the elements in order,
    +separated by the value of C<$">, so is equivalent to interpolating
    +C<join $", @array>.  "Punctuation" arrays such C<@+> are not
    +interpolated.

Perhaps it should be added that "@_" *does* get interpolated. It's debatable whether @_ is a punctuation array or not, but if it isn't, you should be able to my() it, and you cannot.

Furthermore, that is not the entire story: @? and @{'?'} are two ways of addressing the same array, and "@{'?'}" does get interpolated.

Abigail
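Both of Abigail's observations can be checked with a short script. This is a sketch, not part of the thread: `describe` is a hypothetical sub, and the symbolic `@{'?'}` reference (which, per the message above, names the same array as `@?`) is wrapped in `no strict 'refs'` to be safe under `use strict`.

```perl
use strict;
use warnings;

# Hypothetical sub: @_ is interpolated in a string like a named array.
sub describe {
    return "args: @_";
}
my $described = describe(1, 2, 3);

# @{'?'} is a symbolic reference to the array whose name is "?".
my $question;
{
    no strict 'refs';
    @{'?'} = ('x', 'y');
    $question = "[@{'?'}]";    # the braced block form is interpolated
}

print "$described\n$question\n";
```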