Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Documentation patch: consistently say 'lookahead' not 'look-ahead' #15040

Closed p5pRT closed 8 years ago

p5pRT commented 8 years ago

Migrated from rt.perl.org#126608 (status was 'resolved')

Searchable as RT126608$

p5pRT commented 8 years ago

From @epa

Created by @epa

The pod documentation (particularly perlre) sometimes says 'lookahead' and sometimes 'look-ahead'. This makes searching harder. A quick survey shows that the form without the hyphen is more frequent:

    % grep -ri lookbehind | wc -l
    84
    % grep -ri look-behind | wc -l
    21
    % grep -ri lookahead | wc -l
    183
    % grep -ri look-ahead | wc -l
    22
    % grep -ri lookaround | wc -l
    5
    % grep -ri look-around | wc -l
    4

FWIW, the Camel book also prefers the form without hyphen. This patch makes pod documentation always use 'lookahead', 'lookbehind' and 'lookaround'. It also changes some comments.
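For anyone who wants to repeat the survey on a small scale, here is a rough sketch. The temporary directory and the contents of `sample.pod` are made up for illustration; the numbers in the report came from a real perl5 checkout, not from this sample.

```shell
# Hypothetical sample tree standing in for a perl5 checkout.
tmp=$(mktemp -d)
printf '%s\n' \
    'A zero-width positive look-ahead assertion.' \
    'Use lookahead here.' \
    'Lookahead again.' > "$tmp/sample.pod"

count_lines() {
    # Count lines mentioning the given spelling, case-insensitively,
    # the same way the survey did (grep -ri ... | wc -l).
    grep -ri -- "$1" "$tmp" | wc -l | tr -d ' '
}

hyphenated=$(count_lines 'look-ahead')
plain=$(count_lines 'lookahead')
echo "look-ahead: $hyphenated, lookahead: $plain"

rm -rf "$tmp"
```

Note that this counts matching lines, not individual occurrences, so a line that uses a spelling twice is counted once.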

Inline Patch ```diff diff --git a/cpan/Encode/encengine.c b/cpan/Encode/encengine.c index bddf556..11b6298 100644 --- a/cpan/Encode/encengine.c +++ b/cpan/Encode/encengine.c @@ -79,7 +79,7 @@ will provide the actual output and set tables back to original base page. This scheme can also handle shift encodings. -A slight enhancement to the scheme also allows for look-ahead - if +A slight enhancement to the scheme also allows for lookahead - if we add a flag to re-add the removed byte to the source we could handle a" -> U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS) ab -> a (and take b back please) diff --git a/cpan/Pod-Simple/lib/Pod/Simple/BlackBox.pm b/cpan/Pod-Simple/lib/Pod/Simple/BlackBox.pm index 7021e6c..0c3667c 100644 --- a/cpan/Pod-Simple/lib/Pod/Simple/BlackBox.pm +++ b/cpan/Pod-Simple/lib/Pod/Simple/BlackBox.pm @@ -1822,7 +1822,7 @@ sub _treelet_from_formatting_codes { # * Closing brackets. Match some amount of whitespace followed by # multiple close brackets. The logic to see if this closes anything # is down below. Note that in order to parse C<< >> correctly, we - # have to use look-behind (?<=\s\s), since the match of the starting + # have to use lookbehind (?<=\s\s), since the match of the starting # code will have consumed the whitespace. # # * A single closing bracket, to close a simple code like C<>. diff --git a/cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm b/cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm index a79f728..80b4905 100644 --- a/cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm +++ b/cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm @@ -43,7 +43,7 @@ sub read { # The terminator is mandatory otherwise we'd consume a line from the # iterator that doesn't belong to us. If we want to remove this - # restriction we'll have to implement look-ahead in the iterators. + # restriction we'll have to implement lookahead in the iterators. # Which might not be a bad idea. my $dots = $self->_peek; die "Missing '...' 
at end of YAMLish" diff --git a/cpan/Text-Tabs/lib/Text/Wrap.pm b/cpan/Text-Tabs/lib/Text/Wrap.pm index db0d15f..dfbbac1 100644 --- a/cpan/Text-Tabs/lib/Text/Wrap.pm +++ b/cpan/Text-Tabs/lib/Text/Wrap.pm @@ -214,7 +214,7 @@ default is simply C<'\s'>; that is, words are terminated by spaces. (This means, among other things, that trailing punctuation such as full stops or commas stay with the word they are "attached" to.) Setting C<$Text::Wrap::break> to a regular expression that doesn't -eat any characters (perhaps just a forward look-ahead assertion) will +eat any characters (perhaps just a forward lookahead assertion) will cause warnings. Beginner note: In example 2, above C<$columns> is imported into diff --git a/cpan/perlfaq/lib/perlfaq6.pod b/cpan/perlfaq/lib/perlfaq6.pod index c889ca4..ff3f610 100644 --- a/cpan/perlfaq/lib/perlfaq6.pod +++ b/cpan/perlfaq/lib/perlfaq6.pod @@ -1016,7 +1016,7 @@ Or like this: } Here's another, slightly less painful, way to do it from Benjamin -Goldberg, who uses a zero-width negative look-behind assertion. +Goldberg, who uses a zero-width negative lookbehind assertion. print "found GX!\n" if $martian =~ m/ (?> and the group named C<< b >> are aliases for the group belonging to C<< $1 >>. -=item Look-Around Assertions -X X X X +=item "Look" . lc Around Assertions +X X X X -Look-around assertions are zero-width patterns which match a specific +Lookaround assertions are zero-width patterns which match a specific pattern without including it in C<$&>. Positive assertions match when their subpattern matches, negative assertions match when their subpattern -fails. Look-behind matches text up to the current match position, -look-ahead matches text following the current match position. +fails. Lookbehind matches text up to the current match position, +lookahead matches text following the current match position. =over 4 =item C<(?=pattern)> -X<(?=)> X X +X<(?=)> X X -A zero-width positive look-ahead assertion. 
For example, C +A zero-width positive lookahead assertion. For example, C matches a word followed by a tab, without including the tab in C<$&>. =item C<(?!pattern)> -X<(?!)> X X +X<(?!)> X X -A zero-width negative look-ahead assertion. For example C +A zero-width negative lookahead assertion. For example C matches any occurrence of "foo" that isn't followed by "bar". Note -however that look-ahead and look-behind are NOT the same thing. You cannot -use this for look-behind. +however that lookahead and lookbehind are NOT the same thing. You cannot +use this for lookbehind. If you are looking for a "bar" that isn't preceded by a "foo", C will not do what you want. That's because the C<(?!foo)> is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will -match. Use look-behind instead (see below). +match. Use lookbehind instead (see below). =item C<(?<=pattern)> C<\K> -X<(?<=)> X X X<\K> +X<(?<=)> X X X<\K> -A zero-width positive look-behind assertion. For example, C +A zero-width positive lookbehind assertion. For example, C matches a word that follows a tab, without including the tab in C<$&>. -Works only for fixed-width look-behind. +Works only for fixed-width lookbehind. There is a special form of this construct, called C<\K> (available since Perl 5.10.0), which causes the regex engine to "keep" everything it had matched prior to the C<\K> and not include it in C<$&>. This effectively provides variable-length -look-behind. The use of C<\K> inside of another look-around assertion +lookbehind. The use of C<\K> inside of another lookaround assertion is allowed, but the behaviour is currently not well defined. For various reasons C<\K> may be significantly more efficient than the @@ -1298,11 +1298,11 @@ can be rewritten as the much more efficient s/foo\Kbar//g; =item C<(? -X<(? X X +X<(? X X -A zero-width negative look-behind assertion. For example C +A zero-width negative lookbehind assertion. 
For example C matches any occurrence of "foo" that does not follow "bar". Works -only for fixed-width look-behind. +only for fixed-width lookbehind. =back @@ -1653,7 +1653,7 @@ C<(condition)> should be one of: (which is valid if the corresponding pair of parentheses matched); -=item a look-ahead/look-behind/evaluate zero-width assertion; +=item a lookahead/lookbehind/evaluate zero-width assertion; =item a name in angle brackets or single quotes @@ -1839,7 +1839,7 @@ the C pragma or B<-w> switch saying it C<"matches null string many times in regex">. On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable -effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>. +effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>. This was only 4 times slower on a string with 1000000 Cs. The "grab all you can, and do not give anything back" semantic is desirable @@ -2242,7 +2242,7 @@ definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve. -When using look-ahead assertions and negations, this can all get even +When using lookahead assertions and negations, this can all get even trickier. Imagine you'd like to find a sequence of non-digits not followed by "123". You might try to write that as @@ -2292,7 +2292,7 @@ time. Now there's indeed something following "AB" that is not We can deal with this by using both an assertion and a negation. We'll say that the first part in C<$1> must be followed both by a digit -and by something that's not "123". Remember that the look-aheads +and by something that's not "123". Remember that the lookaheads are zero-width expressions--they only look, but don't consume any of the string in their match. So rewriting this way produces what you'd expect; that is, case 5 will fail, but case 6 succeeds: @@ -2329,10 +2329,10 @@ match takes a long time to finish. 
A powerful tool for optimizing such beasts is what is known as an "independent group", which does not backtrack (see Lpattern) >>>). Note also that -zero-length look-ahead/look-behind assertions will not backtrack to make +zero-length lookahead/lookbehind assertions will not backtrack to make the tail match, since they are in "logical" context: only whether they match is considered relevant. For an example -where side-effects of look-ahead I have influenced the +where side-effects of lookahead I have influenced the following match, see Lpattern) >>>. =head2 Version 8 Regular Expressions diff --git a/pod/perlreref.pod b/pod/perlreref.pod index e9b784e..db7c173 100644 --- a/pod/perlreref.pod +++ b/pod/perlreref.pod @@ -252,10 +252,10 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string. (?P>name) Recurse into a named subpattern (python syntax) (?(cond)yes|no) (?(cond)yes) Conditional expression, where "cond" can be: - (?=pat) look-ahead - (?!pat) negative look-ahead - (?<=pat) look-behind - (?) named subpattern has matched something ('name') named subpattern has matched something diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod index a407faf..2929163 100644 --- a/pod/perlunicode.pod +++ b/pod/perlunicode.pod @@ -1110,7 +1110,7 @@ feature, you can use one of the following: =item * -Regular expression look-ahead +Regular expression lookahead You can mimic class subtraction using lookahead. 
For example, what UTS#18 might write as @@ -1223,7 +1223,7 @@ Level 3 - Tailored Support [17] see UAX#10 "Unicode Collation Algorithms" [18] have Unicode::Collate but not integrated to regexes - [19] have (?<=x) and (?=x), but look-aheads or look-behinds + [19] have (?<=x) and (?=x), but lookaheads or lookbehinds should see outside of the target substring [20] need insensitive matching for linguistic features other than case; for example, hiragana to katakana, wide and diff --git a/regcomp.c b/regcomp.c index df60d1b..6554a1c 100644 --- a/regcomp.c +++ b/regcomp.c @@ -9978,7 +9978,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) RExC_parse++; paren = *RExC_parse++; - ret = NULL; /* For look-ahead/behind. */ + ret = NULL; /* For lookahead/behind. */ switch (paren) { case 'P': /* (?P...) variants for those used to PCRE/Python */ diff --git a/regexec.c b/regexec.c index 85c31a6..a49ce7b 100644 --- a/regexec.c +++ b/regexec.c @@ -654,7 +654,7 @@ Perl_re_intuit_start(pTHX_ "Intuit: trying to determine minimum start position...\n")); /* for now, assume that all substr offsets are positive. If at some point - * in the future someone wants to do clever things with look-behind and + * in the future someone wants to do clever things with lookbehind and * -ve offsets, they'll need to fix up any code in this function * which uses these offsets. See the thread beginning * <20140113145929.GF27210@iabyn.com> @@ -2683,7 +2683,7 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx, U32 n = 0; max = -1; /* calculate the right-most part of the string covered - * by a capture. Due to look-ahead, this may be to + * by a capture. Due to lookahead, this may be to * the right of $&, so we have to scan all captures */ while (n <= prog->lastparen) { if (prog->offs[n].end > max) @@ -2704,7 +2704,7 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx, U32 n = 0; min = max; /* calculate the left-most part of the string covered - * by a capture. 
Due to look-behind, this may be to + * by a capture. Due to lookbehind, this may be to * the left of $&, so we have to scan all captures */ while (min && n <= prog->lastparen) { if ( prog->offs[n].start != -1 ```
Perl Info ``` Flags: category=docs severity=low Site configuration information for perl 5.20.3: Configured by Red Hat, Inc. at Thu Sep 24 08:45:26 UTC 2015. Summary of my perl5 (revision 5 version 20 subversion 3) configuration: Platform: osname=linux, osvers=4.1.6-100.fc21.x86_64, archname=x86_64-linux-thread-multi uname='linux buildvm-04.phx2.fedoraproject.org 4.1.6-100.fc21.x86_64 #1 smp mon aug 17 22:20:37 utc 2015 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Doptimize=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.20.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. 
-Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic', cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='5.1.1 20150618 (Red Hat 5.1.1-4)', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='gcc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib libs=-lpthread -lresolv -lnsl 
-lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.21.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='2.21' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro -L/usr/local/lib' Locally applied patches: Fedora Patch1: Removes date check, Fedora/RHEL specific Fedora Patch3: support for libdir64 Fedora Patch4: use libresolv instead of libbind Fedora Patch5: USE_MM_LD_RUN_PATH Fedora Patch6: Skip hostname tests, due to builders not being network capable Fedora Patch7: Dont run one io test due to random builder failures Fedora Patch15: Define SONAME for libperl.so Fedora Patch16: Install libperl.so to -Dshrpdir value Fedora Patch22: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015) Fedora Patch25: Use stronger algorithm needed for FIPS in t/op/crypt.t (RT#121591) Fedora Patch26: Make *DBM_File desctructors thread-safe (RT#61912) Fedora Patch27: Report inaccesible file on failed require (RT#123270) Fedora Patch28: Use stronger algorithm needed for FIPS in t/op/taint.t (RT#123338) Fedora Patch29: Fix debugger y command scope level Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux @INC for perl 5.20.3: /home/eda/share/perl5 /home/eda/lib/perl5/ /home/eda/lib64/perl5/ /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . 
Environment for perl 5.20.3: HOME=/home/eda LANG=en_GB.UTF-8 LANGUAGE (unset) LC_COLLATE=C LC_CTYPE=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_NUMERIC=en_GB.UTF-8 LC_TIME=en_GB.UTF-8 LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/eda/bin:/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin:/sbin:/usr/sbin PERL5LIB=/home/eda/share/perl5:/home/eda/lib/perl5/:/home/eda/lib64/perl5/ PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 8 years ago

From @epa

I see that the patch changes some X<> anchors, which is not intentional (both spellings can appear in X<>). Let me make a new patch.

p5pRT commented 8 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 8 years ago

From @epa

New patch fixing a couple of errors in the original.

p5pRT commented 8 years ago

From @epa

lookahead_no_hyphen.diff

p5pRT commented 8 years ago

From @tonycoz

On Tue Nov 10 02:48:54 2015, eda@waniasset.com wrote:

> The pod documentation (particularly perlre) sometimes says 'lookahead' and sometimes 'look-ahead'. This makes searching harder. A quick survey shows that the form without the hyphen is more frequent:
>
>     % grep -ri lookbehind | wc -l
>     84
>     % grep -ri look-behind | wc -l
>     21
>     % grep -ri lookahead | wc -l
>     183
>     % grep -ri look-ahead | wc -l
>     22
>     % grep -ri lookaround | wc -l
>     5
>     % grep -ri look-around | wc -l
>     4
>
> FWIW, the Camel book also prefers the form without hyphen. This patch makes pod documentation always use 'lookahead', 'lookbehind' and 'lookaround'. It also changes some comments.

pod/perlre.pod is pretty consistently using the hyphen versions, which I prefer.

> diff --git a/cpan/Encode/encengine.c b/cpan/Encode/encengine.c
> ...
> diff --git a/cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm b/cpan/Test-Harness/lib/TAP/Parser/YAMLish/Reader.pm
> ...
> diff --git a/cpan/Text-Tabs/lib/Text/Wrap.pm b/cpan/Text-Tabs/lib/Text/Wrap.pm
> ...
> diff --git a/cpan/perlfaq/lib/perlfaq6.pod b/cpan/perlfaq/lib/perlfaq6.pod
> ...

These would need to go through their CPAN upstreams.

Tony

p5pRT commented 8 years ago

From @epa

Here are two patches. One changes consistently to 'lookahead', and the other consistently to 'look-ahead'. The p5-porters can apply whichever of the two they prefer.

FTR, these were generated with, respectively:

    perl -i -pE 's/(?<!X<)(look)-(behind|ahead|around)/$1 . lc $2/gei' `ack -li look- | grep -v cpan/`

    perl -i -pE 'next if ($ARGV =~ /[.]c\z/ and not /\AThe/ and not /\bFAIL2?\b/) and /[;{),]\s*\z/; s/(?<!X<)(?<!\$)(?<![a-z])([Ll]ook)((?:[Bb]ehind|[Aa]head|[Aa]round)s?)\b/$1 . "-" . (($1 eq ucfirst $1 and m{ A-Z}g > 1) ? ucfirst $2 : $2)/ge' `ack -li look | grep -v cpan/`

They pass 'make test'.

p5pRT commented 8 years ago

From @epa

look-ahead.patch ```diff diff --git a/parser.h b/parser.h index 96ab4f5..cca601c 100644 --- a/parser.h +++ b/parser.h @@ -36,8 +36,8 @@ typedef struct yy_parser { /* parser state */ struct yy_parser *old_parser; /* previous value of PL_parser */ - YYSTYPE yylval; /* value of lookahead symbol, set by yylex() */ - int yychar; /* The lookahead symbol. */ + YYSTYPE yylval; /* value of look-ahead symbol, set by yylex() */ + int yychar; /* The look-ahead symbol. */ /* Number of tokens to shift before error messages enabled. */ int yyerrstatus; diff --git a/perly.c b/perly.c index 91b4c79..58e2519 100644 --- a/perly.c +++ b/perly.c @@ -240,7 +240,7 @@ Perl_yyparse (pTHX_ int gramtype) int yyn; int yyresult; - /* Lookahead token as an internal (translated) token number. */ + /* Look-ahead token as an internal (translated) token number. */ int yytoken = 0; yy_parser *parser; /* the parser object */ @@ -306,17 +306,17 @@ Perl_yyparse (pTHX_ int gramtype) } /* Do appropriate processing given the current state. */ -/* Read a lookahead token if we need one and don't already have one. */ +/* Read a look-ahead token if we need one and don't already have one. */ - /* First try to decide what to do without reference to lookahead token. */ + /* First try to decide what to do without reference to look-ahead token. */ yyn = yypact[yystate]; if (yyn == YYPACT_NINF) goto yydefault; - /* Not known => get a lookahead token if don't already have one. */ + /* Not known => get a look-ahead token if don't already have one. */ - /* YYCHAR is either YYEMPTY or YYEOF or a valid lookahead symbol. */ + /* YYCHAR is either YYEMPTY or YYEOF or a valid look-ahead symbol. */ if (parser->yychar == YYEMPTY) { YYDPRINTF ((Perl_debug_log, "Reading a token:\n")); parser->yychar = yylex(); @@ -356,7 +356,7 @@ Perl_yyparse (pTHX_ int gramtype) if (yyn == YYFINAL) YYACCEPT; - /* Shift the lookahead token. */ + /* Shift the look-ahead token. 
*/ YYDPRINTF ((Perl_debug_log, "Shifting token %s, ", yytname[yytoken])); /* Discard the token being shifted unless it is eof. */ @@ -460,7 +460,7 @@ Perl_yyparse (pTHX_ int gramtype) if (parser->yyerrstatus == 3) { - /* If just tried and failed to reuse lookahead token after an + /* If just tried and failed to reuse look-ahead token after an error, discard it. */ /* Return failure if at end of input. */ @@ -493,7 +493,7 @@ Perl_yyparse (pTHX_ int gramtype) } - /* Else will try to reuse lookahead token after shifting the error + /* Else will try to reuse look-ahead token after shifting the error token. */ goto yyerrlab1; diff --git a/pod/perl5100delta.pod b/pod/perl5100delta.pod index 10d71d6..c322ca5 100644 --- a/pod/perl5100delta.pod +++ b/pod/perl5100delta.pod @@ -180,7 +180,7 @@ that contain backreferences. See L. (Yves Orton) The functionality of Jeff Pinyan's module Regexp::Keep has been added to the core. In regular expressions you can now use the special escape C<\K> -as a way to do something like floating length positive lookbehind. It is +as a way to do something like floating length positive look-behind. It is also useful in substitutions like: s/(foo)bar/$1/g diff --git a/pod/perl5140delta.pod b/pod/perl5140delta.pod index 26df41c..52d1af1 100644 --- a/pod/perl5140delta.pod +++ b/pod/perl5140delta.pod @@ -3500,7 +3500,7 @@ The trie optimisation was not taking empty groups into account, preventing =item * -A pattern containing a C<+> inside a lookahead would sometimes cause an +A pattern containing a C<+> inside a look-ahead would sometimes cause an incorrect match failure in a global match (for example, C) [perl #68564]. diff --git a/pod/perl5200delta.pod b/pod/perl5200delta.pod index 874d8d1..4f5b4c3 100644 --- a/pod/perl5200delta.pod +++ b/pod/perl5200delta.pod @@ -2736,9 +2736,9 @@ don't depend on the locale. 
[perl #120675] =item * -Under certain conditions, Perl would throw an error if in an lookbehind +Under certain conditions, Perl would throw an error if in an look-behind assertion in a regexp, the assertion referred to a named subpattern, -complaining the lookbehind was variable when it wasn't. This has been +complaining the look-behind was variable when it wasn't. This has been fixed. [perl #120600], [perl #120618]. The current fix may be improved on in the future. diff --git a/pod/perl5201delta.pod b/pod/perl5201delta.pod index 9352801..3b617f3 100644 --- a/pod/perl5201delta.pod +++ b/pod/perl5201delta.pod @@ -242,7 +242,7 @@ diagnostic messages, see L. =item * -L%sE|perldiag/"Variable length lookbehind not implemented in regex m/%s/"> +L%sE|perldiag/"Variable length look-behind not implemented in regex m/%s/"> Information about Unicode behaviour has been added. diff --git a/pod/perl5220delta.pod b/pod/perl5220delta.pod index 52df04b..d893f7c 100644 --- a/pod/perl5220delta.pod +++ b/pod/perl5220delta.pod @@ -2268,7 +2268,7 @@ when it is actually a lexical sub that will not stay shared. =item * -L%sE|perldiag/"Variable length lookbehind not implemented in regex m/%s/"> +L%sE|perldiag/"Variable length look-behind not implemented in regex m/%s/"> The L entry for this warning has had information about Unicode behavior added. diff --git a/pod/perl58delta.pod b/pod/perl58delta.pod index 8b81d4c..7345233 100644 --- a/pod/perl58delta.pod +++ b/pod/perl58delta.pod @@ -2960,7 +2960,7 @@ otherwise. =item * -Variable length lookbehind has not yet been implemented, trying to +Variable length look-behind has not yet been implemented, trying to use it will tell that. =item * diff --git a/pod/perldiag.pod b/pod/perldiag.pod index 5111410..8f0f92f 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -3132,9 +3132,9 @@ not-a-number value). than it can reliably handle and C probably returned the wrong date. 
-=item Lookbehind longer than %d not implemented in regex m/%s/ +=item Look-behind longer than %d not implemented in regex m/%s/ -(F) There is currently a limit on the length of string which lookbehind can +(F) There is currently a limit on the length of string which look-behind can handle. This restriction may be eased in a future release. =item Lost precision when %s %f by 1 @@ -6923,10 +6923,10 @@ something else of the same name (usually a subroutine) is exported by that module. It usually means you put the wrong funny character on the front of your variable. -=item Variable length lookbehind not implemented in regex m/%s/ +=item Variable length look-behind not implemented in regex m/%s/ -(F) Lookbehind is allowed only for subexpressions whose length is fixed and -known at compile time. For positive lookbehind, you can use the C<\K> +(F) Look-behind is allowed only for subexpressions whose length is fixed and +known at compile time. For positive look-behind, you can use the C<\K> regex construct as a way to get the equivalent functionality. See L. diff --git a/pod/perlintro.pod b/pod/perlintro.pod index 9559cb1..a821f85 100644 --- a/pod/perlintro.pod +++ b/pod/perlintro.pod @@ -617,7 +617,7 @@ The results end up in C<$1>, C<$2> and so on. =item Other regexp features -Perl regexps also support backreferences, lookaheads, and all kinds of +Perl regexps also support backreferences, look-aheads, and all kinds of other complex details. Read all about them in L, L, and L. diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod index c11ff9e..5702f26 100644 --- a/pod/perlreapi.pod +++ b/pod/perlreapi.pod @@ -228,7 +228,7 @@ faster than C. Added in perl 5.18.0, this flag indicates that a regular expression might perform an operation that would interfere with inplace substitution. 
For -instance it might contain lookbehind, or assign to non-magical variables +instance it might contain look-behind, or assign to non-magical variables (such as $REGMARK and $REGERROR) during matching. C will skip certain optimisations when this is set. diff --git a/pod/perlreref.pod b/pod/perlreref.pod index e9b784e..2e7cbcf 100644 --- a/pod/perlreref.pod +++ b/pod/perlreref.pod @@ -233,10 +233,10 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string. (?#text) A comment (?:...) Groups subexpressions without capturing (cluster) (?pimsx-imsx:...) Enable/disable option (as per m// modifiers) - (?=...) Zero-width positive lookahead assertion - (?!...) Zero-width negative lookahead assertion - (?<=...) Zero-width positive lookbehind assertion - (?...) Grab what we can, prohibit backtracking (?|...) Branch reset (?...) Named capture diff --git a/pod/perlretut.pod b/pod/perlretut.pod index 9a3c696..9c8be29 100644 --- a/pod/perlretut.pod +++ b/pod/perlretut.pod @@ -2236,7 +2236,7 @@ case insensitively and turns off multi-line mode. =head2 Looking ahead and looking behind -This section concerns the lookahead and lookbehind assertions. First, +This section concerns the look-ahead and look-behind assertions. First, a little background. In Perl regular expressions, most regexp elements 'eat up' a certain @@ -2264,10 +2264,10 @@ characters before. C<$> looks ahead, to see that there are no characters after. C<\b> looks both ahead and behind, to see if the characters on either side differ in their "word-ness". -The lookahead and lookbehind assertions are generalizations of the -anchor concept. Lookahead and lookbehind are zero-width assertions +The look-ahead and look-behind assertions are generalizations of the +anchor concept. Look-ahead and look-behind are zero-width assertions that let us specify which characters we want to test for. 
The -lookahead assertion is denoted by C<(?=regexp)> and the lookbehind +look-ahead assertion is denoted by C<(?=regexp)> and the look-behind assertion is denoted by C<< (?<=fixed-regexp) >>. Some examples are $x = "I catch the housecat 'Tom-cat' with catnip"; @@ -2282,11 +2282,11 @@ assertion is denoted by C<< (?<=fixed-regexp) >>. Some examples are Note that the parentheses in C<(?=regexp)> and C<< (?<=regexp) >> are non-capturing, since these are zero-width assertions. Thus in the second regexp, the substrings captured are those of the whole regexp -itself. Lookahead C<(?=regexp)> can match arbitrary regexps, but -lookbehind C<< (?<=fixed-regexp) >> only works for regexps of fixed +itself. Look-ahead C<(?=regexp)> can match arbitrary regexps, but +look-behind C<< (?<=fixed-regexp) >> only works for regexps of fixed width, i.e., a fixed number of characters long. Thus C<< (?<=(ab|bc)) >> is fine, but C<< (?<=(ab)*) >> is not. The -negated versions of the lookahead and lookbehind assertions are +negated versions of the look-ahead and look-behind assertions are denoted by C<(?!regexp)> and C<< (?> respectively. They evaluate true if the regexps do I match: @@ -2394,7 +2394,7 @@ integer in parentheses C<(integer)>. It is true if the corresponding backreference C<\integer> matched earlier in the regexp. The same thing can be done with a name associated with a capture group, written as C<< () >> or C<< ('name') >>. The second form is a bare -zero-width assertion C<(?...)>, either a lookahead, a lookbehind, or a +zero-width assertion C<(?...)>, either a look-ahead, a look-behind, or a code assertion (discussed in the next section). The third set of forms provides tests that return true if the expression is executed within a recursion (C<(R)>) or is being called from some capturing group, @@ -2415,7 +2415,7 @@ regexp. 
This searches for words of the form C<"$x$x"> or C<"$x$y$y$x">: toto tutu -The lookbehind C allows, along with backreferences, +The look-behind C allows, along with backreferences, an earlier part of the match to influence a later part of the match. For instance, @@ -2424,7 +2424,7 @@ match. For instance, matches a DNA sequence such that it either ends in C, or some other base pair combination and C. Note that the form is C<< (?(?<=AA)G|C) >> and not C<< (?((?<=AA))G|C) >>; for the -lookahead, lookbehind or code assertions, the parentheses around the +look-ahead, look-behind or code assertions, the parentheses around the conditional are not needed. diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod index a407faf..96c817e 100644 --- a/pod/perlunicode.pod +++ b/pod/perlunicode.pod @@ -1112,7 +1112,7 @@ feature, you can use one of the following: Regular expression look-ahead -You can mimic class subtraction using lookahead. +You can mimic class subtraction using look-ahead. For example, what UTS#18 might write as [{Block=Greek}-[{UNASSIGNED}]] diff --git a/regcomp.c b/regcomp.c index a37dc82..59becbc 100644 --- a/regcomp.c +++ b/regcomp.c @@ -407,13 +407,13 @@ struct RExC_state_t { - minlenp A pointer to the minimum number of characters of the pattern that the string was found inside. This is important as in the case of positive - lookahead or positive lookbehind we can have multiple patterns + look-ahead or positive look-behind we can have multiple patterns involved. Consider /(?=FOO).*F/ The minimum length of the pattern overall is 3, the minimum length - of the lookahead part is 3, but the minimum length of the part that + of the look-ahead part is 3, but the minimum length of the part that will actually match is 1. So 'FOO's minimum length is 3, but the minimum length for the F is 1. 
This is important as the minimum length is used to determine offsets in front of and behind the string being @@ -423,9 +423,9 @@ struct RExC_state_t { are not known until the full pattern has been compiled, thus the pointer to the value. - - lookbehind + - look-behind - In the case of lookbehind the string being searched for can be + In the case of look-behind the string being searched for can be offset past the start point of the final matching string. If this value was just blithely removed from the min_offset it would invalidate some of the calculations for how many chars must match @@ -433,7 +433,7 @@ struct RExC_state_t { the length of the string being searched for). When the final pattern is compiled and the data is moved from the scan_data_t structure into the regexp structure the information - about lookbehind is factored in, with the information that would + about look-behind is factored in, with the information that would have been lost precalculated in the end_shift field for the associated string. @@ -5266,14 +5266,14 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n", } else if ( PL_regkind[OP(scan)] == BRANCHJ - /* Lookbehind, or need to calculate parens/evals/stclass: */ + /* Look-behind, or need to calculate parens/evals/stclass: */ && (scan->flags || data || (flags & SCF_DO_STCLASS)) && (OP(scan) == IFMATCH || OP(scan) == UNLESSM)) { if ( !PERL_ENABLE_POSITIVE_ASSERTION_STUDY || OP(scan) == UNLESSM ) { - /* Negative Lookahead/lookbehind + /* Negative Look-ahead/look-behind In this case we can't do fixed string optimisation. 
*/ @@ -5291,7 +5291,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n", data_fake.last_closep = &fake; data_fake.pos_delta = delta; if ( flags & SCF_DO_STCLASS && !scan->flags - && OP(scan) == IFMATCH ) { /* Lookahead */ + && OP(scan) == IFMATCH ) { /* Look-ahead */ ssc_init(pRExC_state, &intrnl); data_fake.start_class = &intrnl; f |= SCF_DO_STCLASS_AND; @@ -5305,10 +5305,10 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n", recursed_depth, NULL, f, depth+1); if (scan->flags) { if (deltanext) { - FAIL("Variable length lookbehind not implemented"); + FAIL("Variable length look-behind not implemented"); } else if (minnext > (I32)U8_MAX) { - FAIL2("Lookbehind longer than %"UVuf" not implemented", + FAIL2("Look-behind longer than %"UVuf" not implemented", (UV)U8_MAX); } scan->flags = (U8)minnext; @@ -5341,10 +5341,10 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n", } #if PERL_ENABLE_POSITIVE_ASSERTION_STUDY else { - /* Positive Lookahead/lookbehind + /* Positive Look-ahead/look-behind In this case we can do fixed string optimisation, but we must be careful about it. Note in the case of - lookbehind the positions will be offset by the minimum + look-behind the positions will be offset by the minimum length of the pattern, something we won't know about until after the recurse. 
*/ @@ -5378,7 +5378,7 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n", if (is_inf) data_fake.flags |= SF_IS_INF; if ( flags & SCF_DO_STCLASS && !scan->flags - && OP(scan) == IFMATCH ) { /* Lookahead */ + && OP(scan) == IFMATCH ) { /* Look-ahead */ ssc_init(pRExC_state, &intrnl); data_fake.start_class = &intrnl; f |= SCF_DO_STCLASS_AND; @@ -5394,10 +5394,10 @@ PerlIO_printf(Perl_debug_log, "LHS=%"UVuf" RHS=%"UVuf"\n", f,depth+1); if (scan->flags) { if (deltanext) { - FAIL("Variable length lookbehind not implemented"); + FAIL("Variable length look-behind not implemented"); } else if (*minnextp > (I32)U8_MAX) { - FAIL2("Lookbehind longer than %"UVuf" not implemented", + FAIL2("Look-behind longer than %"UVuf" not implemented", (UV)U8_MAX); } scan->flags = (U8)*minnextp; @@ -6380,7 +6380,7 @@ S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, } /* end_shift is how many chars that must be matched that follow this item. We calculate it ahead of time as once the - lookbehind offset is added in we lose the ability to correctly + look-behind offset is added in we lose the ability to correctly calculate it.*/ ml = minlen ? *(minlen) : (SSize_t)longest_length; *rx_end_shift = ml - offset @@ -7023,14 +7023,14 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, * NOTE that EXACT is NOT covered here, as it is normally * picked up by the optimiser separately. * - * This is unfortunate as the optimiser isnt handling lookahead + * This is unfortunate as the optimiser isnt handling look-ahead * properly currently. * */ while ((OP(first) == OPEN && (sawopen = 1)) || /* An OR of *one* alternative - should not happen now. 
*/ (OP(first) == BRANCH && OP(first_next) != BRANCH) || - /* for now we can't handle lookbehind IFMATCH*/ + /* for now we can't handle look-behind IFMATCH*/ (OP(first) == IFMATCH && !first->flags && (sawlookahead = 1)) || (OP(first) == PLUS) || (OP(first) == MINMOD) || @@ -7358,7 +7358,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, r->intflags |= PREGf_GPOS_SEEN; if (RExC_seen & REG_LOOKBEHIND_SEEN) r->extflags |= RXf_NO_INPLACE_SUBST; /* inplace might break the - lookbehind */ + look-behind */ if (pRExC_state->num_code_blocks) r->extflags |= RXf_EVAL_SEEN; if (RExC_seen & REG_VERBARG_SEEN) @@ -10335,7 +10335,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) RExC_parse[1] == '!' || RExC_parse[1] == '<' || RExC_parse[1] == '{' - ) { /* Lookahead or eval. */ + ) { /* Look-ahead or eval. */ I32 flag; regnode *tail; diff --git a/regexec.c b/regexec.c index 85c31a6..9b905ab 100644 --- a/regexec.c +++ b/regexec.c @@ -240,8 +240,8 @@ static const char* const non_utf8_target_but_utf8_required #endif /* - Search for mandatory following text node; for lookahead, the text must - follow but for lookbehind (rn->flags != 0) we skip to the next step. + Search for mandatory following text node; for look-ahead, the text must + follow but for look-behind (rn->flags != 0) we skip to the next step. */ #define FIND_NEXT_IMPT(rn) STMT_START { \ while (JUMPABLE(rn)) { \ @@ -669,7 +669,7 @@ Perl_re_intuit_start(pTHX_ /* for now, assume that if both present, that the floating substring * doesn't start before the anchored substring. * If you break this assumption (e.g. 
doing better optimisations - * with lookahead/behind), then you'll need to audit the code in this + * with look-ahead/behind), then you'll need to audit the code in this * function carefully first */ assert( @@ -1263,7 +1263,7 @@ Perl_re_intuit_start(pTHX_ * Since minlen is already taken into account, rx_origin+1 is * before strend; accidentally, minlen >= 1 guaranties no false * positives at rx_origin + 1 even for \b or \B. But (minlen? 1 : - * 0) below assumes that regstclass does not come from lookahead... + * 0) below assumes that regstclass does not come from look-ahead... * If regstclass takes bytelength more than 1: If charlength==1, OK. * This leaves EXACTF-ish only, which are dealt with in * find_byclass(). @@ -3602,7 +3602,7 @@ states to pop, we return failure. Sometimes we also need to backtrack on success; for example /A+/, where after successfully matching one A, we need to go back and try to -match another one; similarly for lookahead assertions: if the assertion +match another one; similarly for look-ahead assertions: if the assertion completes successfully, we backtrack to the state just before the assertion and then carry on. In these cases, the pushed state is marked as 'backtrack on success too'. This marking is in fact done by a chain of @@ -3627,7 +3627,7 @@ rest of the pattern. Variable and state names reflect this convention. The states in the main switch are the union of ops and failure/success of substates associated with with that op. For example, IFMATCH is the op -that does lookahead assertions /(?=A)B/ and so the IFMATCH state means +that does look-ahead assertions /(?=A)B/ and so the IFMATCH state means 'execute IFMATCH'; while IFMATCH_A is a state saying that we have just successfully matched A and IFMATCH_A_fail is a state saying that we have just failed to match A. Resume states always come in pairs. 
The backtrack @@ -3708,7 +3708,7 @@ end of the pattern, rather than at X in the following: /(((X)+)+)+....(Y)+....Z/ -The only exceptions to this are lookahead/behind assertions and the cut, +The only exceptions to this are look-ahead/behind assertions and the cut, (?>A), which pop all the backtrack states associated with A before continuing. @@ -7322,7 +7322,7 @@ NULL scan = NEXTOPER(scan) + NODE_STEP_REGNODE; repeat: /* - * Lookahead to avoid useless match attempts + * Look-ahead to avoid useless match attempts * when we know what character comes next. * * Used to only do .*x and .*?x, but now it allows @@ -7652,11 +7652,11 @@ NULL newstart = locinput; goto do_ifmatch; - case UNLESSM: /* -ve lookaround: (?!A), or with flags, (?flags) { diff --git a/regexp.h b/regexp.h index 5dbab2e..8de59ec 100644 --- a/regexp.h +++ b/regexp.h @@ -738,7 +738,7 @@ typedef struct regmatch_state { } trie; /* special types - these members are used to store state for special - regops like eval, if/then, lookaround and the markpoint state */ + regops like eval, if/then, look-around and the markpoint state */ struct { /* this first element must match u.yes */ struct regmatch_state *prev_yes_state; diff --git a/sv.c b/sv.c index d23cd75..4f8543d 100644 --- a/sv.c +++ b/sv.c @@ -8543,7 +8543,7 @@ Perl_sv_gets(pTHX_ SV *const sv, PerlIO *const fp, I32 append) PTR2UV(PerlIO_has_base (fp) ? 
PerlIO_get_base(fp) : 0))); /* - call PerlIO_getc() to let it prefill the lookahead buffer + call PerlIO_getc() to let it prefill the look-ahead buffer This used to call 'filbuf' in stdio form, but as that behaves like getc when cnt <= 0 we use PerlIO_getc here to avoid introducing diff --git a/t/re/pat.t b/t/re/pat.t index fb4caf6..3b3fcde 100644 --- a/t/re/pat.t +++ b/t/re/pat.t @@ -293,7 +293,7 @@ sub run_tests { { $_ = 'foobar1 bar2 foobar3 barfoobar5 foobar6'; my @out = /(?a+)b) aaab y $1 aaab (?>(a+))b aaab y $1 aaa ((?>[^()]+)|\([^()]*\))+ ((abc(ade)ufh()()x y $& abc(ade)ufh()()x -(?<=x+)y - c - Variable length lookbehind not implemented +(?<=x+)y - c - Variable length look-behind not implemented ((def){37,17})?ABC ABC y $& ABC \Z a\nb\n y $-[0] 3 \z a\nb\n y $-[0] 4 @@ -1047,7 +1047,7 @@ X(A|B||C|D)Y XXXYYY y $& XY # Trie w/ NOTHING (?i:X([A]|[B]|y[Y]y|[D]|)Y) XXXYYYB y $& XY # Trie w/ NOTHING ^([a]{1})*$ aa y $1 a a(?!b(?!c))(..) abababc y $1 bc # test nested negatives -a(?!b(?=a))(..) abababc y $1 bc # test nested lookaheads +a(?!b(?=a))(..) abababc y $1 bc # test nested look-aheads a(?!b(?!c(?!d(?!e))))...(.) abxabcdxabcde y $1 e X(?!b+(?!(c+)*(?!(c+)*d))).*X aXbbbbbbbcccccccccccccaaaX y - - ^(XXXXXXXXXX|YYYYYYYYYY|Z.Q*X|Z[TE]Q*P): ZEQQQQQQQQQQQQQQQQQQP: y $1 ZEQQQQQQQQQQQQQQQQQQP @@ -1332,7 +1332,7 @@ a*(*F) aaaab n - - /(?Pfoo) (?P=n)/ ..foo foo.. 
yM $+{n} foo miniperl cannot load Tie::Hash::NamedCapture /(?Pas) (\w+) (?P=as) (\w+)/ as easy as pie y $1-$2-$3 as-easy-pie -#check that non identifiers as names are treated as the appropriate lookaround +#check that non identifiers as names are treated as the appropriate look-around (?<=bar>)foo bar>foo y $& foo (?)foo bar>foo n - - (?<=bar>ABC)foo bar>ABCfoo y $& foo @@ -1431,7 +1431,7 @@ foo(\h)bar foo\tbar y $1 \t /^\s*i.*?o\s*$/s io\n io y - - # As reported in #59168 by Father Chrysostomos: /(.*?)a(?!(a+)b\2c)/ baaabaac y $&-$1 baa-ba -# [perl #60344] Regex lookbehind failure after an (if)then|else in perl 5.10 +# [perl #60344] Regex look-behind failure after an (if)then|else in perl 5.10 /\A(?(?=db2)db2|\D+)(?a)(?(?=(?&W))(?<=(?&W)))(?&BB) aa y $& a # test repeated recursive patterns # This group is from RT #121144 diff --git a/t/re/reg_mesg.t b/t/re/reg_mesg.t index 9e5a406..4bf9bc8 100644 --- a/t/re/reg_mesg.t +++ b/t/re/reg_mesg.t @@ -100,9 +100,9 @@ my @death = ( '/[[=foo=]]/' => 'POSIX syntax [= =] is reserved for future extensions {#} m/[[=foo=]{#}]/', - '/(?<= .*)/' => 'Variable length lookbehind not implemented in regex m/(?<= .*)/', + '/(?<= .*)/' => 'Variable length look-behind not implemented in regex m/(?<= .*)/', - '/(?<= x{1000})/' => 'Lookbehind longer than 255 not implemented in regex m/(?<= x{1000})/', + '/(?<= x{1000})/' => 'Look-behind longer than 255 not implemented in regex m/(?<= x{1000})/', '/(?@)/' => 'Sequence (?@...) 
not implemented {#} m/(?@{#})/', @@ -350,9 +350,9 @@ my @death_only_under_strict = ( # These need the character 'ネ' as a marker for mark_as_utf8() my @death_utf8 = mark_as_utf8( '/ネ[[=ネ=]]ネ/' => 'POSIX syntax [= =] is reserved for future extensions {#} m/ネ[[=ネ=]{#}]ネ/', - '/ネ(?<= .*)/' => 'Variable length lookbehind not implemented in regex m/ネ(?<= .*)/', + '/ネ(?<= .*)/' => 'Variable length look-behind not implemented in regex m/ネ(?<= .*)/', - '/(?<= ネ{1000})/' => 'Lookbehind longer than 255 not implemented in regex m/(?<= ネ{1000})/', + '/(?<= ネ{1000})/' => 'Look-behind longer than 255 not implemented in regex m/(?<= ネ{1000})/', '/ネ(?ネ)ネ/' => 'Sequence (?ネ...) not recognized {#} m/ネ(?ネ{#})ネ/', diff --git a/t/re/subst.t b/t/re/subst.t index 2fed182..bbae0d3 100644 --- a/t/re/subst.t +++ b/t/re/subst.t @@ -312,7 +312,7 @@ $_ = "abcd"; s/(..)/$x = $1, m#.#/eg; ok( $x eq "cd", 'a match nested in the RHS of a substitution' ); -# Subst and lookbehind +# Subst and look-behind $_="ccccc"; $snum = s/(? 'foo', {}, fresh_perl_is( '$_="abcdef"; s/bc|(.)\G(.)/$1 ? 
"[$1-$2]" : "XX"/ge; print' => 'aXXdef', {}, 'positive GPOS regex substitution failure (#69056, #114884)' ); fresh_perl_is( '$_="abcdefg123456"; s/(?<=...\G)?(\d)/($1)/; print' => 'abcdefg(1)23456', {}, - 'positive GPOS lookbehind regex substitution failure #114884' ); + 'positive GPOS look-behind regex substitution failure #114884' ); # s/..\G//g should stop after the first iteration, rather than working its # way backwards, or looping infinitely, or SEGVing (for example) @@ -723,39 +723,39 @@ fresh_perl_is( '$_="abcdefg123456"; s/(?<=...\G)?(\d)/($1)/; print' => 'abcdefg( $s = '123456'; pos($s) = 4; $count = $s =~ s/\d\d(?=\d\G)/7/g; - is($count, 1, "..\\G count (lookahead short)"); - is($s, "17456", "..\\G s (lookahead short)"); + is($count, 1, "..\\G count (look-ahead short)"); + is($s, "17456", "..\\G s (look-ahead short)"); $s = '123456'; pos($s) = 4; $count = $s =~ s/\d\d(?=\d\G)/78/g; - is($count, 1, "..\\G count (lookahead equal)"); - is($s, "178456", "..\\G s (lookahead equal)"); + is($count, 1, "..\\G count (look-ahead equal)"); + is($s, "178456", "..\\G s (look-ahead equal)"); $s = '123456'; pos($s) = 4; $count = $s =~ s/\d\d(?=\d\G)/789/g; - is($count, 1, "..\\G count (lookahead long)"); - is($s, "1789456", "..\\G s (lookahead long)"); + is($count, 1, "..\\G count (look-ahead long)"); + is($s, "1789456", "..\\G s (look-ahead long)"); $s = '123456'; pos($s) = 4; $count = $s =~ s/\d\d(?=\d\G)/$f->(1)/eg; - is($count, 1, "..\\G count (lookahead short code)"); - is($s, "17456", "..\\G s (lookahead short code)"); + is($count, 1, "..\\G count (look-ahead short code)"); + is($s, "17456", "..\\G s (look-ahead short code)"); $s = '123456'; pos($s) = 4; $count = $s =~ s/\d\d(?=\d\G)/$f->(2)/eg; - is($count, 1, "..\\G count (lookahead equal code)"); - is($s, "178456", "..\\G s (lookahead equal code)"); + is($count, 1, "..\\G count (look-ahead equal code)"); + is($s, "178456", "..\\G s (look-ahead equal code)"); $s = '123456'; pos($s) = 4; $count = $s =~ 
s/\d\d(?=\d\G)/$f->(3)/eg; - is($count, 1, "..\\G count (lookahead long code)"); - is($s, "1789456", "..\\G s (lookahead long code)"); + is($count, 1, "..\\G count (look-ahead long code)"); + is($s, "1789456", "..\\G s (look-ahead long code)"); } diff --git a/toke.c b/toke.c index 2c0a3c9..3c531f6 100644 --- a/toke.c +++ b/toke.c @@ -1996,7 +1996,7 @@ S_newSV_maybe_utf8(pTHX_ const char *const start, STRLEN len) * When the lexer knows the next thing is a word (for instance, it has * just seen -> and it knows that the next char is a word char, then * it calls S_force_word to stick the next word into the PL_nexttoke/val - * lookahead. + * look-ahead. * * Arguments: * char *start : buffer position (must be within PL_linestr) ```
p5pRT commented 8 years ago

From @epa

lookahead.patch ```diff diff --git a/lib/unicore/mktables b/lib/unicore/mktables index 5711791..8989986 100644 --- a/lib/unicore/mktables +++ b/lib/unicore/mktables @@ -3100,7 +3100,7 @@ END # Not currently used, not fully tested. # sub peek { -# # Non-destructive look-ahead one non-adjusted, non-comment, non-blank +# # Non-destructive lookahead one non-adjusted, non-comment, non-blank # # record. Not callable from an each_line_handler(), nor does it call # # an each_line_handler() on the line. # diff --git a/pod/perlre.pod b/pod/perlre.pod index e45e444..08c98eb 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -1242,48 +1242,48 @@ Not doing so may lead to surprises: The problem here is that both the group named C<< a >> and the group named C<< b >> are aliases for the group belonging to C<< $1 >>. -=item Look-Around Assertions +=item Lookaround Assertions X X X X -Look-around assertions are zero-width patterns which match a specific +Lookaround assertions are zero-width patterns which match a specific pattern without including it in C<$&>. Positive assertions match when their subpattern matches, negative assertions match when their subpattern -fails. Look-behind matches text up to the current match position, -look-ahead matches text following the current match position. +fails. Lookbehind matches text up to the current match position, +lookahead matches text following the current match position. =over 4 =item C<(?=pattern)> X<(?=)> X X -A zero-width positive look-ahead assertion. For example, C +A zero-width positive lookahead assertion. For example, C matches a word followed by a tab, without including the tab in C<$&>. =item C<(?!pattern)> X<(?!)> X X -A zero-width negative look-ahead assertion. For example C +A zero-width negative lookahead assertion. For example C matches any occurrence of "foo" that isn't followed by "bar". Note -however that look-ahead and look-behind are NOT the same thing. You cannot -use this for look-behind. 
+however that lookahead and lookbehind are NOT the same thing. You cannot +use this for lookbehind. If you are looking for a "bar" that isn't preceded by a "foo", C will not do what you want. That's because the C<(?!foo)> is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will -match. Use look-behind instead (see below). +match. Use lookbehind instead (see below). =item C<(?<=pattern)> C<\K> X<(?<=)> X X X<\K> -A zero-width positive look-behind assertion. For example, C +A zero-width positive lookbehind assertion. For example, C matches a word that follows a tab, without including the tab in C<$&>. -Works only for fixed-width look-behind. +Works only for fixed-width lookbehind. There is a special form of this construct, called C<\K> (available since Perl 5.10.0), which causes the regex engine to "keep" everything it had matched prior to the C<\K> and not include it in C<$&>. This effectively provides variable-length -look-behind. The use of C<\K> inside of another look-around assertion +lookbehind. The use of C<\K> inside of another lookaround assertion is allowed, but the behaviour is currently not well defined. For various reasons C<\K> may be significantly more efficient than the @@ -1300,9 +1300,9 @@ can be rewritten as the much more efficient =item C<(? X<(? X X -A zero-width negative look-behind assertion. For example C +A zero-width negative lookbehind assertion. For example C matches any occurrence of "foo" that does not follow "bar". Works -only for fixed-width look-behind. +only for fixed-width lookbehind. =back @@ -1653,7 +1653,7 @@ C<(condition)> should be one of: (which is valid if the corresponding pair of parentheses matched); -=item a look-ahead/look-behind/evaluate zero-width assertion; +=item a lookahead/lookbehind/evaluate zero-width assertion; =item a name in angle brackets or single quotes @@ -1839,7 +1839,7 @@ the C pragma or B<-w> switch saying it C<"matches null string many times in regex">. 
On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable -effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>. +effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>. This was only 4 times slower on a string with 1000000 Cs. The "grab all you can, and do not give anything back" semantic is desirable @@ -2242,7 +2242,7 @@ definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve. -When using look-ahead assertions and negations, this can all get even +When using lookahead assertions and negations, this can all get even trickier. Imagine you'd like to find a sequence of non-digits not followed by "123". You might try to write that as @@ -2292,7 +2292,7 @@ time. Now there's indeed something following "AB" that is not We can deal with this by using both an assertion and a negation. We'll say that the first part in C<$1> must be followed both by a digit -and by something that's not "123". Remember that the look-aheads +and by something that's not "123". Remember that the lookaheads are zero-width expressions--they only look, but don't consume any of the string in their match. So rewriting this way produces what you'd expect; that is, case 5 will fail, but case 6 succeeds: @@ -2329,10 +2329,10 @@ match takes a long time to finish. A powerful tool for optimizing such beasts is what is known as an "independent group", which does not backtrack (see Lpattern) >>>). Note also that -zero-length look-ahead/look-behind assertions will not backtrack to make +zero-length lookahead/lookbehind assertions will not backtrack to make the tail match, since they are in "logical" context: only whether they match is considered relevant. For an example -where side-effects of look-ahead I have influenced the +where side-effects of lookahead I have influenced the following match, see Lpattern) >>>. 
=head2 Version 8 Regular Expressions diff --git a/pod/perlreref.pod b/pod/perlreref.pod index e9b784e..db7c173 100644 --- a/pod/perlreref.pod +++ b/pod/perlreref.pod @@ -252,10 +252,10 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string. (?P>name) Recurse into a named subpattern (python syntax) (?(cond)yes|no) (?(cond)yes) Conditional expression, where "cond" can be: - (?=pat) look-ahead - (?!pat) negative look-ahead - (?<=pat) look-behind - (?) named subpattern has matched something ('name') named subpattern has matched something diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod index a407faf..2929163 100644 --- a/pod/perlunicode.pod +++ b/pod/perlunicode.pod @@ -1110,7 +1110,7 @@ feature, you can use one of the following: =item * -Regular expression look-ahead +Regular expression lookahead You can mimic class subtraction using lookahead. For example, what UTS#18 might write as @@ -1223,7 +1223,7 @@ Level 3 - Tailored Support [17] see UAX#10 "Unicode Collation Algorithms" [18] have Unicode::Collate but not integrated to regexes - [19] have (?<=x) and (?=x), but look-aheads or look-behinds + [19] have (?<=x) and (?=x), but lookaheads or lookbehinds should see outside of the target substring [20] need insensitive matching for linguistic features other than case; for example, hiragana to katakana, wide and diff --git a/regcomp.c b/regcomp.c index a37dc82..f64b148 100644 --- a/regcomp.c +++ b/regcomp.c @@ -9979,7 +9979,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) RExC_parse++; paren = *RExC_parse++; - ret = NULL; /* For look-ahead/behind. */ + ret = NULL; /* For lookahead/behind. */ switch (paren) { case 'P': /* (?P...) 
variants for those used to PCRE/Python */ diff --git a/regexec.c b/regexec.c index 85c31a6..a49ce7b 100644 --- a/regexec.c +++ b/regexec.c @@ -654,7 +654,7 @@ Perl_re_intuit_start(pTHX_ "Intuit: trying to determine minimum start position...\n")); /* for now, assume that all substr offsets are positive. If at some point - * in the future someone wants to do clever things with look-behind and + * in the future someone wants to do clever things with lookbehind and * -ve offsets, they'll need to fix up any code in this function * which uses these offsets. See the thread beginning * <20140113145929.GF27210@iabyn.com> @@ -2683,7 +2683,7 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx, U32 n = 0; max = -1; /* calculate the right-most part of the string covered - * by a capture. Due to look-ahead, this may be to + * by a capture. Due to lookahead, this may be to * the right of $&, so we have to scan all captures */ while (n <= prog->lastparen) { if (prog->offs[n].end > max) @@ -2704,7 +2704,7 @@ S_reg_set_capture_string(pTHX_ REGEXP * const rx, U32 n = 0; min = max; /* calculate the left-most part of the string covered - * by a capture. Due to look-behind, this may be to + * by a capture. Due to lookbehind, this may be to * the left of $&, so we have to scan all captures */ while (min && n <= prog->lastparen) { if ( prog->offs[n].start != -1 ```
p5pRT commented 8 years ago

From @rjbs

* Ed Avis via RT perlbug-followup@perl.org [2015-11-18T11:52:30]

Here are two patches. One changes consistently to 'lookahead', and the other consistently to 'look-ahead'. The p5-porters can apply which of the two they prefer.

I plan to apply the "settle on lookahead" patch unless someone has a reasoned objection.

-- rjbs

p5pRT commented 8 years ago

From @jkeenan

On Sat Dec 05 18:41:22 2015, perl.p5p@rjbs.manxome.org wrote:

* Ed Avis via RT perlbug-followup@perl.org [2015-11-18T11:52:30]

Here are two patches. One changes consistently to 'lookahead', and the other consistently to 'look-ahead'. The p5-porters can apply which of the two they prefer.

I plan to apply the "settle on lookahead" patch unless someone has a reasoned objection.

rjbs: You applied a patch in commit f67a500207b5795952c02ea7b3c1af93098433fb. Is this ticket closable?

Thank you very much.

-- James E Keenan (jkeenan@cpan.org)

p5pRT commented 8 years ago

From @rjbs

It is, so I hereby close it! Thanks.

-- rjbs

p5pRT commented 8 years ago

@rjbs - Status changed from 'open' to 'resolved'

p5pRT commented 8 years ago

From @epa

Shouldn't this be 'pending release' rather than 'resolved'?

Can the patch be merged into the 5.22 maintenance tree?

p5pRT commented 8 years ago

From @demerphq

Just out of curiosity, did you do the same thing for look-behind and look-around?

Yves

On 9 December 2015 at 11:48, Ed Avis via RT perlbug-followup@perl.org wrote:

Shouldn't this be 'pending release' rather than 'resolved'?

Can the patch be merged into the 5.22 maintenance tree?

--- via perlbug: queue: perl5 status: resolved https://rt-archive.perl.org/perl5/Ticket/Display.html?id=126608

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @epa

Yes, the patch does lookahead, lookbehind, and lookaround. See above for the perl -i command that was used to generate it.

p5pRT commented 8 years ago

From @demerphq

On 9 December 2015 at 11:59, Ed Avis via RT perlbug-followup@perl.org wrote:

Yes, the patch does lookahead, lookbehind, and lookaround. See above for the perl -i command that was used to generate it.

Oh sorry, I missed that in the big patch. :-)

Thanks! Sorry for contributing to this problem. I am sure at least a few things you fixed were me.

Yves