PATCH: reorder bits in op.h to make space for new regex modifiers #10495

Closed p5pRT closed 14 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#76586 (status was 'resolved')

Searchable as RT76586$

p5pRT commented 14 years ago

From @khwilliamson

I'm hoping this is an uncontroversial patch, in preparation for the new regex modifiers coming soon. This does cause binary incompatibility.

p5pRT commented 14 years ago

From @khwilliamson

0001-Add-to-end-of-continuation-lines-in-defines.patch

```diff
From e47823873bff0b844484bb1ce8a4c780ef1c23df Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 26 Jun 2010 11:37:41 -0600
Subject: [PATCH] Add \ to end of continuation lines in #defines

These were in comments, and apparently not harmful, but not good form.
---
 op.h |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/op.h b/op.h
index 712039c..b05acd1 100644
--- a/op.h
+++ b/op.h
@@ -361,16 +361,16 @@ struct pmop {
 
 #define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */
-#define PMf_ONCE 0x00000080 /* match successfully only once per
-                               reset, with related flag RXf_USED
-                               in re->extflags holding state.
-                               This is used only for ?? matches,
+#define PMf_ONCE 0x00000080 /* match successfully only once per \
+                               reset, with related flag RXf_USED \
+                               in re->extflags holding state. \
+                               This is used only for ?? matches, \
                                and only on OP_MATCH and OP_QR */
 
 #define PMf_UNUSED 0x00000100 /* free for use */
 #define PMf_MAYBE_CONST 0x00000200 /* replacement contains variables */
 
-#define PMf_USED 0x00000400 /* PMf_ONCE has matched successfully.
+#define PMf_USED 0x00000400 /* PMf_ONCE has matched successfully. \
                                Not used under threading. */
 
 #define PMf_CONST 0x00000800 /* subst replacement is constant */
@@ -378,7 +378,7 @@ struct pmop {
 #define PMf_GLOBAL 0x00002000 /* pattern had a g modifier */
 #define PMf_CONTINUE 0x00004000 /* don't reset pos() if //g fails */
 #define PMf_EVAL 0x00008000 /* evaluating replacement as expr */
-#define PMf_NONDESTRUCT 0x00010000 /* Return substituted string instead
+#define PMf_NONDESTRUCT 0x00010000 /* Return substituted string instead \
                                       of modifying it. */
 
 /* The following flags have exact equivalents in regcomp.h with the prefix RXf_
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0002-Reorder-bit-definitions-in-op.h-to-numerical-order.patch

```diff
From a2a4cb587417a7e1e99787f3f3e9679a82815470 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 26 Jun 2010 11:46:23 -0600
Subject: [PATCH] Reorder bit definitions in op.h to numerical order

This patch just moves statements physically around in the file without
changing any definitions, to allow easier eye-balling of what bits are
used
---
 op.h |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/op.h b/op.h
index b05acd1..79a6e01 100644
--- a/op.h
+++ b/op.h
@@ -360,6 +360,19 @@ struct pmop {
 #endif
 
+/* The following flags have exact equivalents in regcomp.h with the prefix RXf_
+ * which are stored in the regexp->extflags member. If you change them here,
+ * you have to change them there, and vice versa.
+ */
+#define PMf_MULTILINE 0x00000001 /* assume multiple lines */
+#define PMf_SINGLELINE 0x00000002 /* assume single line */
+#define PMf_FOLD 0x00000004 /* case insensitivity */
+#define PMf_EXTENDED 0x00000008 /* chuck embedded whitespace */
+#define PMf_KEEPCOPY 0x00000010 /* copy the string when matching */
+#define PMf_LOCALE 0x00000020 /* use locale for character types */
+
+/* End of regexp.h equivalents. */
+
 #define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */
 #define PMf_ONCE 0x00000080 /* match successfully only once per \
                                reset, with related flag RXf_USED \
@@ -381,17 +394,6 @@ struct pmop {
 #define PMf_NONDESTRUCT 0x00010000 /* Return substituted string instead \
                                       of modifying it. */
 
-/* The following flags have exact equivalents in regcomp.h with the prefix RXf_
- * which are stored in the regexp->extflags member. If you change them here,
- * you have to change them there, and vice versa.
- */
-#define PMf_MULTILINE 0x00000001 /* assume multiple lines */
-#define PMf_SINGLELINE 0x00000002 /* assume single line */
-#define PMf_FOLD 0x00000004 /* case insensitivity */
-#define PMf_EXTENDED 0x00000008 /* chuck embedded whitespace */
-#define PMf_KEEPCOPY 0x00000010 /* copy the string when matching */
-#define PMf_LOCALE 0x00000020 /* use locale for character types */
-
 /* mask of bits that need to be transfered to re->extflags */
 #define PMf_COMPILETIME (PMf_MULTILINE|PMf_SINGLELINE|PMf_LOCALE|PMf_FOLD|PMf_EXTENDED|PMf_KEEPCOPY)
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0003-Correct-comment-in-op.h.patch

```diff
From 40da7d99b055e4bdc86052ddfd39454f8b5d4fa5 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 26 Jun 2010 12:34:29 -0600
Subject: [PATCH] Correct comment in op.h

---
 op.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/op.h b/op.h
index 79a6e01..77ef6c5 100644
--- a/op.h
+++ b/op.h
@@ -360,7 +360,7 @@ struct pmop {
 #endif
 
-/* The following flags have exact equivalents in regcomp.h with the prefix RXf_
+/* The following flags have exact equivalents in regexp.h with the prefix RXf_
  * which are stored in the regexp->extflags member. If you change them here,
  * you have to change them there, and vice versa.
  */
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0004-Move-PMf-bits-around-in-op.h.patch

```diff
From b53dec34ed587cda028a995aae3b8134f8ad93a8 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sat, 26 Jun 2010 12:37:04 -0600
Subject: [PATCH] Move PMf bits around in op.h

This patch is to make room for handling new regex modifiers. It just
moves the bits that aren't shared with the RXf structure in regexp.h,
now using a previously unused bit. This breaks binary compatibility with
previously compiled perls.
---
 op.h |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/op.h b/op.h
index 77ef6c5..333daea 100644
--- a/op.h
+++ b/op.h
@@ -373,14 +373,12 @@ struct pmop {
 
 /* End of regexp.h equivalents. */
 
-#define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */
-#define PMf_ONCE 0x00000080 /* match successfully only once per \
+#define PMf_RETAINT 0x00000080 /* taint $1 etc. if target tainted */
+#define PMf_ONCE 0x00000100 /* match successfully only once per \
                                reset, with related flag RXf_USED \
                                in re->extflags holding state. \
                                This is used only for ?? matches, \
                                and only on OP_MATCH and OP_QR */
-
-#define PMf_UNUSED 0x00000100 /* free for use */
 #define PMf_MAYBE_CONST 0x00000200 /* replacement contains variables */
 
 #define PMf_USED 0x00000400 /* PMf_ONCE has matched successfully. \
-- 
1.5.6.3
```
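
To illustrate why a renumbering like this one is binary-incompatible, here is a minimal sketch in C. The two values of PMf_RETAINT are taken from the patch above; the `OLD_`/`NEW_` prefixes and the helper functions are hypothetical stand-ins for an extension compiled against the old op.h and a core rebuilt with the new one, not perl API.

```c
#include <assert.h>

#define OLD_PMf_RETAINT 0x00000040 /* value before the patch */
#define NEW_PMf_RETAINT 0x00000080 /* value after the patch */

/* Stand-in for code in an extension compiled against the old op.h:
 * the old constant is baked into its object file. */
static int old_extension_sees_retaint(unsigned long pmflags)
{
    return (pmflags & OLD_PMf_RETAINT) != 0;
}

/* Stand-in for a rebuilt core that sets the flag with the new value.
 * Returns 1 when the stale extension fails to notice the flag. */
static int demo_extension_misses_flag(void)
{
    unsigned long pmflags = NEW_PMf_RETAINT; /* set by the rebuilt core */

    assert(OLD_PMf_RETAINT != NEW_PMf_RETAINT);
    return !old_extension_sees_retaint(pmflags);
}
```

Under these assumptions the stale extension never sees the flag, and once bit 0x40 is reused for something else it would presumably misread that instead, which is why such extensions need recompiling.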
p5pRT commented 14 years ago

From @demerphq

On 26 June 2010 21:25, karl williamson perlbug-followup@perl.org wrote:

> \# New Ticket Created by karl williamson
> \# Please include the string: [perl #76122]
> \# in the subject line of all future correspondence about this issue.
> \# \<URL: http://rt.perl.org/rt3/Ticket/Display.html?id=76122 >
>
> I'm hoping this is an uncontroversial patch, in preparation for the new regex modifiers coming soon. This does cause binary incompatibility.

Just wanted to comment on this one...

I worked a bit on an earlier phase of reorganizing these defines, and frankly the only reason there is the RXf_ PMf_ difference, or that one is not explicitly defined in terms of the other, was because of my weak header-fu.

So for instance, if you thought you could eliminate the duplication then I personally think it would probably be a good idea.

Anyway, can't apply/test this right now, but it looks sane to me.

Just curious about the comment patch - is it really necessary or just good practice? Don't the rules specify that comments are stripped first? Seems like it should go in regardless, but I'm curious about the fine point of the rules.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"
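
One way to read the suggestion of defining one set of flags in terms of the other is sketched here in C. This is only an illustration: the four shift values match the /m, /s, /i and /x bits quoted in the patches, but the aliasing arrangement and the `demo_copy_shared_bits` helper are assumptions, not what perl's headers actually do.

```c
#include <assert.h>

/* Single definition of each shared bit; values as quoted in the patches. */
#define RXf_PMf_MULTILINE  (1 << 0) /* /m */
#define RXf_PMf_SINGLELINE (1 << 1) /* /s */
#define RXf_PMf_FOLD       (1 << 2) /* /i */
#define RXf_PMf_EXTENDED   (1 << 3) /* /x */

/* Hypothetical arrangement: the PMf_ names are aliases rather than
 * independent numbers, so the two sets cannot drift apart. */
#define PMf_MULTILINE  RXf_PMf_MULTILINE
#define PMf_SINGLELINE RXf_PMf_SINGLELINE
#define PMf_FOLD       RXf_PMf_FOLD
#define PMf_EXTENDED   RXf_PMf_EXTENDED

/* Illustrative helper: keep only the shared bits when passing op flags
 * on as regexp flags. With aliases, no translation step is needed. */
static unsigned demo_copy_shared_bits(unsigned pmflags)
{
    const unsigned shared = PMf_MULTILINE | PMf_SINGLELINE
                          | PMf_FOLD | PMf_EXTENDED;
    unsigned extflags = pmflags & shared;

    assert((extflags & ~shared) == 0);
    return extflags;
}
```

The trade-off with aliases is that any tool which reads the header text looking for numeric definitions would no longer see a number next to the PMf_ names.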

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

From @khwilliamson

demerphq wrote:

> On 26 June 2010 21:25, karl williamson perlbug-followup@perl.org wrote:
>
>> \# New Ticket Created by karl williamson
>> \# Please include the string: [perl #76122]
>> \# in the subject line of all future correspondence about this issue.
>> \# \<URL: http://rt.perl.org/rt3/Ticket/Display.html?id=76122 >
>>
>> I'm hoping this is an uncontroversial patch, in preparation for the new regex modifiers coming soon. This does cause binary incompatibility.
>
> Just wanted to comment on this one...
>
> I worked a bit on an earlier phase of reorganizing these defines, and frankly the only reason there is the RXf_ PMf_ difference, or that one is not explicitly defined in terms of the other, was because of my weak header-fu.
>
> So for instance, if you thought you could eliminate the duplication then I personally think it would probably be a good idea.

I'll look into it.

> Anyway, can't apply/test this right now, but it looks sane to me.
>
> Just curious about the comment patch - is it really necessary or just good practice? Don't the rules specify that comments are stripped first? Seems like it should go in regardless, but I'm curious about the fine point of the rules.

That it has worked without complaint from all compilers indicates it's not a compile problem. I looked at K&R 2nd edition, and it's not clear to me what's legal from that. The reason I made the fix is that my vim doesn't like the way it was, and highlights it as being wrong. This is a distraction to me whenever I look at the code, adding extra cognitive load. So it's easier to change it once, and then forget about it.

> Yves
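
On the fine point raised here: in ISO C, comments are replaced by a single space in translation phase 3, before preprocessing directives are executed in phase 4, so a `/* */` comment that spans lines inside a `#define` should not need backslashes. The following minimal sketch shows both spellings side by side. The `DEMO_` names and the helper are illustrative only; the value mirrors the PMf_ONCE definition quoted above.

```c
#include <assert.h>

/* Comment continues across lines with no backslash: the whole comment
 * becomes one space before the #define directive is processed. */
#define DEMO_ONCE_PLAIN 0x00000080 /* match successfully only once per
                                      reset */

/* Same definition with a backslash ending the continued line; the two
 * physical lines are spliced first, then the comment is removed. */
#define DEMO_ONCE_SPLICED 0x00000080 /* match successfully only once per \
                                        reset */

/* Returns 1 if both spellings expand to the same value. */
static int demo_same_value(void)
{
    assert(DEMO_ONCE_PLAIN == 0x80);
    return DEMO_ONCE_PLAIN == DEMO_ONCE_SPLICED;
}
```

Both forms compile to the same constant, so the choice between them (or moving the comment above the define) is a matter of style and editor friendliness rather than correctness.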

p5pRT commented 14 years ago

From @hvds

karl williamson public@khwilliamson.com wrote:

> demerphq wrote:
>
>> Just curious about the comment patch - is it really necessary or just good practice? Don't the rules specify that comments are stripped first? Seems like it should go in regardless, but I'm curious about the fine point of the rules.
>
> That it has worked without complaint from all compilers indicates it's not a compile problem. I looked at K&R 2nd edition, and it's not clear to me what's legal from that. The reason I made the fix is that my vim doesn't like the way it was, highlights it as being wrong. This is a distraction to me whenever I look at the code; adding extra cognitive load. So, it's easier to change it once, and then forget about it.

I'd rather see the comments moved above the defines, to remove the issue altogether.

Hugo

p5pRT commented 14 years ago

From @iabyn

On Wed, Jun 30, 2010 at 12:40:14PM +0100, hv@crypt.org wrote:

> I'd rather see the comments moved above the defines, to remove the issue altogether.

+1

-- In England there is a special word which means the last sunshine of the summer. That word is "spring".

p5pRT commented 14 years ago

From @khwilliamson

Dave Mitchell wrote:

> On Wed, Jun 30, 2010 at 12:40:14PM +0100, hv@crypt.org wrote:
>
>> I'd rather see the comments moved above the defines, to remove the issue altogether.
>
> +1

I will do this, as well as combine the common parts of op.h and regexp.h, as suggested by Yves.

p5pRT commented 14 years ago

From @khwilliamson

This series of commits is intended for 5.13.4. I imagine there will be multiple go-arounds before a version of it is accepted.

This patch is the first step in trying to add new regex modifiers. Currently the structures in op.h and regexp.h share common values, which must be kept in sync manually. This patch creates a new header with those common values that is #included from the two affected header files. There are also commits for small ancillary bug fixes and comments I added as I went along.

However, I left the two different symbols for each value because I wasn't sure what to do with them, since they are both exported by B, and I presume that means that external programs may be depending on them. I see that B has some explicit definitions for backward compatibility, so I'm thinking it would be better to remove the duplicate symbols in the core, and then define them there for external programs. But I don't know enough to know if that is reasonable or not.

That brings up another question that I need answered for future patches. We are adding mutually exclusive regex modifiers. It is more suitable to encode these into one multi-bit field instead of a single bit each. It also takes fewer bits, and we are running out of bits, and I can see other uses coming along down the road. For example, traditional could be 0, locale could be 1, and unicode could be 2. This takes two bits for 3 values, with room for a fourth, instead of a bit for each value. But can/do external programs rely on the status quo with a bit for each thing?
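
A minimal sketch in C of the multi-bit encoding described in this paragraph. The numbering (traditional 0, locale 1, unicode 2) follows the text; the `DEMO_` names, the position of the field at bit 5 and the accessor functions are assumptions made purely for illustration.

```c
#include <assert.h>

/* Hypothetical two-bit field; the thread does not fix its position. */
#define DEMO_CHARSET_SHIFT 5
#define DEMO_CHARSET_MASK  (3u << DEMO_CHARSET_SHIFT) /* two bits, four values */

enum demo_charset {
    DEMO_CHARSET_TRADITIONAL = 0,
    DEMO_CHARSET_LOCALE      = 1,
    DEMO_CHARSET_UNICODE     = 2
    /* value 3 is spare */
};

/* Store a charset in the field, leaving all other flag bits untouched. */
static unsigned demo_set_charset(unsigned flags, enum demo_charset cs)
{
    assert((unsigned)cs <= 3u);
    return (flags & ~DEMO_CHARSET_MASK) | ((unsigned)cs << DEMO_CHARSET_SHIFT);
}

/* Read the charset back out of the field. */
static enum demo_charset demo_get_charset(unsigned flags)
{
    return (enum demo_charset)((flags & DEMO_CHARSET_MASK) >> DEMO_CHARSET_SHIFT);
}
```

With such a field, a test of the form `flags & SOME_LOCALE_BIT` is no longer meaningful; callers have to extract the field and compare it, which is the kind of change external code relying on one bit per modifier would need to make.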

It was most convenient to change the definitions to use, e.g., 1\<\<2 instead of 0x4. That meant I had to change the code that reads and evals those definitions to understand the left shift syntax.
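
The two spellings denote the same bits, as this small C sketch checks. The values are the shared modifier flags quoted in the patches; the `HEX_`/`SHL_` prefixes, the typedef trick and the helper are illustrative assumptions.

```c
#include <assert.h>

/* Hexadecimal spellings, as in op.h before the change. */
#define HEX_PMf_MULTILINE 0x00000001
#define HEX_PMf_FOLD      0x00000004
#define HEX_PMf_LOCALE    0x00000020

/* Shift spellings of the same bits; the bit number is directly visible. */
#define SHL_PMf_MULTILINE (1 << 0)
#define SHL_PMf_FOLD      (1 << 2)
#define SHL_PMf_LOCALE    (1 << 5)

/* Compile-time check: the array size becomes -1, and compilation fails,
 * if any shift spelling differs from its hexadecimal counterpart. */
typedef char demo_same_bits[(HEX_PMf_MULTILINE == SHL_PMf_MULTILINE
                             && HEX_PMf_FOLD == SHL_PMf_FOLD
                             && HEX_PMf_LOCALE == SHL_PMf_LOCALE) ? 1 : -1];

/* Run-time view of the same fact; returns 1 when the spellings agree. */
static int demo_spellings_agree(void)
{
    assert(SHL_PMf_FOLD == 0x4);
    return sizeof(demo_same_bits) == 1;
}
```

The compiler treats both forms identically, but a script that scans the header text for `0x...` literals does not, which is why the code reading these definitions had to learn the shift syntax.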

I went out of the way to ensure binary compatibility with this patch. The actual bit structures haven't changed. Once a version of this is installed, it will be easy to move things around to make room for the new bits needed.

p5pRT commented 14 years ago

From @khwilliamson

0001-op.h-Fix-comments-in-defines-that-cross-lines.patch

```diff
From a34da8de098d7b78b39db49b9d4972ea4677b13e Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 12:22:16 -0600
Subject: [PATCH] op.h: Fix comments in #defines that cross lines

It is not good form to have comments in a #define continue onto the next
line, especially not with a \ ending each continuation line. People
preferred that the comment be placed before the #define
---
 op.h |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/op.h b/op.h
index 2ea1dce..257a951 100644
--- a/op.h
+++ b/op.h
@@ -361,25 +361,25 @@ struct pmop {
 
 #define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */
-#define PMf_ONCE 0x00000080 /* match successfully only once per
-                               reset, with related flag RXf_USED
-                               in re->extflags holding state.
-                               This is used only for ?? matches,
-                               and only on OP_MATCH and OP_QR */
+/* match successfully only once per reset, with related flag RXf_USED in
+ * re->extflags holding state. This is used only for ?? matches, and only on
+ * OP_MATCH and OP_QR */
+#define PMf_ONCE 0x00000080
 
 #define PMf_UNUSED 0x00000100 /* free for use */
 #define PMf_MAYBE_CONST 0x00000200 /* replacement contains variables */
 
-#define PMf_USED 0x00000400 /* PMf_ONCE has matched successfully.
-                               Not used under threading. */
+/* PMf_ONCE has matched successfully. Not used under threading. */
+#define PMf_USED 0x00000400
 
 #define PMf_CONST 0x00000800 /* subst replacement is constant */
 #define PMf_KEEP 0x00001000 /* keep 1st runtime pattern forever */
 #define PMf_GLOBAL 0x00002000 /* pattern had a g modifier */
 #define PMf_CONTINUE 0x00004000 /* don't reset pos() if //g fails */
 #define PMf_EVAL 0x00008000 /* evaluating replacement as expr */
-#define PMf_NONDESTRUCT 0x00010000 /* Return substituted string instead
-                                      of modifying it. */
+
+/* Return substituted string instead of modifying it. */
+#define PMf_NONDESTRUCT 0x00010000
 
 /* The following flags have exact equivalents in regcomp.h with the prefix RXf_
  * which are stored in the regexp->extflags member. If you change them here,
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0002-regexp.h-Add-some-comments.patch

```diff
From 234c07289d09c7f23b2084c13fdb7eed420820a9 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 13:06:19 -0600
Subject: [PATCH] regexp.h: Add some comments

---
 regexp.h |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/regexp.h b/regexp.h
index f87031c..6232356 100644
--- a/regexp.h
+++ b/regexp.h
@@ -229,7 +229,11 @@ and check for NULL.
 
 /* 0x3F of extflags is used by (RXf_)PMf_COMPILETIME
  * If you change these you need to change the equivalent flags in op.h, and
- * vice versa. */
+/* vice versa. These need to be ordered so that the msix are contiguous
+ * starting at bit 0, followed by the p; bit 0 is because of the shift below
+ * being 0; see STD_PAT_MODS and INT_PAT_MODS below for the contiguity cause */
+/* the flags above are transfered from the PMOP->op_pmflags member during
+ * compilation */
 #define RXf_PMf_MULTILINE 0x00000001 /* /m */
 #define RXf_PMf_SINGLELINE 0x00000002 /* /s */
 #define RXf_PMf_FOLD 0x00000004 /* /i */
@@ -271,8 +275,14 @@ and check for NULL.
 #define LOOP_PAT_MODS "gc"
 #define NONDESTRUCT_PAT_MODS "r"
 
+/* This string is expected by regcomp.c to be ordered so that the first
+ * character is the flag in bit 0 of extflags; the next character is bit 1,
+ * etc. */
 #define STD_PAT_MODS "msix"
 
+/* This string is expected by XS_re_regexp_pattern() in universal.c to be ordered
+ * so that the first character is the flag in bit 0 of extflags; the next
+ * character is bit 1, etc. */
 #define INT_PAT_MODS STD_PAT_MODS KEEPCOPY_PAT_MODS
 
 #define EXT_PAT_MODS ONCE_PAT_MODS KEEPCOPY_PAT_MODS
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0003-defsubs_h.PL-Use-correct-variable-in-error-msg.patch

```diff
From c8faf9b6e2f6b4eb56568bc2f4bb82ad1d605261 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 17:15:51 -0600
Subject: [PATCH] defsubs_h.PL: Use correct variable in error msg

---
 ext/B/defsubs_h.PL |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index 684ca26..98a3e8b 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -7,7 +7,7 @@ my (undef, $headerpath) = @ARGV;
 my ($out) = __FILE__ =~ /(^.*)\.PL/i;
 $out =~ s/_h$/.h/;
 unlink $out if -l $out;
-open(OUT,">$out") || die "Cannot open $file:$!";
+open(OUT,">$out") || die "Cannot open $out:$!";
 print "Extracting $out...\n";
 print OUT <<"END";
 /*
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0004-ext-B-defsubs_h.PL-add-explanatory-comment.patch

```diff
From 70dde8ec230ed603488557ca42e83da54971fe2c Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 17:18:03 -0600
Subject: [PATCH] ext/B/defsubs_h.PL: add explanatory comment

---
 ext/B/defsubs_h.PL |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index 98a3e8b..f8fa1ed 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -72,6 +72,10 @@ if ($] < 5.011) {
     doconst(CVf_LOCKED);
 }
 
+# First element in each tuple is the file; second is a regex snippet
+# giving the prefix to limit the names of symbols to define that come
+# from that file. If none, all symbols will be defined whose values
+# match the pattern below.
 foreach my $tuple (['op.h'],['cop.h'],['regexp.h','RXf_'])
 {
     my $file = $tuple->[0];
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0005-regcomp.pl-Teach-to-handle-wider-range-of-exprs.patch

```diff
From 25d892326935efa83c18dbe8745dc212330c376c Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 21:46:49 -0600
Subject: [PATCH] regcomp.pl: Teach to handle wider range of exprs

In particular teach it to handle definitions using <<, e.g.
#define SYMBOL (1<<3)
and to remember previous symbol definitons in the file so that symbol
can be used in later definitions.
---
 regcomp.pl |   29 ++++++++++++++++++++++-------
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/regcomp.pl b/regcomp.pl
index 9370487..aa0f0fe 100644
--- a/regcomp.pl
+++ b/regcomp.pl
@@ -258,17 +258,32 @@ EOP
 open my $fh,"<","regexp.h" or die "Can't read regexp.h: $!";
 my %rxfv;
+my %definitions;    # Remember what the symbol definitions are
 my $val = 0;
 my %reverse;
 while (<$fh>) {
-    if (/#define\s+(RXf_\w+)\s+(0x[A-F\d]+)/i) {
-        my $newval = eval $2;
-        if($val & $newval) {
-            die sprintf "Both $1 and $reverse{$newval} use %08X", $newval;
-        }
+
+    # optional leading '_'. Return symbol in $1, and strip it from
+    # rest of line
+    if (s/ \#define \s+ ( _? RXf_ \w+ ) \s+ //xi) {
+        chomp;
+        my $define = $1;
+        s: / \s* \* .*? \* \s* / : :x;    # Replace comments by a blank
+
+        # Replace any prior defined symbols by their values
+        foreach my $key (keys %definitions) {
+            s/\b$key\b/$definitions{$key}/g;
+        }
+        my $newval = eval $_;   # Get numeric definition
+
+        $definitions{$define} = $newval;
+
+        if($val & $newval) {
+            die sprintf "Both $define and $reverse{$newval} use %08X", $newval;
+        }
         $val|=$newval;
-        $rxfv{$1}= $newval;
-        $reverse{$newval} = $1;
+        $rxfv{$define}= $newval;
+        $reverse{$newval} = $define;
     }
 }
 my %vrxf=reverse %rxfv;
-- 
1.5.6.3
```
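
The script in the patch above keeps a running mask of the bits seen so far and dies when a new definition overlaps it. For readers more at home in C than Perl, the same collision check can be sketched as follows; `demo_first_collision` is a hypothetical helper written for this illustration, not part of perl.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first flag that shares a bit with an earlier
 * flag, or -1 if all flags are disjoint. Mirrors the "$val & $newval"
 * test and the "$val |= $newval" accumulation in the script. */
static int demo_first_collision(const unsigned long *flags, size_t n)
{
    unsigned long seen = 0; /* union of all bits accepted so far */
    size_t i;

    assert(flags != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (seen & flags[i])
            return (int)i;
        seen |= flags[i];
    }
    return -1;
}
```

A check like this is what catches two symbols being assigned the same bit when definitions are renumbered by hand.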
p5pRT commented 14 years ago

From @khwilliamson

0006-ext-B-defsubs_h.PL-teach-to-allow-exprs-with.patch

```diff
From a73a53c7d9ec8f7563eec5ba858cc7c16a3bf0fa Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 21:51:32 -0600
Subject: [PATCH] ext/B/defsubs_h.PL: teach to allow exprs with <<

Allow #defines which have left shift operators in them.
---
 ext/B/defsubs_h.PL |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index f8fa1ed..d8e1439 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -84,7 +84,11 @@ foreach my $tuple (['op.h'],['cop.h'],['regexp.h','RXf_'])
     open(OPH,"$path") || die "Cannot open $path:$!";
     while (<OPH>) {
-        doconst($1) if (/#define\s+($pfx\w+)\s+([\(\)\|\dx]+)\s*(?:$|\/\*)/);
+        doconst($1) if (/ \#define \s+ ( $pfx \w+ ) \s+
+                          ( [()|\dx]+            # Parens, '|', digits, 'x'
+                          | \(? \d+ \s* << .*?   # digits left shifted by anything
+                          ) \s* (?: $| \/ \* )   # ending at comment or $
+                        /x);
     }
     close(OPH);
 }
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0007-Refactor-common-parts-of-op.h-regexp.h-into-new-.h.patch

```diff
From 1be64a430c7464c6e088079ce3880c69bb88767a Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 22:26:43 -0600
Subject: [PATCH] Refactor common parts of op.h, regexp.h into new .h

op.h and regexp.h share common elements in their data structures. They
have had to manually be kept in sync. This patch makes it easier by
putting those common parts into a common header #included by the two.

To do this, it seemed easiest to change the symbol definitions to use
left shifts to generate the flag bits. But this meant that regcomp.pl
and axt/B/defsubs_h.PL had to be taught to recognize those forms of
expressions, done in separate commits
---
 MANIFEST           |    1 +
 ext/B/defsubs_h.PL |    2 +-
 op.h               |   48 +++++++++++++++++++++--------------------
 op_reg_common.h    |   27 +++++++++++++++++++++++
 regcomp.pl         |   47 +++++++++++++++++++++-------------------
 regexp.h           |   60 +++++++++++++++++++++++++--------------------------
 6 files changed, 108 insertions(+), 77 deletions(-)
 create mode 100644 op_reg_common.h

diff --git a/MANIFEST b/MANIFEST
index 74e8c46..9e4ee80 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -3789,6 +3789,7 @@ opcode.h Automatically generated opcode header
 opcode.pl Opcode header generator
 op.h Opcode syntax tree header
 opnames.h Automatically generated opcode header
+op_reg_common.h Common parts of op.h, regexp.h header
 os2/Changes Changelog for OS/2 port
 os2/diff.configure Patches to Configure
 os2/dlfcn.h Addon for dl_open
diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index d8e1439..b6d8aaa 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -76,7 +76,7 @@ if ($] < 5.011) {
 # giving the prefix to limit the names of symbols to define that come
 # from that file. If none, all symbols will be defined whose values
 # match the pattern below.
-foreach my $tuple (['op.h'],['cop.h'],['regexp.h','RXf_'])
+foreach my $tuple (['op_reg_common.h','(?:(?:RXf_)?PMf_)'],['op.h'],['cop.h'],['regexp.h','RXf_'])
 {
     my $file = $tuple->[0];
     my $pfx = $tuple->[1] || '';
diff --git a/op.h b/op.h
index 257a951..7873a74 100644
--- a/op.h
+++ b/op.h
@@ -36,6 +36,7 @@
  * the operation is privatized by a check routine,
  * which may or may not check number of children).
  */
+#include "op_reg_common.h"
 
 #define OPCODE U16
 
@@ -359,38 +360,39 @@ struct pmop {
 #define PM_SETRE(o,r) ((o)->op_pmregexp = (r))
 #endif
 
-
-#define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */
+/* taint $1 etc. if target tainted */
+#define PMf_RETAINT (1<<(_RXf_PMf_SHIFT+1))
 
 /* match successfully only once per reset, with related flag RXf_USED in
  * re->extflags holding state. This is used only for ?? matches, and only on
  * OP_MATCH and OP_QR */
-#define PMf_ONCE 0x00000080
-#define PMf_UNUSED 0x00000100 /* free for use */
-#define PMf_MAYBE_CONST 0x00000200 /* replacement contains variables */
+#define PMf_ONCE (1<<(_RXf_PMf_SHIFT+2))
+
+/* replacement contains variables */
+#define PMf_MAYBE_CONST (1<<(_RXf_PMf_SHIFT+3))
+
+/* PMf_ONCE has matched successfully. Not used under threading. */
+#define PMf_USED (1<<(_RXf_PMf_SHIFT+4))
+
+/* subst replacement is constant */
+#define PMf_CONST (1<<(_RXf_PMf_SHIFT+5))
 
-/* PMf_ONCE has matched successfully. Not used under threading. */
-#define PMf_USED 0x00000400
+/* keep 1st runtime pattern forever */
+#define PMf_KEEP (1<<(_RXf_PMf_SHIFT+6))
+#define PMf_GLOBAL (1<<(_RXf_PMf_SHIFT+7)) /* pattern had a g modifier */
 
-#define PMf_CONST 0x00000800 /* subst replacement is constant */
-#define PMf_KEEP 0x00001000 /* keep 1st runtime pattern forever */
-#define PMf_GLOBAL 0x00002000 /* pattern had a g modifier */
-#define PMf_CONTINUE 0x00004000 /* don't reset pos() if //g fails */
-#define PMf_EVAL 0x00008000 /* evaluating replacement as expr */
+/* don't reset pos() if //g fails */
+#define PMf_CONTINUE (1<<(_RXf_PMf_SHIFT+8))
+
+/* evaluating replacement as expr */
+#define PMf_EVAL (1<<(_RXf_PMf_SHIFT+9))
 
 /* Return substituted string instead of modifying it. */
-#define PMf_NONDESTRUCT 0x00010000
+#define PMf_NONDESTRUCT (1<<(_RXf_PMf_SHIFT+10))
 
-/* The following flags have exact equivalents in regcomp.h with the prefix RXf_
- * which are stored in the regexp->extflags member. If you change them here,
- * you have to change them there, and vice versa.
- */
-#define PMf_MULTILINE 0x00000001 /* assume multiple lines */
-#define PMf_SINGLELINE 0x00000002 /* assume single line */
-#define PMf_FOLD 0x00000004 /* case insensitivity */
-#define PMf_EXTENDED 0x00000008 /* chuck embedded whitespace */
-#define PMf_KEEPCOPY 0x00000010 /* copy the string when matching */
-#define PMf_LOCALE 0x00000020 /* use locale for character types */
+#if _RXf_PMf_SHIFT+10 > 31
+# error Too many RXf_PMf bits used. See above and regnodes.h for any spare in middle
+#endif
 
 /* mask of bits that need to be transfered to re->extflags */
 #define PMf_COMPILETIME (PMf_MULTILINE|PMf_SINGLELINE|PMf_LOCALE|PMf_FOLD|PMf_EXTENDED|PMf_KEEPCOPY)
diff --git a/op_reg_common.h b/op_reg_common.h
new file mode 100644
index 0000000..b0fd273
--- /dev/null
+++ b/op_reg_common.h
@@ -0,0 +1,27 @@
+/* op_reg_common.h
+ *
+ * Definitions common to by op.h and regexp.h
+ *
+ * Copyright (C) 2010 by Larry Wall and others
+ *
+ * You may distribute under the terms of either the GNU General Public
+ * License or the Artistic License, as specified in the README file.
+ *
+ */
+
+/* These defines are used in both op.h and regexp.h The definitions use the
+ * shift form so that ext/B/defsubs_h.PL will pick them up */
+#define RXf_PMf_MULTILINE (1 << 0) /* /m */
+#define PMf_MULTILINE (1 << 0) /* /m */
+#define RXf_PMf_SINGLELINE (1 << 1) /* /s */
+#define PMf_SINGLELINE (1 << 1) /* /s */
+#define RXf_PMf_FOLD (1 << 2) /* /i */
+#define PMf_FOLD (1 << 2) /* /i */
+#define RXf_PMf_EXTENDED (1 << 3) /* /x */
+#define PMf_EXTENDED (1 << 3) /* /x */
+#define RXf_PMf_KEEPCOPY (1 << 4) /* /p */
+#define PMf_KEEPCOPY (1 << 4) /* /p */
+#define RXf_PMf_LOCALE (1 << 5)
+#define PMf_LOCALE (1 << 5)
+
+#define _RXf_PMf_SHIFT 5 /* Begins with '_' so won't be exported by B */
diff --git a/regcomp.pl b/regcomp.pl
index aa0f0fe..d85482c 100644
--- a/regcomp.pl
+++ b/regcomp.pl
@@ -256,36 +256,39 @@ EXTCONST char * PL_reg_extflags_name[];
 EXTCONST char * const PL_reg_extflags_name[] = {
 EOP
 
-open my $fh,"<","regexp.h" or die "Can't read regexp.h: $!";
 my %rxfv;
 my %definitions;    # Remember what the symbol definitions are
 my $val = 0;
 my %reverse;
-while (<$fh>) {
-
-    # optional leading '_'. Return symbol in $1, and strip it from
-    # rest of line
-    if (s/ \#define \s+ ( _? RXf_ \w+ ) \s+ //xi) {
-        chomp;
-        my $define = $1;
-        s: / \s* \* .*? \* \s* / : :x;    # Replace comments by a blank
-
-        # Replace any prior defined symbols by their values
-        foreach my $key (keys %definitions) {
-            s/\b$key\b/$definitions{$key}/g;
-        }
-        my $newval = eval $_;   # Get numeric definition
+foreach my $file ("op_reg_common.h", "regexp.h") {
+    open my $fh,"<", $file or die "Can't read $file: $!";
+    while (<$fh>) {
+
+        # optional leading '_'. Return symbol in $1, and strip it from
+        # rest of line
+        if (s/ \#define \s+ ( _? RXf_ \w+ ) \s+ //xi) {
+            chomp;
+            my $define = $1;
+            s: / \s* \* .*? \* \s* / : :x;    # Replace comments by a blank
+
+            # Replace any prior defined symbols by their values
+            foreach my $key (keys %definitions) {
+                s/\b$key\b/$definitions{$key}/g;
+            }
+            my $newval = eval $_;   # Get numeric definition
 
-        $definitions{$define} = $newval;
+            $definitions{$define} = $newval;
 
-        if($val & $newval) {
-            die sprintf "Both $define and $reverse{$newval} use %08X", $newval;
+            next unless $_ =~ /<op_pmflags member during
  * compilation */
-#define RXf_PMf_MULTILINE 0x00000001 /* /m */
-#define RXf_PMf_SINGLELINE 0x00000002 /* /s */
-#define RXf_PMf_FOLD 0x00000004 /* /i */
-#define RXf_PMf_EXTENDED 0x00000008 /* /x */
-#define RXf_PMf_KEEPCOPY 0x00000010 /* /p */
-#define RXf_PMf_LOCALE 0x00000020 /* use locale */
-/* these flags are transfered from the PMOP->op_pmflags member during compilation */
 #define RXf_PMf_STD_PMMOD_SHIFT 0
 #define RXf_PMf_STD_PMMOD (RXf_PMf_MULTILINE|RXf_PMf_SINGLELINE|RXf_PMf_FOLD|RXf_PMf_EXTENDED)
 #define RXf_PMf_COMPILETIME (RXf_PMf_MULTILINE|RXf_PMf_SINGLELINE|RXf_PMf_LOCALE|RXf_PMf_FOLD|RXf_PMf_EXTENDED|RXf_PMf_KEEPCOPY)
@@ -297,53 +292,56 @@ and check for NULL.
  */
 
 /* Anchor and GPOS related stuff */
-#define RXf_ANCH_BOL 0x00000100
-#define RXf_ANCH_MBOL 0x00000200
-#define RXf_ANCH_SBOL 0x00000400
-#define RXf_ANCH_GPOS 0x00000800
-#define RXf_GPOS_SEEN 0x00001000
-#define RXf_GPOS_FLOAT 0x00002000
+#define RXf_ANCH_BOL (1<<(_RXf_PMf_SHIFT+3))
+#define RXf_ANCH_MBOL (1<<(_RXf_PMf_SHIFT+4))
+#define RXf_ANCH_SBOL (1<<(_RXf_PMf_SHIFT+5))
+#define RXf_ANCH_GPOS (1<<(_RXf_PMf_SHIFT+6))
+#define RXf_GPOS_SEEN (1<<(_RXf_PMf_SHIFT+7))
+#define RXf_GPOS_FLOAT (1<<(_RXf_PMf_SHIFT+8))
 /* two bits here */
 #define RXf_ANCH (RXf_ANCH_BOL|RXf_ANCH_MBOL|RXf_ANCH_GPOS|RXf_ANCH_SBOL)
 #define RXf_GPOS_CHECK (RXf_GPOS_SEEN|RXf_ANCH_GPOS)
 #define RXf_ANCH_SINGLE (RXf_ANCH_SBOL|RXf_ANCH_GPOS)
 
 /* What we have seen */
-#define RXf_LOOKBEHIND_SEEN 0x00004000
-#define RXf_EVAL_SEEN 0x00008000
-#define RXf_CANY_SEEN 0x00010000
+#define RXf_LOOKBEHIND_SEEN (1<<(_RXf_PMf_SHIFT+9))
+#define RXf_EVAL_SEEN (1<<(_RXf_PMf_SHIFT+10))
+#define RXf_CANY_SEEN (1<<(_RXf_PMf_SHIFT+11))
 
 /* Special */
-#define RXf_NOSCAN 0x00020000
-#define RXf_CHECK_ALL 0x00040000
+#define RXf_NOSCAN (1<<(_RXf_PMf_SHIFT+12))
+#define RXf_CHECK_ALL (1<<(_RXf_PMf_SHIFT+13))
 
 /* UTF8 related */
-#define RXf_MATCH_UTF8 0x00100000
+#define RXf_MATCH_UTF8 (1<<(_RXf_PMf_SHIFT+15))
 
 /* Intuit related */
-#define RXf_USE_INTUIT_NOML 0x00200000
-#define RXf_USE_INTUIT_ML 0x00400000
-#define RXf_INTUIT_TAIL 0x00800000
+#define RXf_USE_INTUIT_NOML (1<<(_RXf_PMf_SHIFT+16))
+#define RXf_USE_INTUIT_ML (1<<(_RXf_PMf_SHIFT+17))
+#define RXf_INTUIT_TAIL (1<<(_RXf_PMf_SHIFT+18))
 
 /* Set in Perl_pmruntime if op_flags & OPf_SPECIAL, i.e. split. Will
    be used by regex engines to check whether they should set
    RXf_SKIPWHITE */
-#define RXf_SPLIT 0x01000000
+#define RXf_SPLIT (1<<(_RXf_PMf_SHIFT+19))
 
 #define RXf_USE_INTUIT (RXf_USE_INTUIT_NOML|RXf_USE_INTUIT_ML)
 
 /* Copy and tainted info */
-#define RXf_COPY_DONE 0x02000000
-#define RXf_TAINTED_SEEN 0x04000000
-#define RXf_TAINTED 0x08000000 /* this pattern is tainted */
+#define RXf_COPY_DONE (1<<(_RXf_PMf_SHIFT+20))
+#define RXf_TAINTED_SEEN (1<<(_RXf_PMf_SHIFT+21))
+#define RXf_TAINTED (1<<(_RXf_PMf_SHIFT+22)) /* this pattern is tainted */
 
 /* Flags indicating special patterns */
-#define RXf_START_ONLY 0x10000000 /* Pattern is /^/ */
-#define RXf_SKIPWHITE 0x20000000 /* Pattern is for a split / / */
-#define RXf_WHITE 0x40000000 /* Pattern is /\s+/ */
-#define RXf_NULL 0x80000000 /* Pattern is // */
+#define RXf_START_ONLY (1<<(_RXf_PMf_SHIFT+23)) /* Pattern is /^/ */
+#define RXf_SKIPWHITE (1<<(_RXf_PMf_SHIFT+24)) /* Pattern is for a split / / */
+#define RXf_WHITE (1<<(_RXf_PMf_SHIFT+25)) /* Pattern is /\s+/ */
+#define RXf_NULL (1<<(_RXf_PMf_SHIFT+26)) /* Pattern is // */
+#if _RXf_PMf_SHIFT+23 > 31
+# error Too many RXf_PMf bits used. See regnodes.h for any spare in middle
+#endif
 
 /*
  * NOTE: if you modify any RXf flags you should run regen.pl or regcomp.pl
-- 
1.5.6.3
```
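
The patch above defines each flag relative to a shared base shift and guards the top of the range with a preprocessor check. A reduced sketch of that pattern in C follows. The base shift of 5 and the offsets 3 and 26 mirror the regexp.h definitions in the patch; the `DEMO_` names, the use of `1UL` and the helper function are assumptions made for this illustration.

```c
#include <assert.h>

#define DEMO_SHIFT      5   /* mirrors _RXf_PMf_SHIFT in the patch */
#define DEMO_TOP_OFFSET 26  /* offset of the highest flag in this sketch */

/* Refuse to compile if the highest flag would fall outside 32 bits. */
#if DEMO_SHIFT + DEMO_TOP_OFFSET > 31
#  error Too many flag bits for a 32-bit field
#endif

/* Flags defined relative to the base: raising DEMO_SHIFT moves them all. */
#define DEMO_ANCH_BOL (1UL << (DEMO_SHIFT + 3))
#define DEMO_NULL     (1UL << (DEMO_SHIFT + DEMO_TOP_OFFSET))

/* Returns 1 when the relative definitions land on the old hex values. */
static int demo_matches_old_values(void)
{
    assert(DEMO_SHIFT + DEMO_TOP_OFFSET <= 31);
    return DEMO_ANCH_BOL == 0x00000100UL && DEMO_NULL == 0x80000000UL;
}
```

With this layout, making room for a new shared modifier means changing one number, and the `#error` fires at build time if that pushes the last flag past bit 31.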
p5pRT commented 14 years ago

From @khwilliamson

This patch seems to be warnocked, so I'm trying again, changing the subject line to hopefully attract more interest. I had held off submitting it from late in 5.13.3 to early in 5.13.4 at the release manager's request because I thought it would break binary compatibility. But now we are well into the new cycle, and I wouldn't want to miss this cycle too. I ended up splitting the patch so that the part I'm submitting now doesn't break binary compatibility; the part that does is yet to come.

This patch is the first step in trying to add new regex modifiers, and then fix Unicode handling in regexes.

The commits still apply cleanly. A rebased version is on GitHub at git://github.com/khwilliamson/perl.git, branch match_sw.

Currently the structures in op.h and regexp.h share common values, which must be kept in sync manually. This patch creates a new header, op_reg_common.h, with those common values; it is #included from the two affected header files. There are also commits for small ancillary bug fixes and for comments I added as I went along.

I'm not sure if this common header is the best way to approach the matter, but it works. I'm open to other ideas.

I left the two different symbols for each value because I wasn't sure what to do with them: they are both exported by B, and I presume that means external programs may depend on them. I see that B has some explicit definitions for backward compatibility, so I'm thinking it would be better to remove the duplicate symbols in the core, and then define them in B for external programs. But I don't know enough to tell whether that is reasonable. Please advise me.

It was most convenient to change the definitions to use, e.g., 1<<2 instead of 0x4. That meant I had to change the code that reads and evals those definitions (regcomp.pl and ext/B/defsubs_h.PL) to understand the left-shift syntax.
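As a small illustration of why the two spellings are interchangeable as far as the C compiler is concerned, here is a minimal sketch. The `RXf_PMf_` names mirror the ones in the patch; the `OLD_` prefixed constants and the two helper functions are hypothetical names used only for this example and are not part of the patch.

```c
#include <assert.h>

/* Shift-form definitions, spelled as in the new common header. */
#define RXf_PMf_MULTILINE  (1 << 0)  /* /m */
#define RXf_PMf_SINGLELINE (1 << 1)  /* /s */
#define RXf_PMf_FOLD       (1 << 2)  /* /i */
#define RXf_PMf_EXTENDED   (1 << 3)  /* /x */

/* Former hex spellings; the OLD_ prefix is hypothetical, used only here. */
#define OLD_RXf_PMf_MULTILINE  0x00000001
#define OLD_RXf_PMf_SINGLELINE 0x00000002
#define OLD_RXf_PMf_FOLD       0x00000004
#define OLD_RXf_PMf_EXTENDED   0x00000008

/* Returns 1 when every shift-form flag equals its former hex value. */
int shift_form_matches_hex(void)
{
    return RXf_PMf_MULTILINE  == OLD_RXf_PMf_MULTILINE
        && RXf_PMf_SINGLELINE == OLD_RXf_PMf_SINGLELINE
        && RXf_PMf_FOLD       == OLD_RXf_PMf_FOLD
        && RXf_PMf_EXTENDED   == OLD_RXf_PMf_EXTENDED;
}

/* Aborts if the two spellings ever diverge. */
void check_shift_form(void)
{
    assert(shift_form_matches_hex());
}
```

The compiler sees identical constants either way; only the scripts that parse the headers textually need to learn the new spelling.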

I went out of my way to ensure binary compatibility with this patch. The actual bit structures haven't changed. Once a version of this is installed, it will be easy to move things around to make room for the new bits needed.
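A rough sketch of how flags defined relative to a shared base can keep their old values, checked here for three of the regexp.h flags only. `SKETCH_PMF_SHIFT` and `SKETCH_BIT` are hypothetical stand-ins for `_RXf_PMf_SHIFT` and the inline shift expressions in the patch, and the use of an unsigned 1 is a choice made for this sketch (it avoids shifting a signed 1 into bit 31), not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Number of the highest bit used by the shared modifier flags
 * (hypothetical stand-in for _RXf_PMf_SHIFT, which the patch sets to 5). */
#define SKETCH_PMF_SHIFT 5

/* A flag sitting `offset` bits above the shared base. */
#define SKETCH_BIT(offset) (UINT32_C(1) << (SKETCH_PMF_SHIFT + (offset)))

/* Offsets taken from the regexp.h hunk of the patch. */
#define RXf_ANCH_BOL   SKETCH_BIT(3)
#define RXf_MATCH_UTF8 SKETCH_BIT(15)
#define RXf_NULL       SKETCH_BIT(26)

/* Compile-time guard: the highest offset used in this sketch must fit in 32 bits. */
#if SKETCH_PMF_SHIFT + 26 > 31
# error "flags no longer fit in 32 bits"
#endif

/* Returns 1 when the shift-form flags reproduce the former hex constants. */
int layout_is_unchanged(void)
{
    return RXf_ANCH_BOL   == UINT32_C(0x00000100)
        && RXf_MATCH_UTF8 == UINT32_C(0x00100000)
        && RXf_NULL       == UINT32_C(0x80000000);
}
```

In this sketch, changing the single base constant moves every dependent flag together. With a base of 5 the top flag already sits at bit 31, so the guard would fire if the base were raised without freeing bits elsewhere.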

p5pRT commented 14 years ago

From @khwilliamson

0001-op.h-Fix-comments-in-defines-that-cross-lines.patch ```diff >From a34da8de098d7b78b39db49b9d4972ea4677b13e Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Mon, 19 Jul 2010 12:22:16 -0600 Subject: [PATCH] op.h: Fix comments in #defines that cross lines It is not good form to have comments in a #define continue onto the next line, especially not with a \ ending each continuation line. People preferred that the comment be placed before the #define --- op.h | 18 +++++++++--------- 1 files changed, 9 insertions(+), 9 deletions(-) diff --git a/op.h b/op.h index 2ea1dce..257a951 100644 --- a/op.h +++ b/op.h @@ -361,25 +361,25 @@ struct pmop { #define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */ -#define PMf_ONCE 0x00000080 /* match successfully only once per - reset, with related flag RXf_USED - in re->extflags holding state. - This is used only for ?? matches, - and only on OP_MATCH and OP_QR */ +/* match successfully only once per reset, with related flag RXf_USED in + * re->extflags holding state. This is used only for ?? matches, and only on + * OP_MATCH and OP_QR */ +#define PMf_ONCE 0x00000080 #define PMf_UNUSED 0x00000100 /* free for use */ #define PMf_MAYBE_CONST 0x00000200 /* replacement contains variables */ -#define PMf_USED 0x00000400 /* PMf_ONCE has matched successfully. - Not used under threading. */ +/* PMf_ONCE has matched successfully. Not used under threading. */ +#define PMf_USED 0x00000400 #define PMf_CONST 0x00000800 /* subst replacement is constant */ #define PMf_KEEP 0x00001000 /* keep 1st runtime pattern forever */ #define PMf_GLOBAL 0x00002000 /* pattern had a g modifier */ #define PMf_CONTINUE 0x00004000 /* don't reset pos() if //g fails */ #define PMf_EVAL 0x00008000 /* evaluating replacement as expr */ -#define PMf_NONDESTRUCT 0x00010000 /* Return substituted string instead - of modifying it. */ + +/* Return substituted string instead of modifying it. 
*/ +#define PMf_NONDESTRUCT 0x00010000 /* The following flags have exact equivalents in regcomp.h with the prefix RXf_ * which are stored in the regexp->extflags member. If you change them here, -- 1.5.6.3 ```
p5pRT commented 14 years ago

From @khwilliamson

0002-regexp.h-Add-some-comments.patch ```diff >From 234c07289d09c7f23b2084c13fdb7eed420820a9 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Mon, 19 Jul 2010 13:06:19 -0600 Subject: [PATCH] regexp.h: Add some comments --- regexp.h | 12 +++++++++++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/regexp.h b/regexp.h index f87031c..6232356 100644 --- a/regexp.h +++ b/regexp.h @@ -229,7 +229,11 @@ and check for NULL. /* 0x3F of extflags is used by (RXf_)PMf_COMPILETIME * If you change these you need to change the equivalent flags in op.h, and - * vice versa. */ +/* vice versa. These need to be ordered so that the msix are contiguous + * starting at bit 0, followed by the p; bit 0 is because of the shift below + * being 0; see STD_PAT_MODS and INT_PAT_MODS below for the contiguity cause */ +/* the flags above are transfered from the PMOP->op_pmflags member during + * compilation */ #define RXf_PMf_MULTILINE 0x00000001 /* /m */ #define RXf_PMf_SINGLELINE 0x00000002 /* /s */ #define RXf_PMf_FOLD 0x00000004 /* /i */ @@ -271,8 +275,14 @@ and check for NULL. #define LOOP_PAT_MODS "gc" #define NONDESTRUCT_PAT_MODS "r" +/* This string is expected by regcomp.c to be ordered so that the first + * character is the flag in bit 0 of extflags; the next character is bit 1, + * etc. */ #define STD_PAT_MODS "msix" +/* This string is expected by XS_re_regexp_pattern() in universal.c to be ordered + * so that the first character is the flag in bit 0 of extflags; the next + * character is bit 1, etc. */ #define INT_PAT_MODS STD_PAT_MODS KEEPCOPY_PAT_MODS #define EXT_PAT_MODS ONCE_PAT_MODS KEEPCOPY_PAT_MODS -- 1.5.6.3 ```
p5pRT commented 14 years ago

From @khwilliamson

0003-defsubs_h.PL-Use-correct-variable-in-error-msg.patch
```diff
From c8faf9b6e2f6b4eb56568bc2f4bb82ad1d605261 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 17:15:51 -0600
Subject: [PATCH] defsubs_h.PL: Use correct variable in error msg

---
 ext/B/defsubs_h.PL |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index 684ca26..98a3e8b 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -7,7 +7,7 @@ my (undef, $headerpath) = @ARGV;
 my ($out) = __FILE__ =~ /(^.*)\.PL/i;
 $out =~ s/_h$/.h/;
 unlink $out if -l $out;
-open(OUT,">$out") || die "Cannot open $file:$!";
+open(OUT,">$out") || die "Cannot open $out:$!";
 print "Extracting $out...\n";
 print OUT <<"END";
 /*
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0004-ext-B-defsubs_h.PL-add-explanatory-comment.patch
```diff
From 70dde8ec230ed603488557ca42e83da54971fe2c Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 17:18:03 -0600
Subject: [PATCH] ext/B/defsubs_h.PL: add explanatory comment

---
 ext/B/defsubs_h.PL |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index 98a3e8b..f8fa1ed 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -72,6 +72,10 @@ if ($] < 5.011) {
     doconst(CVf_LOCKED);
 }
 
+# First element in each tuple is the file; second is a regex snippet
+# giving the prefix to limit the names of symbols to define that come
+# from that file. If none, all symbols will be defined whose values
+# match the pattern below.
 foreach my $tuple (['op.h'],['cop.h'],['regexp.h','RXf_'])
 {
     my $file = $tuple->[0];
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0005-regcomp.pl-Teach-to-handle-wider-range-of-exprs.patch
```diff
From 25d892326935efa83c18dbe8745dc212330c376c Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 21:46:49 -0600
Subject: [PATCH] regcomp.pl: Teach to handle wider range of exprs

In particular teach it to handle definitions using <<, e.g.
#define SYMBOL (1<<3)
and to remember previous symbol definitons in the file so that symbol can be
used in later definitions.
---
 regcomp.pl |   29 ++++++++++++++++++++++-------
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/regcomp.pl b/regcomp.pl
index 9370487..aa0f0fe 100644
--- a/regcomp.pl
+++ b/regcomp.pl
@@ -258,17 +258,32 @@ EOP
 
 open my $fh,"<","regexp.h" or die "Can't read regexp.h: $!";
 my %rxfv;
+my %definitions;    # Remember what the symbol definitions are
 my $val = 0;
 my %reverse;
 while (<$fh>) {
-    if (/#define\s+(RXf_\w+)\s+(0x[A-F\d]+)/i) {
-        my $newval = eval $2;
-        if($val & $newval) {
-            die sprintf "Both $1 and $reverse{$newval} use %08X", $newval;
-        }
+
+    # optional leading '_'.  Return symbol in $1, and strip it from
+    # rest of line
+    if (s/ \#define \s+ ( _? RXf_ \w+ ) \s+ //xi) {
+        chomp;
+        my $define = $1;
+        s: / \s* \* .*? \* \s* / : :x;    # Replace comments by a blank
+
+        # Replace any prior defined symbols by their values
+        foreach my $key (keys %definitions) {
+            s/\b$key\b/$definitions{$key}/g;
+        }
+        my $newval = eval $_;   # Get numeric definition
+
+        $definitions{$define} = $newval;
+
+        if($val & $newval) {
+            die sprintf "Both $define and $reverse{$newval} use %08X", $newval;
+        }
         $val|=$newval;
-        $rxfv{$1}= $newval;
-        $reverse{$newval} = $1;
+        $rxfv{$define}= $newval;
+        $reverse{$newval} = $define;
     }
 }
 my %vrxf=reverse %rxfv;
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0006-ext-B-defsubs_h.PL-teach-to-allow-exprs-with.patch
```diff
From a73a53c7d9ec8f7563eec5ba858cc7c16a3bf0fa Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 19 Jul 2010 21:51:32 -0600
Subject: [PATCH] ext/B/defsubs_h.PL: teach to allow exprs with <<

Allow #defines which have left shift operators in them.
---
 ext/B/defsubs_h.PL |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL
index f8fa1ed..d8e1439 100644
--- a/ext/B/defsubs_h.PL
+++ b/ext/B/defsubs_h.PL
@@ -84,7 +84,11 @@ foreach my $tuple (['op.h'],['cop.h'],['regexp.h','RXf_'])
 
     open(OPH,"$path") || die "Cannot open $path:$!";
     while (<OPH>) {
-        doconst($1) if (/#define\s+($pfx\w+)\s+([\(\)\|\dx]+)\s*(?:$|\/\*)/);
+        doconst($1) if (/ \#define \s+ ( $pfx \w+ ) \s+
+                          ( [()|\dx]+           # Parens, '|', digits, 'x'
+                          | \(? \d+ \s* << .*?  # digits left shifted by anything
+                          ) \s* (?: $| \/ \* )  # ending at comment or $
+                        /x);
     }
     close(OPH);
 }
-- 
1.5.6.3
```
p5pRT commented 14 years ago

From @khwilliamson

0007-Refactor-common-parts-of-op.h-regexp.h-into-new-.h.patch ```diff >From 1be64a430c7464c6e088079ce3880c69bb88767a Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Mon, 19 Jul 2010 22:26:43 -0600 Subject: [PATCH] Refactor common parts of op.h, regexp.h into new .h op.h and regexp.h share common elements in their data structures. They have had to manually be kept in sync. This patch makes it easier by putting those common parts into a common header #included by the two. To do this, it seemed easiest to change the symbol definitions to use left shifts to generate the flag bits. But this meant that regcomp.pl and axt/B/defsubs_h.PL had to be taught to recognize those forms of expressions, done in separate commits --- MANIFEST | 1 + ext/B/defsubs_h.PL | 2 +- op.h | 48 +++++++++++++++++++++-------------------- op_reg_common.h | 27 +++++++++++++++++++++++ regcomp.pl | 47 +++++++++++++++++++++------------------- regexp.h | 60 +++++++++++++++++++++++++-------------------------- 6 files changed, 108 insertions(+), 77 deletions(-) create mode 100644 op_reg_common.h diff --git a/MANIFEST b/MANIFEST index 74e8c46..9e4ee80 100644 --- a/MANIFEST +++ b/MANIFEST @@ -3789,6 +3789,7 @@ opcode.h Automatically generated opcode header opcode.pl Opcode header generator op.h Opcode syntax tree header opnames.h Automatically generated opcode header +op_reg_common.h Common parts of op.h, regexp.h header os2/Changes Changelog for OS/2 port os2/diff.configure Patches to Configure os2/dlfcn.h Addon for dl_open diff --git a/ext/B/defsubs_h.PL b/ext/B/defsubs_h.PL index d8e1439..b6d8aaa 100644 --- a/ext/B/defsubs_h.PL +++ b/ext/B/defsubs_h.PL @@ -76,7 +76,7 @@ if ($] < 5.011) { # giving the prefix to limit the names of symbols to define that come # from that file. If none, all symbols will be defined whose values # match the pattern below. 
-foreach my $tuple (['op.h'],['cop.h'],['regexp.h','RXf_']) +foreach my $tuple (['op_reg_common.h','(?:(?:RXf_)?PMf_)'],['op.h'],['cop.h'],['regexp.h','RXf_']) { my $file = $tuple->[0]; my $pfx = $tuple->[1] || ''; diff --git a/op.h b/op.h index 257a951..7873a74 100644 --- a/op.h +++ b/op.h @@ -36,6 +36,7 @@ * the operation is privatized by a check routine, * which may or may not check number of children). */ +#include "op_reg_common.h" #define OPCODE U16 @@ -359,38 +360,39 @@ struct pmop { #define PM_SETRE(o,r) ((o)->op_pmregexp = (r)) #endif - -#define PMf_RETAINT 0x00000040 /* taint $1 etc. if target tainted */ +/* taint $1 etc. if target tainted */ +#define PMf_RETAINT (1<<(_RXf_PMf_SHIFT+1)) /* match successfully only once per reset, with related flag RXf_USED in * re->extflags holding state. This is used only for ?? matches, and only on * OP_MATCH and OP_QR */ -#define PMf_ONCE 0x00000080 -#define PMf_UNUSED 0x00000100 /* free for use */ -#define PMf_MAYBE_CONST 0x00000200 /* replacement contains variables */ +#define PMf_ONCE (1<<(_RXf_PMf_SHIFT+2)) + +/* replacement contains variables */ +#define PMf_MAYBE_CONST (1<<(_RXf_PMf_SHIFT+3)) + +/* PMf_ONCE has matched successfully. Not used under threading. */ +#define PMf_USED (1<<(_RXf_PMf_SHIFT+4)) + +/* subst replacement is constant */ +#define PMf_CONST (1<<(_RXf_PMf_SHIFT+5)) -/* PMf_ONCE has matched successfully. Not used under threading. 
*/ -#define PMf_USED 0x00000400 +/* keep 1st runtime pattern forever */ +#define PMf_KEEP (1<<(_RXf_PMf_SHIFT+6)) +#define PMf_GLOBAL (1<<(_RXf_PMf_SHIFT+7)) /* pattern had a g modifier */ -#define PMf_CONST 0x00000800 /* subst replacement is constant */ -#define PMf_KEEP 0x00001000 /* keep 1st runtime pattern forever */ -#define PMf_GLOBAL 0x00002000 /* pattern had a g modifier */ -#define PMf_CONTINUE 0x00004000 /* don't reset pos() if //g fails */ -#define PMf_EVAL 0x00008000 /* evaluating replacement as expr */ +/* don't reset pos() if //g fails */ +#define PMf_CONTINUE (1<<(_RXf_PMf_SHIFT+8)) + +/* evaluating replacement as expr */ +#define PMf_EVAL (1<<(_RXf_PMf_SHIFT+9)) /* Return substituted string instead of modifying it. */ -#define PMf_NONDESTRUCT 0x00010000 +#define PMf_NONDESTRUCT (1<<(_RXf_PMf_SHIFT+10)) -/* The following flags have exact equivalents in regcomp.h with the prefix RXf_ - * which are stored in the regexp->extflags member. If you change them here, - * you have to change them there, and vice versa. - */ -#define PMf_MULTILINE 0x00000001 /* assume multiple lines */ -#define PMf_SINGLELINE 0x00000002 /* assume single line */ -#define PMf_FOLD 0x00000004 /* case insensitivity */ -#define PMf_EXTENDED 0x00000008 /* chuck embedded whitespace */ -#define PMf_KEEPCOPY 0x00000010 /* copy the string when matching */ -#define PMf_LOCALE 0x00000020 /* use locale for character types */ +#if _RXf_PMf_SHIFT+10 > 31 +# error Too many RXf_PMf bits used. 
See above and regnodes.h for any spare in middle +#endif /* mask of bits that need to be transfered to re->extflags */ #define PMf_COMPILETIME (PMf_MULTILINE|PMf_SINGLELINE|PMf_LOCALE|PMf_FOLD|PMf_EXTENDED|PMf_KEEPCOPY) diff --git a/op_reg_common.h b/op_reg_common.h new file mode 100644 index 0000000..b0fd273 --- /dev/null +++ b/op_reg_common.h @@ -0,0 +1,27 @@ +/* op_reg_common.h + * + * Definitions common to by op.h and regexp.h + * + * Copyright (C) 2010 by Larry Wall and others + * + * You may distribute under the terms of either the GNU General Public + * License or the Artistic License, as specified in the README file. + * + */ + +/* These defines are used in both op.h and regexp.h The definitions use the + * shift form so that ext/B/defsubs_h.PL will pick them up */ +#define RXf_PMf_MULTILINE (1 << 0) /* /m */ +#define PMf_MULTILINE (1 << 0) /* /m */ +#define RXf_PMf_SINGLELINE (1 << 1) /* /s */ +#define PMf_SINGLELINE (1 << 1) /* /s */ +#define RXf_PMf_FOLD (1 << 2) /* /i */ +#define PMf_FOLD (1 << 2) /* /i */ +#define RXf_PMf_EXTENDED (1 << 3) /* /x */ +#define PMf_EXTENDED (1 << 3) /* /x */ +#define RXf_PMf_KEEPCOPY (1 << 4) /* /p */ +#define PMf_KEEPCOPY (1 << 4) /* /p */ +#define RXf_PMf_LOCALE (1 << 5) +#define PMf_LOCALE (1 << 5) + +#define _RXf_PMf_SHIFT 5 /* Begins with '_' so won't be exported by B */ diff --git a/regcomp.pl b/regcomp.pl index aa0f0fe..d85482c 100644 --- a/regcomp.pl +++ b/regcomp.pl @@ -256,36 +256,39 @@ EXTCONST char * PL_reg_extflags_name[]; EXTCONST char * const PL_reg_extflags_name[] = { EOP -open my $fh,"<","regexp.h" or die "Can't read regexp.h: $!"; my %rxfv; my %definitions; # Remember what the symbol definitions are my $val = 0; my %reverse; -while (<$fh>) { - - # optional leading '_'. Return symbol in $1, and strip it from - # rest of line - if (s/ \#define \s+ ( _? RXf_ \w+ ) \s+ //xi) { - chomp; - my $define = $1; - s: / \s* \* .*? 
\* \s* / : :x; # Replace comments by a blank - - # Replace any prior defined symbols by their values - foreach my $key (keys %definitions) { - s/\b$key\b/$definitions{$key}/g; - } - my $newval = eval $_; # Get numeric definition +foreach my $file ("op_reg_common.h", "regexp.h") { + open my $fh,"<", $file or die "Can't read $file: $!"; + while (<$fh>) { + + # optional leading '_'. Return symbol in $1, and strip it from + # rest of line + if (s/ \#define \s+ ( _? RXf_ \w+ ) \s+ //xi) { + chomp; + my $define = $1; + s: / \s* \* .*? \* \s* / : :x; # Replace comments by a blank + + # Replace any prior defined symbols by their values + foreach my $key (keys %definitions) { + s/\b$key\b/$definitions{$key}/g; + } + my $newval = eval $_; # Get numeric definition - $definitions{$define} = $newval; + $definitions{$define} = $newval; - if($val & $newval) { - die sprintf "Both $define and $reverse{$newval} use %08X", $newval; + next unless $_ =~ /<op_pmflags member during * compilation */ -#define RXf_PMf_MULTILINE 0x00000001 /* /m */ -#define RXf_PMf_SINGLELINE 0x00000002 /* /s */ -#define RXf_PMf_FOLD 0x00000004 /* /i */ -#define RXf_PMf_EXTENDED 0x00000008 /* /x */ -#define RXf_PMf_KEEPCOPY 0x00000010 /* /p */ -#define RXf_PMf_LOCALE 0x00000020 /* use locale */ -/* these flags are transfered from the PMOP->op_pmflags member during compilation */ #define RXf_PMf_STD_PMMOD_SHIFT 0 #define RXf_PMf_STD_PMMOD (RXf_PMf_MULTILINE|RXf_PMf_SINGLELINE|RXf_PMf_FOLD|RXf_PMf_EXTENDED) #define RXf_PMf_COMPILETIME (RXf_PMf_MULTILINE|RXf_PMf_SINGLELINE|RXf_PMf_LOCALE|RXf_PMf_FOLD|RXf_PMf_EXTENDED|RXf_PMf_KEEPCOPY) @@ -297,53 +292,56 @@ and check for NULL. 
*/ /* Anchor and GPOS related stuff */ -#define RXf_ANCH_BOL 0x00000100 -#define RXf_ANCH_MBOL 0x00000200 -#define RXf_ANCH_SBOL 0x00000400 -#define RXf_ANCH_GPOS 0x00000800 -#define RXf_GPOS_SEEN 0x00001000 -#define RXf_GPOS_FLOAT 0x00002000 +#define RXf_ANCH_BOL (1<<(_RXf_PMf_SHIFT+3)) +#define RXf_ANCH_MBOL (1<<(_RXf_PMf_SHIFT+4)) +#define RXf_ANCH_SBOL (1<<(_RXf_PMf_SHIFT+5)) +#define RXf_ANCH_GPOS (1<<(_RXf_PMf_SHIFT+6)) +#define RXf_GPOS_SEEN (1<<(_RXf_PMf_SHIFT+7)) +#define RXf_GPOS_FLOAT (1<<(_RXf_PMf_SHIFT+8)) /* two bits here */ #define RXf_ANCH (RXf_ANCH_BOL|RXf_ANCH_MBOL|RXf_ANCH_GPOS|RXf_ANCH_SBOL) #define RXf_GPOS_CHECK (RXf_GPOS_SEEN|RXf_ANCH_GPOS) #define RXf_ANCH_SINGLE (RXf_ANCH_SBOL|RXf_ANCH_GPOS) /* What we have seen */ -#define RXf_LOOKBEHIND_SEEN 0x00004000 -#define RXf_EVAL_SEEN 0x00008000 -#define RXf_CANY_SEEN 0x00010000 +#define RXf_LOOKBEHIND_SEEN (1<<(_RXf_PMf_SHIFT+9)) +#define RXf_EVAL_SEEN (1<<(_RXf_PMf_SHIFT+10)) +#define RXf_CANY_SEEN (1<<(_RXf_PMf_SHIFT+11)) /* Special */ -#define RXf_NOSCAN 0x00020000 -#define RXf_CHECK_ALL 0x00040000 +#define RXf_NOSCAN (1<<(_RXf_PMf_SHIFT+12)) +#define RXf_CHECK_ALL (1<<(_RXf_PMf_SHIFT+13)) /* UTF8 related */ -#define RXf_MATCH_UTF8 0x00100000 +#define RXf_MATCH_UTF8 (1<<(_RXf_PMf_SHIFT+15)) /* Intuit related */ -#define RXf_USE_INTUIT_NOML 0x00200000 -#define RXf_USE_INTUIT_ML 0x00400000 -#define RXf_INTUIT_TAIL 0x00800000 +#define RXf_USE_INTUIT_NOML (1<<(_RXf_PMf_SHIFT+16)) +#define RXf_USE_INTUIT_ML (1<<(_RXf_PMf_SHIFT+17)) +#define RXf_INTUIT_TAIL (1<<(_RXf_PMf_SHIFT+18)) /* Set in Perl_pmruntime if op_flags & OPf_SPECIAL, i.e. split. 
Will be used by regex engines to check whether they should set RXf_SKIPWHITE */ -#define RXf_SPLIT 0x01000000 +#define RXf_SPLIT (1<<(_RXf_PMf_SHIFT+19)) #define RXf_USE_INTUIT (RXf_USE_INTUIT_NOML|RXf_USE_INTUIT_ML) /* Copy and tainted info */ -#define RXf_COPY_DONE 0x02000000 -#define RXf_TAINTED_SEEN 0x04000000 -#define RXf_TAINTED 0x08000000 /* this pattern is tainted */ +#define RXf_COPY_DONE (1<<(_RXf_PMf_SHIFT+20)) +#define RXf_TAINTED_SEEN (1<<(_RXf_PMf_SHIFT+21)) +#define RXf_TAINTED (1<<(_RXf_PMf_SHIFT+22)) /* this pattern is tainted */ /* Flags indicating special patterns */ -#define RXf_START_ONLY 0x10000000 /* Pattern is /^/ */ -#define RXf_SKIPWHITE 0x20000000 /* Pattern is for a split / / */ -#define RXf_WHITE 0x40000000 /* Pattern is /\s+/ */ -#define RXf_NULL 0x80000000 /* Pattern is // */ +#define RXf_START_ONLY (1<<(_RXf_PMf_SHIFT+23)) /* Pattern is /^/ */ +#define RXf_SKIPWHITE (1<<(_RXf_PMf_SHIFT+24)) /* Pattern is for a split / / */ +#define RXf_WHITE (1<<(_RXf_PMf_SHIFT+25)) /* Pattern is /\s+/ */ +#define RXf_NULL (1<<(_RXf_PMf_SHIFT+26)) /* Pattern is // */ +#if _RXf_PMf_SHIFT+23 > 31 +# error Too many RXf_PMf bits used. See regnodes.h for any spare in middle +#endif /* * NOTE: if you modify any RXf flags you should run regen.pl or regcomp.pl -- 1.5.6.3 ```
p5pRT commented 14 years ago

From @rgs

Thanks, applied to bleadperl. (I see that the only patch that needed some reviewing was the last one.)

On 28 July 2010 18:37, karl williamson <public@khwilliamson.com> wrote:

This patch seems to be warnocked, so I'm trying again, changing the subject line to hopefully attract more interest. I had held off submitting it from late in 5.13.3 to early in 5.13.4 at the release manager's request because I thought it would break binary compatibility. But now we are well into the new cycle, and I wouldn't want to miss this cycle too. I ended up splitting the patch so that the part I'm submitting now doesn't break binary compatibility; the part that does is yet to come.

This patch is the first step in trying to add new regex modifiers, and then fix Unicode handling in regexes.

The commits still apply cleanly. A rebased version is on GitHub at git://github.com/khwilliamson/perl.git, branch match_sw.

Currently the structures in op.h and regexp.h share common values, which must be kept in sync manually. This patch creates a new header, op_reg_common.h, with those common values; it is #included from the two affected header files. There are also commits for small ancillary bug fixes and for comments I added as I went along.

I'm not sure if this common header is the best way to approach the matter, but it works. I'm open to other ideas.

I left the two different symbols for each value because I wasn't sure what to do with them: they are both exported by B, and I presume that means external programs may depend on them. I see that B has some explicit definitions for backward compatibility, so I'm thinking it would be better to remove the duplicate symbols in the core, and then define them in B for external programs. But I don't know enough to tell whether that is reasonable. Please advise me.

It was most convenient to change the definitions to use, e.g., 1<<2 instead of 0x4. That meant I had to change the code that reads and evals those definitions (regcomp.pl and ext/B/defsubs_h.PL) to understand the left-shift syntax.

I went out of my way to ensure binary compatibility with this patch. The actual bit structures haven't changed. Once a version of this is installed, it will be easy to move things around to make room for the new bits needed.

p5pRT commented 14 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 14 years ago

@rgs - Status changed from 'open' to 'resolved'