Closed p5pRT closed 14 years ago
The attached series of commits changes the definitions of the character class macros in handy.h to use table lookup for all those that may require more than one comparison (leaving just isASCII() as not table lookup).
Variants of each one are added\, with the ones whose names end in _A mean that they match only in the ASCII range\, and those ending in _L1 match in the entire Latin1 range.
The first two commits are repeats of those in [perl #78022] PATCH: Add a couple of macros to handy.h. The whole series is available at git://github.com/khwilliamson/perl.git branch autodoc.
The effect of this patch is both performance and extending capabilities. Now all calls to these are O(1)\, and there are macros available to use for the /u and proposed /a or similar regular expression modifiers. Previously the macros could have executed many branches to classify the input. Finding an ASCII word character could take 7 comparisons\, quite a few more for Latin1.
Hi Karl\,
karl williamson wrote:
# New Ticket Created by karl williamson # Please include the string: [perl #78024] # in the subject line of all future correspondence about this issue. # \<URL: http://rt.perl.org/rt3/Ticket/Display.html?id=78024 >
I just applied your changes locally after some review (as far as I'm qualified to do that) and squashed the "add table" and "oops\, forgot generating script" commits together.
The tests pass. I pushed the changes to camel as a branch "steffen/asciiTableLookup". The reason I didn't push to blead was that I'm not sure I'd put the big generated table in perl.h (at least not without a note that it's generated). I'd like to hear from others before I push this to blead.
Cheers\, Steffen
The RT System itself - Status changed from 'new' to 'open'
Hi again\,
karl williamson wrote:
# New Ticket Created by karl williamson # Please include the string: [perl #78024] # in the subject line of all future correspondence about this issue. # \<URL: http://rt.perl.org/rt3/Ticket/Display.html?id=78024 >
Just after writing the previous mail\, I got some feedback via IRC from Nicholas. It seems he agrees on my sentiment that the new table shouldn't live in perl.h. We also agree that your general plan wrt. table lookups is a very good one.
What we'd like to see for consistency (not a must) is:
- generated table moved to its own header (mentioning that it's generated). - The script isn't really in the "Porting" category. Apart from the fact that Porting/ is already a pretty wild place\, I think it's for actual developer tools like the ones for validating things (check*.pl)\, preparing releases and such. Most certainly not (see next point) for stuff that's run by regen.pl. I'm saying this without any authority\, mind you. Maybe the script should live in the top level? - It would be convenient and reasonable to run it with each "regen.pl"
What do you think? I'd be willing to attempt the necessary munging of your changes\, but they're your work\, so I don't want to trample all over it without your consent.
Best regards\, Steffen
@cpansprout - Status changed from 'open' to 'resolved'
karl williamson wrote:
The attached series of commits changes the definitions of the character class macros in handy.h to use table lookup for all those that may require more than one comparison
Have you profiled this? Lookup tables aren't the straightforward win that they were in the 1970s. The comparison isn't just N comparison instructions against one table-lookup instruction any more. Now you must consider the cache space taken up by the table against the cache space taken up by the comparisons. L1 cache misses are expensive\, and can easily wipe out the performance win from using fewer instructions. We might conceivably want to have both styles of implementation\, and choose between them depending on the target machine's cache architecture.
-zefram
Migrated from rt.perl.org#78024 (status was 'resolved')
Searchable as RT78024$