Closed ap closed 1 year ago
I didn’t deal with the tests yet. This will need new tests, as well as changes to the existing tests. But I wanted to get the PR out there before the current dev release goes out as stable. 🙂
Thanks. I generally wait a week between dev and stable releases to allow CPAN Testers to give some feedback.
(In addition to wanting tests, of course.)
That’s all I can do for now. I have to attend to other things now and may not find time for P::T again for a day or two.
So I threw together all the reasonable variants I could think of:
use strict; use warnings;
use Benchmark::Dumb 'cmpthese';
my $path = do {
package MockPathTiny;
sub stringify { $_[0][0] }
sub stringify2 { (my $esc = $_[0][0]) =~ s!^~!./~!; $esc }
sub stringify3 { $_[0][0] =~ s!^~!./~!r }
sub stringify4 { $_[0][0] =~ /^~/ ? "./$_[0][0]" : $_[0][0] }
sub stringify5 { '~' eq substr($_[0][0], 0, 1) ? "./$_[0][0]" : $_[0][0] }
bless ['~/bin'], __PACKAGE__;
};
cmpthese 0.0002, {
base => sub { $path->stringify },
's///' => sub { $path->stringify2 },
's///r' => sub { $path->stringify3 },
'm//?:' => sub { $path->stringify4 },
'eq?:' => sub { $path->stringify5 },
};
I’m honestly a little surprised:
Rate/s Precision/s s/// s///r eq?: m//?: base
s/// 4.61869e+06 94 -- -17.8% -39.5% -41.4% -72.1%
s///r 5.61623e+06 130 21.6% -- -26.4% -28.8% -66.1%
eq?: 7.63389e+06 250 65.3% 35.9% -- -3.2% -53.9%
m//?: 7.88799e+06 260 70.8% 40.4% 3.3% -- -52.4%
base 1.65698e+07 5600 258.8% 195.0% 117.1% 110.1% --
I didn’t expect the substr()
and m//
versions to be quite this close, and I was firmly expecting s///r
to perform better. So in any case the bottom line is, with amusing precision, that the ternary versions run at half the speed of the current version of the method, the s///r
variant runs at one third, and my original copy-and-substitute version runs at a quarter.
I checked, and the results more or less hold up with older perls and different configurations. (Would be surprising if they didn’t; no surprises to be seen, though.)
So yeah. It’s clear which version to go with.
I strongly suspect that $_[0][0] =~ /^~/
gets optimized into something that isn't a regex since it's an anchored string with a constant. I tried /^~\S/
and the performance falls off sharply. I also tried precompiling qr/^~/
and that has the worst performance of all! FWIW, on my machine, eq?
and m//?
consistently swap positions. Either is fine -- I think m//?
is probably best as it's probably most optimized on the oldest perls.
I strongly suspect that
$_[0][0] =~ /^~/
gets optimized into something that isn't a regex since it's an anchored string with a constant.
It doesn’t, as a quick perl -MO=Concise -Mre=debug -e '$_[0][0] =~ /^~/'
demonstrates. But the regex engine has had a lot of work done to spot and exploit opportunities for short-circuiting before it even starts the automaton. (Also, multideref
is the name of DaveM’s op that I couldn’t remember exactly before.)
I also tried precompiling
qr/^~/
and that has the worst performance of all!
Yeah, I learned that recently and it was a big surprise to me too. They get re-stringified and recompiled every single match – whereas interpolating strings into a regex has a caching mechanism. So qr
is actually the slowest way to do regexes! Check out the following posts:
http://blogs.perl.org/users/tom_wyant/2022/08/match-anything-quickly.html http://blogs.perl.org/users/tom_wyant/2022/09/match-anything-quickly----revision-1.html http://blogs.perl.org/users/aristotle/2022/09/reinterpolate.html
The new, much cleaner design for tilde expansion still leaves a trap for users who stringify a PT object and then later feed the result back to
path()
:The special case back in #191 was meant to fix not just issues during internal processing, it also helped with round-tripping paths that travel part of their way through user code as strings. In the split-up of
path()
in fe8583b68f7d76b763613c29fedce26675e696f7 this was dropped.In my opinion it should be reinstated.
Of course in this new cleaner design, where the internal path is always literal, the responsibility for this escaping ought to be relocated. Just as tilde expansion was relocated to the edge – on the way in –, so tilde escaping ought to be relocated to the edge – on the way out.
In fact, there is one place where it must happen on the way out: the
THAW
method still callspath()
, not_path()
– as it has to! Because only this way can instances of previous versions of PT be thawed correctly. But currently theFREEZE
method just spits out the raw internal path with no tilde escaping. So this pair of methods still has a tilde expansion bug! And for backcompat this cannot be fixed by changingTHAW
– therefore it must be fixed by changingFREEZE
.And IMO, this same benefit ought to be afforded to user code.
Therefore this PR adds tilde escaping to the
stringify
method; and then it changes the""
overload to'stringify'
as well as aliasesFREEZE
tostringify
so as to keep things DRY.