dagolden / Path-Tiny

File path utility
41 stars 59 forks source link

Fix tilde expansion bug in FREEZE/THAW #264

Closed ap closed 1 year ago

ap commented 1 year ago

The new, much cleaner design for tilde expansion still leaves a trap for users who stringify a PT object and then later feed the result back to path():

$ perl -Ilib -MPath::Tiny -le 'print path path("./~") . ""'
/home/ap

The special case back in #191 was meant to fix not just issues during internal processing, it also helped with round-tripping paths that travel part of their way through user code as strings. In the split-up of path() in fe8583b68f7d76b763613c29fedce26675e696f7 this was dropped.

In my opinion it should be reinstated.

Of course in this new cleaner design, where the internal path is always literal, the responsibility for this escaping ought to be relocated. Just as tilde expansion was relocated to the edge – on the way in –, so tilde escaping ought to be relocated to the edge – on the way out.

In fact, there is one place where it must happen on the way out: the THAW method still calls path(), not _path() – as it has to! Because only this way can instances of previous versions of PT be thawed correctly. But currently the FREEZE method just spits out the raw internal path with no tilde escaping. So this pair of methods still has a tilde expansion bug! And for backcompat this cannot be fixed by changing THAW – therefore it must be fixed by changing FREEZE.

And IMO, this same benefit ought to be afforded to user code.

Therefore this PR adds tilde escaping to the stringify method; and then it changes the "" overload to 'stringify' as well as aliases FREEZE to stringify so as to keep things DRY.

ap commented 1 year ago

I didn’t deal with the tests yet. This will need new tests, as well as changes to the existing tests. But I wanted to get the PR out there before the current dev release goes out as stable. 🙂

xdg commented 1 year ago

Thanks. I generally wait a week between dev and stable releases to allow CPAN Testers to give some feedback.

xdg commented 1 year ago

(In addition to wanting tests, of course.)

ap commented 1 year ago

That’s all I can do for now. I have to attend to other things now and may not find time for P::T again for a day or two.

ap commented 1 year ago

So I threw together all the reasonable variants I could think of:

use strict; use warnings;
use Benchmark::Dumb 'cmpthese';
my $path = do {
    package MockPathTiny;
    sub stringify  { $_[0][0] }
    sub stringify2 { (my $esc = $_[0][0]) =~ s!^~!./~!; $esc }
    sub stringify3 { $_[0][0] =~ s!^~!./~!r }
    sub stringify4 { $_[0][0] =~ /^~/ ? "./$_[0][0]" : $_[0][0] }
    sub stringify5 { '~' eq substr($_[0][0], 0, 1) ? "./$_[0][0]" : $_[0][0] }
    bless ['~/bin'], __PACKAGE__;
};
cmpthese 0.0002, {
    base    => sub { $path->stringify },
    's///'  => sub { $path->stringify2 },
    's///r' => sub { $path->stringify3 },
    'm//?:' => sub { $path->stringify4 },
    'eq?:'  => sub { $path->stringify5 },
};

I’m honestly a little surprised:

           Rate/s Precision/s   s///  s///r   eq?:  m//?:   base
s///  4.61869e+06          94     -- -17.8% -39.5% -41.4% -72.1%
s///r 5.61623e+06         130  21.6%     -- -26.4% -28.8% -66.1%
eq?:  7.63389e+06         250  65.3%  35.9%     --  -3.2% -53.9%
m//?: 7.88799e+06         260  70.8%  40.4%   3.3%     -- -52.4%
base  1.65698e+07        5600 258.8% 195.0% 117.1% 110.1%     --

I didn’t expect the substr() and m// versions to be quite this close, and I was firmly expecting s///r to perform better. So in any case the bottom line is, with amusing precision, that the ternary versions run at half the speed of the current version of the method, the s///r variant runs at one third, and my original copy-and-substitute version runs at a quarter.

I checked, and the results more or less hold up with older perls and different configurations. (Would be surprising if they didn’t; no surprises to be seen, though.)

So yeah. It’s clear which version to go with.

xdg commented 1 year ago

I strongly suspect that $_[0][0] =~ /^~/ gets optimized into something that isn't a regex since it's an anchored string with a constant. I tried /^~\S/ and the performance falls off sharply. I also tried precompiling qr/^~/ and that has the worst performance of all! FWIW, on my machine, eq? and m//? consistently swap positions. Either is fine -- I think m//? is probably best as it's probably most optimized on the oldest perls.

ap commented 1 year ago

I strongly suspect that $_[0][0] =~ /^~/ gets optimized into something that isn't a regex since it's an anchored string with a constant.

It doesn’t, as a quick perl -MO=Concise -Mre=debug -e '$_[0][0] =~ /^~/' demonstrates. But the regex engine has had a lot of work done to spot and exploit opportunities for short-circuiting before it even starts the automaton. (Also, multideref is the name of DaveM’s op that I couldn’t remember exactly before.)

I also tried precompiling qr/^~/ and that has the worst performance of all!

Yeah, I learned that recently and it was a big surprise to me too. They get re-stringified and recompiled every single match – whereas interpolating strings into a regex has a caching mechanism. So qr is actually the slowest way to do regexes! Check out the following posts:

http://blogs.perl.org/users/tom_wyant/2022/08/match-anything-quickly.html http://blogs.perl.org/users/tom_wyant/2022/09/match-anything-quickly----revision-1.html http://blogs.perl.org/users/aristotle/2022/09/reinterpolate.html