Closed grr closed 2 years ago
Thanks for the suggestion. This does fit the mission of Path::Tiny -- making it easy to do things the "right way".
path($one)->compare($two)
Path:Tiny can do a raw open on the two files and provide the handles to File::Compare. And it can default to a 64k buffer size, which appears recommended as a good default for I/O.
I'm not inclined to support the compare_text
method because dealing with file encodings leads to even more API bloat and is already a specialized case. Users can open the two file handles directly and iterate them easily enough.
Note to self: compare
isn't a great verb for this. Many (but not all) boolean methods start with is
. Maybe compares_equal
?
@ap -- you've got good perspective on APIs -- what do you think about naming?
My first thought was “what’s wrong with compare
?”
I don’t feel that compares_equal
improves on it.
And I couldn’t immediately think of anything else that does.
OTOH, taking a step back, I don’t actually disagree that compare
is a bit meh.
Indeed, on second thought, it occurs to me that “compares” and “equal” both say the same thing, and that’s the problem, no? It’s just meh twice. You want to say what is equal. Which would be the contents, which means you are looking for something like contents_equal
.
My only misgiving with that particular suggestion is that it completely loses any connection to File::Compare as the underlying module.
(FWIW I agree with not supporting compare_text
.)
which means you are looking for something like contents_equal.
My only misgiving with that particular suggestion is that it completely loses any connection to File::Compare as the underlying module.
It sounds like we're already assuming the return value will be reversed (unless we name it contentsdiffer). Looking through Path::Tiny's docs, there are a number of tests that are named `is(
is_absolute,
is_relative,
is_rootdir), beside the file test operators, so why not something like
$a->is_same_content($b)`?
The docs should also probably mention that File::Compare does not check if the underlying files are the same (it will happily take the same file as both inputs), so the user might still want to compare with realpath(). And there doesn't appear to be a portable way to check if a file is a hard-link of another: https://www.perlmonks.org/?node_id=859658
All the existing is_foo
methods are simple predicates about the invocant itself. This new method would be an odd one out among the set, so I don’t know that that’s a good place to locate it.
Just linguistically is_same_content
is nails on a chalkboard though. It would have have to be called has_same_content
anyway, or else something like is_identical_in_content
.
This all gets rather cumbersome IMO.
It sounds like we're already assuming the return value will be reversed
That’s a good point. I don’t know that I’m happy with that.
Annoyingly, File::Compare
has 0 for unequal, 1 for equal and -1 for error which means it doesn't actually work like cmp
, which would make a name like cmp_content
descriptive and intuitive. (Could reimplement to do those semantics without File::Compare...)
has_equal_content
?The docs should also probably mention that File::Compare does not check if the underlying files are the same (it will happily take the same file as both inputs), so the user might still want to compare with realpath(). And there doesn't appear to be a portable way to check if a file is a hard-link of another: https://www.perlmonks.org/?node_id=859658
The unix cmp
command (which is part of the POSIX spec, so is available on most systems) does both of these checks: http://git.savannah.gnu.org/cgit/diffutils.git/tree/src/cmp.c#n284
And it's exit value is similar to to File::Compare::compare's return values: Exit status is 0 if inputs are the same, 1 if different, 2 if trouble
.
In the spirit of "write the docs first", here's what I propose. I think it's hard to misunderstand or misuse.
=method has_same_bytes
if ( path("foo.txt")->has_same_bytes("bar.txt") ) {
# ...
}
This method returns true if both the invocant and the argument can be opened as
file handles and the handles contain the same bytes. It returns false if their
contents differ. If either can't be opened as a file (e.g. a directory or
non-existent file), the method throws an exception. If both can be opened and
both have the same C<realpath>, the method returns true without scanning any
data.
=cut
Note: there is a subtlety with realpath
where a realpath to a non-existent leaf is possible, which is why both must be opened before the realpath
check.
LGTM
Just linguistically is_same_content is nails on a chalkboard though. It would have have to be called has_same_content anyway, or else something like is_identical_in_content.
I Just realized why i was having the opposite feeling. has
implies a membership test, whereas is
implies an identity test. And both of those words are frequently used in Perl with those contexts: has
with Moo/Moose (though it's used to install a member into a set, instead of testing for membership), and is
with the isa
op.
I often have to compare the contents pointed to by the paths. I had been using the digest method, which works fine for the purpose, but I just benchmarked against File::Compare and the latter is 15-30x faster, so I'm definitely switching to using compare.
According to
corelist
: File::Compare was first released with perl 5.004, so it would not add an additional dependency.