Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

Request to add a trim() function to Core #17999

Closed scottchiefbaker closed 2 years ago

scottchiefbaker commented 3 years ago

This should address #17952

Many thanks to Grinnz and LeoNerd for all their help on this pet peeve of mine.

leonerd commented 3 years ago

@khwilliamson I'd normally be a good person to review this but given as I wrote most of it, that might be somewhat odd ;)

atoomic commented 3 years ago

@scottchiefbaker please rebase to resolve the conflicts

scottchiefbaker commented 3 years ago

@atoomic I worked with @leonerd and got the branch rebased, and added tests for taint mode.

xenu commented 3 years ago

There are some obviously unrelated Pod::HTML changes in this branch which need to be removed. Also, it really should be squashed.

khwilliamson commented 3 years ago

I don't grok how there seem to get two threads about a single PR. In some other thread on this, omeone suggested this shouldn't go in core. I think it should. It is a commonly used paradigm that is easy to get wrong. I saw on IRC the stumbles, involved in the process. That tells me there is a need for an encapsulated way to do it.

I offer as further evidence the Unix sh script 'overwrite' This is a simple shell script that is generally called as overwrite foo command foo where foo is some command. It is typically used in an xargs to easily apply a change across potentially many files. Someone came up with the idea for this, the name Rob Pike comes to mind, but it was never accepted into the Unix command repertory. The reason given was that it was a developer's functionality, and a developer could simply write their own and keep it in their tool box. I did, and I still use it, as recently as yesterday, but as a result it became tribal knowledge, and this paradigm just didn't occur to people who weren't lucky enough to have been exposed to it. The result is a lot of unnecessary time used up to perform this task, and all because of a purist minimalist approach.

leonerd commented 3 years ago

What's overall status here? Still in need of a rebase, but aside from that has it been discussed and ready to merge yet?

khwilliamson commented 3 years ago

On 8/19/20 2:17 PM, Paul Evans wrote:

What's overall status here? Still in need of a rebase, but aside from that has it been discussed and ready to merge yet?

If no one objects soon, after rebasing, we should merge it IMO

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Perl/perl5/pull/17999#issuecomment-676644503, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAA2DHYR6CY3JRKYBLA4W2TSBQXPJANCNFSM4PJKSLAQ.

tonycoz commented 3 years ago

What's overall status here? Still in need of a rebase, but aside from that has it been discussed and ready to merge yet?

It's broken. It depends on a trim feature macro which doesn't exist.

None of the patches update feature.pl nor the files it generates.

khwilliamson commented 3 years ago

On 11/4/20 5:52 PM, Tony Cook wrote:

What's overall status here? Still in need of a rebase, but aside
from that has it been discussed and ready to merge yet?

It's broken. It depends on a trim feature macro which doesn't exist.

None of the patches update feature.pl nor the files it generates.

I discussed this with several people several months ago. We didn't think this feature should go in core. But it is desirable to have this functionality available, and it should go in some Util module.

xenu commented 3 years ago

It's broken. It depends on a trim feature macro which doesn't exist.

None of the patches update feature.pl nor the files it generates.

Apparently the feature.pl change was lost during the rebase, it used to be there.

dur-randir commented 3 years ago

I'd expect that calling 'trim' in a void context would behave like 'chop'/'chomp', modifying the passed string, without having to write

$foo = trim($foo);
scottchiefbaker commented 3 years ago

I do not have the technical expertise to fix this, or take this feature any further. I'm not a Perl core developer, nor do I know much C. I only wrote the tests, and the docs.

Someone with more expertise in this area needs to take this PR over.

Grinnz commented 3 years ago

I'd expect that calling 'trim' in a void context would behave like 'chop'/'chomp', modifying the passed string, without having to write

$foo = trim($foo);

This has already been discussed and it is not intended to operate in place. chomp's nature makes it difficult to use in larger statements and this should not be repeated here. Doing it based on context would be an even bigger mistake, since it is quite easy to accidentally be in a context you did not expect.

Leont commented 3 years ago

I'd expect that calling 'trim' in a void context would behave like 'chop'/'chomp', modifying the passed string, without having to write

The modifying style favors whipuptitude, and the functional style favors manipulexity. Prefering either says a lot about what you use perl for.

This distinction is a common tension point in perl language design

aaronpriven commented 3 years ago

I've been thinking this through. I'm actually on record saying that Text::Trim from CPAN is great precisely because it mutates in void context and returns in non-void context, and works on the argument list if there is one or $ if there isn't. (It didn't work with lexical `$`, but that's long gone now. Yay.)

I like the idea of the core feature being a pure drop-in for Text::Trim. Also, while I think the difference between list and scalar context is indeed confusing (and arguably a misfeature of Perl), the difference between void and non-void contexts is usually pretty obvious from the code.

But if core trim did work this way, it would be the only core function that did. I think chomp and chop have very weird syntaxes that should not be duplicated, and other than that the most analogous functions are the case-changing functions and scalar reverse, none of which mutate their arguments. And having trim be the odd one out seems like a bad idea.

So overall, I think I agree with the proposal.

Grinnz commented 3 years ago

The difference between void and non-void contexts is not obvious. Most commonly, the context of the last statement executed in a subroutine will be in an ambiguous context; and it's not even obvious which statement will be the last one in any particular execution of the subroutine. I strongly recommend not relying on this.

aaronpriven commented 3 years ago

I take your point.

tonycoz commented 3 years ago

I do not have the technical expertise to fix this, or take this feature any further. I'm not a Perl core developer, nor do I know much C. I only wrote the tests, and the docs.

Someone with more expertise in this area needs to take this PR over.

Adding the trim feature.pm feature isn't difficult (add one line and make regen) and is done in https://github.com/Perl/perl5/tree/tonyc/trim-with-feature which fixes the compilation error.

Several other tests fail though, I don't think any of them require any C knowledge.

scottchiefbaker commented 3 years ago

After much banging my head against it... this branch is updated and rebased. All the tests are passing and this is ready to be merged.

tonycoz commented 3 years ago

After much banging my head against it... this branch is updated and rebased. All the tests are passing and this is ready to be merged.

This isn't rebased. "This branch cannot be rebased due to conflicts"

It looks like your merged blead into your branch in a couple of places, rather than rebasing on blead.

scottchiefbaker commented 3 years ago

@tonycoz good catch... I did a real rebase and we should be good now.

karenetheridge commented 3 years ago

Due to @leonerd 's recent changes to opcode.h and opnames.h, there are now conflicts which should be rebased manually.

leonerd commented 3 years ago

Due to @leonerd 's recent changes to opcode.h and opnames.h, there are now conflicts which should be rebased manually.

Those are generated files. Manually resolve regen/opcodes and then run

$ perl regen/opcode.pl
scottchiefbaker commented 3 years ago

@karenetheridge this has been rebased an updated so it's mergeable again.

The chatter on p5p is making this look more and more like it won't be merged. Not sure how much more time/work it's worth to maintain this PR.

wchristian commented 3 years ago

The chatter on p5p is making this look more and more like it won't be merged. Not sure how much more time/work it's worth to maintain this PR.

No need to be pessimistic about that yet. Things have been slow because the PSC has been piecing itself together, and the only person raising any strong objections based in any arguments beyond "i don't like it", has not yet created a useful reason to reject, and the chatter is more about trying to figure out his motivations and how to make him happy, because it's looking more and more like all his technical concerns can be mollified with CORE::trim as is.

rjbs commented 3 years ago

Hello! I am here wearing my "Perl Steering Councilmember" hat.

This feature has been waiting in limbo, and we regret that, and we want to make sure that you know we want it in Perl!

The behavior as implemented is not what we would want to merge. We think the feature is designed to make sense in isolation, but that in the context of Perl as it stands, it's a bit too unlike its nearest neighbors. The short version here is "trim should work like chomp". Neil wrote this piece of documentation, which we all agreed on entirely:

    trim VARIABLE
    trim( LIST )
    trim
        This removes leading and trailing whitespace from a string,
        and returns the total number of characters trimmed.

        If VARIABLE is omitted, it trims $_.

        The definition of "whitespace" is whatever \s matches.
        You can actually trim anything that's an lvalue, including
        an assignment:

            trim(my $username = <STDIN>);

        If you trim a list, each element is trimmed. The return
        value is the sum of the individual trims.

        See also "chop" and "chomp".

(This isn't the documentation we'd actually write, but it hits the relevant points.)

An implementation that behaves this way can be shipped as part of the language. It will need to start life with a feature guard, but would not need to be experimental.

genio commented 3 years ago

no, please. trim shouldn't be like chomp. that's terrible in userland

genio commented 3 years ago

That would break with the way every single implementation of it thus far has been made. It would make it so we're fighting how every developer thinks it should work and there'd be no way to make every implementation of it out in the wild back-compatible to this.

genio commented 3 years ago

I'd honestly vote to not even bother with including trim if it's to behave in that manner. It'd be easier to just tell people to implement their own than to explain the odd behavior.

xenu commented 3 years ago

chomp is terrible, copying its behavior is a mistake. I don't think consistency with misfeatures is desirable.

Grinnz commented 3 years ago

I'd like to try to explain why this stance is confusing to me, though I don't have perhaps as strong an opinion. First as @genio pointed out, every implementation of trim in the wild (that I'm aware of) is immutable and returns the trimmed result, displaying that this is what is expected and desired of such a function. Second, the in-place behavior of chomp is surprising and incredibly odd to explain to newcomers, but this hurdle can be seen as acceptable for the benefit it gives in its extremely common use case of readline oneliners. trim's most common use cases are largely not overlapping with this, and tend to be part of larger and more complex string manipulation logic, where the mutation of the input argument would commonly be a source of bugs and the requirement to copy it before trimming is just awkward, akin to the use of s/// within a chain of string processing before we had the /r flag.

Grinnz commented 3 years ago

every implementation of trim in the wild (that I'm aware of) is immutable and returns the trimmed result, displaying that this is what is expected and desired of such a function.

It seems this isn't true, at least two such implementations on CPAN operate in place only if called in void context. Though I would be adamantly against context sensitivity in this case.

aaronpriven commented 3 years ago

It seems to me that consistency with three-argument substr would make more sense that consistency with chomp -- which itself is inconsistent with chop.

scottchiefbaker commented 3 years ago

As the original author or this PR I have some concerns related to the proposed implementation. Most of the implementation details were hashed out in #17952, and while not set in stone I think we had a pretty good handle on real world use cases. We had input from 12 different people on that issue as well as feedback on P5P. The proposal as it stands is requesting the following changes:

  1. operating on arrays
  2. operating on $_ in-place
  3. Returning number of chars instead of a string

Operating on arrays was not something we ever discussed in the original implementation. I don't feel that it's entirely necessary for this feature to do that as most other languages implement trim() on a string basis only. This isn't a deal-breaker for me, but it feels like an extra nicety when we purposely kept the implementation simple so as not to complicate the feature or confuse people.

Operating on $_ in-place was not originally designed because we were focussed on the "data in / data out" method and copying the flow of substr() which does not operate in-place. I would be OK adding this if it's something the PSC requires, but it's not a requirement for me.

Returning the number of chars instead of a string is a deal-breaker for me. See the many comments above regarding why this is not the best way to move forward. I feel very strongly that trim() should return the trimmed string for the following reasons:

  1. This is how most other languages implement trim() (Javascript, PHP, Ruby, etc)
  2. Most CPAN modules and users implement trim() in this fashion
  3. chop() and chomp() are not analogous to this feature, and shouldn't serve as a template for us
  4. Counting the numbers of chars removed would be semi difficult and wouldn't provide good data to the end user anyway
  5. It would break simple things like:
@trimmed = map { trim($_); } @orig; 

#or

while (my $line = trim(<$INPUT>)) ...

I respectfully request the PSC re-evaluate their requirements and provide guidance moving forward.

hvds commented 3 years ago

The definition of "whitespace" is whatever \s matches.

@rjbs That appears ambiguous: the answer depends on your regexp flags.

FWIW, instead of a new keyword I'd rather see a perlfunc entry for trim that shows how to do it with a regexp; I can see that nobody else appears to think that, though.

Grinnz commented 3 years ago

FWIW, instead of a new keyword I'd rather see a perlfunc entry for trim that shows how to do it with a regexp; I can see that nobody else appears to think that, though.

This wouldn't be appropriate for perlfunc, but already exists in perlfaq: https://perldoc.perl.org/perlfaq4#How-do-I-strip-blank-space-from-the-beginning%2Fend-of-a-string%3F

wchristian commented 3 years ago

@hvds

FWIW, instead of a new keyword I'd rather see a perlfunc entry for trim that shows how to do it with a regexp; I can see that nobody else appears to think that, though.

There was plenty discussion around that on p5p and upthread and in linked tickets, please refer to those and read them all before opening up old topics. :)

Leont commented 3 years ago

Hello! I am here wearing my "Perl Steering Councilmember" hat.

TBH this post rather feels like SC and the rest of p5p are living on different islands. Something is going terribly wrong if SC decisions catch the rest of us off-guard.

wchristian commented 3 years ago

@rjbs @neilb @xsawyerx

it's a bit too unlike its nearest neighbors

While i have my bias, i also have a habit of trying to dig with "why?" until i can't reasonably dig further without hitting an axiom or a recursion.

The claim above is how deep i can get in the post when asking why. However it hits neither an axiom nor a recursion.

As such, a question:

Bonus question:

hvds commented 3 years ago

FWIW, instead of a new keyword I'd rather see a perlfunc entry for trim that shows how to do it with a regexp; I can see that nobody else appears to think that, though.

This wouldn't be appropriate for perlfunc, but already exists in perlfaq: https://perldoc.perl.org/perlfaq4#How-do-I-strip-blank-space-from-the-beginning%2Fend-of-a-string%3F

There are other things in perlfunc that are not formally appropriate there, I don't see that as a problem.

More generally, anything that really is a frequently asked question should not be relegated solely to the FAQ.

hvds commented 3 years ago

@hvds

FWIW, instead of a new keyword I'd rather see a perlfunc entry for trim that shows how to do it with a regexp; I can see that nobody else appears to think that, though.

There was plenty discussion around that on p5p and upthread and in linked tickets, please refer to those and read them all before opening up old topics. :)

@wchristian I did read them all, as a result of which I am already hugely familiar with your opinion. This, however, is my opinion.

wchristian commented 3 years ago

FWIW, instead of a new keyword I'd rather see a perlfunc entry for trim that shows how to do it with a regexp; I can see that nobody else appears to think that, though.

There was plenty discussion around that on p5p and upthread and in linked tickets, please refer to those and read them all before opening up old topics. :)

@wchristian I did read them all, as a result of which I am already hugely familiar with your opinion. This, however, is my opinion.

In that case, i'd recommend to, instead of just stating your opinion as a claim, use the knowledge you have to state why you hold that opinion so people can actually see whether they'd agree or disagree with your logic. :)

rwp0 commented 3 years ago

how about having two functions exported by the feature: trim and trimp (yes, that sounds a bit funny), or going in the other direction: strim similar to sprintf

(just a weak suggestion expressed)


also, creating a trim pragma in addition to the feature:

use trim 'copy'; # non-mutable
use trim 'in-place'; # mutable
use trim; # default: in-place

seems to be a more Perl-ish practice to modify behavior of built-in functions (as with open and sort pragmas) as a means for the resolution of conflicting user opinions and demands

neilb commented 3 years ago

Let me start by saying that the definition of trim is not a hill I'm prepared to die on :-)

If I were asked to create a trim library function, then I'd probably create $trimmed = trim($raw), partly because that's the first thing I thought of, and it's easy to write. CPAN is full of modules that were the author's first thought, and not necessarily the best way to do that thing ("scratch the itch and move on"). We're all guilty of that.

But here we're adding a keyword to a language, so there are different considerations. I know lots of other languages have a trim function, and that most (but not all) have it as a non-mutating function, rather than in-place edit.

The C++ standard library doesn't include trim, partly because they couldn't agree on how it should work[*] :-) And the Boost library decided to provide both: trim(), which does in-place, and trim_copy(). No, I'm not suggesting we should add two trims.

We're adding it to Perl though, so it should be a Perlish trim, and should be consistent with the language.

My first thought on trim, was what the standard "processing stdin loop" would look like, and I assumed this:

while (<>) {
    trim;
    ...
}

There are a lot of tutorials out there with the chomp version of this skeleton. It would look weird to see:

while (<>) {
    $_ = trim $_;

The point of $_ being that it's implied, and you don't need to write it. Yes, it's idiomatic, but it's Perl.

Particularly with respect to regexes, one of the reasons I prefer Perl for processing text, is that it feels sub-optimal in other languages to write something like:

string = replace(string, ...);

I also assumed that I would be able to write the following:

trim(my $foo = <STDIN>);

I've written similar lines with chomp for years, so I would assume I could do the same with trim, and I think a lot of people would assume that. Googling and looking on MetaCPAN, I find a lot of code like that, but I accept that doesn't necessarily mean that the majority of people would assume the same thing of trim.

I looked through my code, to see what I'm doing, and there's a lot of:

s/^\s+|\s+$//g;             =>   trim;
$foo =~ s/^\s+|\s+$//g;     =>   trim $foo;

I do also have a cleanse() function, which does more than trim.

We're not just adding the trim that the p5p developers want for their day-to-day coding, we've got to add what we think is the best trim for everyone coding in Perl, and also for people who learn Perl in the future.

Thinking about "consistent with the existing language", we've got:

There's no single model, so where does trim fit? Personally, it more strongly binds with chomp and s///, but that's because of how I think about them.

I'm also aware of the "don't mutate shit" functional school of thought. Furthermore, if we're hoping to attract new blood to Perl from other languages, then they might also assume that trim() would be non-mutating.

Of course, we could have trim work either way, depending on context, but I'm pretty sure we all think that's a bad idea.

[*] my partner is a C++ developer. She recently attended a course given by someone who's on the C++ standards committee. While talking about the standardisation process, he used trim as an example of things that didn't make it in because the committee couldn't agree on a single definition.

wchristian commented 3 years ago

Thanks for answering the why a bit more clearly. I'm not sure i agree, but it's a good explanation.

I haven't completed my thoughts on this fully and i look forward to seeing what others think.

That said, my initial thoughts are:

dur-randir commented 3 years ago

personally i consider any implementation that does not allow my @b = map trim_somehow, @c;

It can be done reasonably well with mutating trim, just like with any other mutators:

map {trim} (my @b = @c);
wchristian commented 3 years ago

@dur-randir

map {trim} (my @b = @c);

This only works if trim is the last step. The key point of map is composability, aka the ability to go:

my @whatever = map process, grep ..., map ..., grep ..., @asd;

This for example does not work:

grep /b$/, map chomp, (my @b = @c);

Neither this:

my @d = grep /b$/, map chomp, (my @b = @c);

Further, you can compose a grep before the map step, but then it looks like this. And, uh, good luck explaining that to a newbie.

map chomp, (my @b = grep /^a/, @c);

E: To add this as an example, here's why chomp is objectively bad imo.

my @b = grep /b$/, map { chomp( my $c = $_ ); $c } @c;
Grinnz commented 3 years ago

personally i consider any implementation that does not allow my @b = map trim_somehow, @c;

It can be done reasonably well with mutating trim, just like with any other mutators:

map {trim} (my @b = @c);

This is an antipattern and should be written trim for my @b = @c;

Which of course cannot compose as @wchristian explained.

leonerd commented 3 years ago

If we are considering two names, one an inplace mutating verb and one a regular value-returning function, I would suggest using imperative verb. vs past-participle forms; i.e.

trim $scalar;  # inplace
$result = trimmed " input here" ;

Inspired by Python's inplace sort() method vs. sorted() function.

leonerd commented 3 years ago

P.S. at that point it'd be really nice to have a chomped as well ;)