Kingwl / proposal-string-trim-characters

Proposal to add argument for .trim(), .trimStart() and .trimEnd() to allow strip the specified characters from strings.
MIT License
27 stars 1 forks source link

Provide some use cases #2

Open hax opened 3 years ago

hax commented 3 years ago

I like this proposal! As previous discussions, it seems the only question may be whether we have enough use cases. So here are some use cases I meet in the past programmer life.

// get markdown header text
let s = '### title ###'
s.trim('#').trim() // 'title'

`/* remove
 * multiline
 * comments
 * leading stars
 */`.split(/\n/).map(l => l.trimStart('* '))

// remove leading zeros
digits.trimStart('0') || '0'

// remove url trailing slashes
location.href.trimEnd('/') 

// remove the ending punctuations
line.trim().trimEnd('.!?')
jamiebuilds commented 3 years ago

I'm a bit concerned that strings like .trimStart("* ") would be confusing to developers.

It could be read in two different ways:

"* ***abc".trimStart("* ")
// developer reading this may expect either of the following:
"* ***abc".replace(/^(\* )*/, "") // "***abc"
"* ***abc".replace(/^[\* ]*/, "") // "abc"

For example, its common to use markdown within comments which could potentially include:

/**
 * - nested
 *   - markdown
 *     - list
 */ 

The user trimming "* " may expect this result:

innerLines = [
  "- nested",
  "  - markdown",
  "    - list",
]

But with .trim("* ") this is what they'd actually get:

innerLines = [
  "- nested",
  "- markdown",
  "- list",
]
SebastianSimon commented 3 years ago

Even in the first comment, there’s this snippet:

`/* remove
 * multiline
 * comments
 * leading stars
 */`.split(/\n/).map(l => l.trimStart('* '))

If we put in another .join("\n") at the end, according to the README, this would result in this:

`/* remove
multiline
comments
leading stars
/`

That’s an unclosed comment!


This sets a strange precedent in JS, where a String method accepts a string which is interpreted as a set of characters. No other String method does this.

Let’s explore some alternatives:

// Example: remove irrelevant stuff from end of URL.

// Accepting a string as a set of characters:
"https://example.com/?#".trimEnd("/?#");
"https://example.com/?#".trimEnd("#?/"); // These two examples are the same!

// Accepting an array instead of a string:
"https://example.com/?#".trimEnd([ "/", "?", "#" ]);
"https://example.com/?#".trimEnd([
  "/",
  "?",
  "#"
]);
"https://example.com/?#".trimEnd([ ..."/?#" ]);
"https://example.com/?#".trimEnd([
  ..."/?#"
]);
"https://example.com/?#".trimEnd(Array.from("/?#"));

// Accepting multiple arguments:
"https://example.com/?#".trimEnd("/", "?", "#");
"https://example.com/?#".trimEnd(..."/?#");

// Accepting a proper Set:
"https://example.com/?#".trimEnd(new Set([ "/", "?", "#" ]));
"https://example.com/?#".trimEnd(new Set([
  "/",
  "?",
  "#"
]));

I think the multiple-arguments example is even more confusing. Maybe accept an iterable in general? This would leave the current syntax as is, but extend the accepted argument to any iterable rather than only strings.

Kingwl commented 3 years ago

Hi folks. Thanks for the suggestions.

It's inspireable examples in this issue. I'd like to update the readme soon.

@SebastianSimon And also thanks for the API suggestions. I like the iterable argument. It's a stage-0 proposal and just seeking for stage-1. I suppose It's can be confirm after stage-1 (seeking stage-2). We may have more time.

Would you mind send PR to patch these changes? I'd love (and open) to hear folks' opinions.

Thanks again!

Kingwl commented 3 years ago

I'm a bit concerned that strings like .trimStart("* ") would be confusing to developers.

@jamiebuilds

Me too. But there's some reality here:

I thought there's no strongly reason to break the common behavior.

And to avoid the confusing:

And It's might also could discuss after stage-1 too (Correct me If I'm wrong).

Kingwl commented 3 years ago

By some shallow search: intext:"lodash.trim(" site:github.com ext:ts and intext:"_.trim(" site:github.com ext:ts (lodash.trim in TypeScript files. JavaScript files do not work with google).

I have found:

They may could provide some realworld scenarios.

And I have tried search in github or more complex pattern in google. But I'm failed. I'm not an expert of google or github search. Apologies for the rough results.

SebastianSimon commented 3 years ago

Would you mind send PR to patch these changes? I'd love (and open) to hear folks' opinions.

What should the PR change, exactly? We’re already discussing alternative API designs here, in the issues, so there’s no shortage of opinions.

jamiebuilds commented 3 years ago

I did a quick Twitter poll to as a quick check to see what people intuitively expected, and the result was somewhat overwhelming:

What would you expect the result of this (hypothetical) code to be:

"abaabc".trimStart("ab")
  • "aabc"(84% of 544 votes)
  • "c" (3%)
  • something else (5%)

https://twitter.com/buildsghost/status/1413215103428694017?s=21

It’s definitely not scientific, but I think it’s enough to seriously question this design even if it’s being used in the community already (that might even make a stronger case against it if people are confused by an API they are already using).

hax commented 3 years ago

This sets a strange precedent in JS, where a String method accepts a string which is interpreted as a set of characters. No other String method does this.

@SebastianSimon Yeah, it seems we don't have precedent in JS, but the point is it's the precedent of similar functions in other languages and libraries. (Note C# Trim() only take char, char[], not string, but unfortunately JS do not have char type.)

extend the accepted argument to any iterable rather than only strings

It still doesn't solve the problem what strings which are not one char means...

hax commented 3 years ago

@jamiebuilds We are facing the conflict with the average programmers who never use such APIs (so they expect TrimPrefix instead of TrimCharacters) VS. those who have knowledge with similar APIs in lodash/php/python/c# etc.

To be honest I don't know there is any perfect solution, we eventually need do some tradeoffs here.

jamiebuilds commented 3 years ago

those who have knowledge with similar APIs in lodash/php/python/c# etc.

Since this functionality does exist in a lot of other languages and libraries, I would expect more people to have a solid understanding of it.

But since its popular and people don't seem to understand it, I'm concerned that people are using this function incorrectly in other languages/libraries.

I agree that it would be super useful to have the functionality proposed here, but I don't think we should use a function signature that is actively confusing people

theScottyJam commented 3 years ago

I went ahead and opened a separate issue over here to discuss how we should handle multi-characters strings, so that we can keep this issue related to providing use cases.

Andrew-Cottrell commented 2 years ago

Use case: https://github.com/stevemao/trim-off-newlines (CVE-2021-23425) Use case: https://github.com/sindresorhus/trim-newlines (CVE-2021-33623)

jimmywarting commented 2 years ago

I was super confused when i read this example in the readme

'-_-abc-_-'.trimEnd('-_') // -_-abc

The string don't ends with -_ so it shouldn't trim the ending at all...?!

I came here looking for a solution that would help me deal with url pathnames and trying to normalize a url so it either don't start or ends with a slash.

so what would happen with 'example.com/foo/bar'.trim('/') ???


I only now just realized that the characters aren't a sequences... but rather a list of characters to look for in the string if this is the case then i would rather prefer if it was something like:

var chars = ['_', '-']
'-_-abc-_-'.trimEnd(chars)

// or
'-_-abc-_-'.trimEnd(...chars) 
// but in this case we limit ourself to adding new features such as other options
// eg; 'foo////'.trimEnd('/', 2) -> foo//

then it can also work with multicharacter

'Hi! 👌🏾'.trimEnd(['👌🏾'])

But i think i rather prefer to limit the api to only look for a sequences of character

'Hi! 👌🏾'.trimEnd('👌🏾') // 'Hi! '
'abc-_-'.trimEnd('-_') // 'abc-_-'
'abc-_'.trimEnd('-_') // 'abc'
'abc-_-_'.trimEnd('-_') // 'abc'

So that the underlying api is something like

while (str.endsWith(patter)) {
  str = str.slice(start, end)
}