More string trimming capabilities

richard-viney commented 2 months ago

The existing string.trim, string.trim_left and string.trim_right functions currently only trim whitespace.

Could we add equivalent trimming functions that take the codepoints to be trimmed as an input, similar to Erlang's string.trim/3?

E.g.

import gleam/list

/// Removes all occurrences of the specified codepoints from the start and
/// end of a `String`.
///
pub fn trim_codepoints(s: String, codepoints: List(UtfCodepoint)) -> String {
  string_trim(s, Both, codepoints_to_ints(codepoints))
}

/// Removes all occurrences of the specified codepoints from the start of a
/// `String`.
///
pub fn trim_codepoints_left(s: String, codepoints: List(UtfCodepoint)) -> String {
  string_trim(s, Leading, codepoints_to_ints(codepoints))
}

/// Removes all occurrences of the specified codepoints from the end of a
/// `String`.
///
pub fn trim_codepoints_right(
  s: String,
  codepoints: List(UtfCodepoint),
) -> String {
  string_trim(s, Trailing, codepoints_to_ints(codepoints))
}

fn codepoints_to_ints(codepoints: List(UtfCodepoint)) {
  codepoints
  |> list.map(string.utf_codepoint_to_int)
}

type Direction {
  Leading
  Trailing
  Both
}

@target(erlang)
@external(erlang, "string", "trim")
fn string_trim(a: String, b: Direction, c: List(Int)) -> String

There are other ways to slice this, such as taking a String as the final parameter and using its codepoints as the ones to trim, rather than a List(UtfCodepoint).

I haven't looked into what the JavaScript implementation would be, but can do so.

On a related note, my understanding is that the existing methods with the words "left" and "right" in them assume left-to-right reading order, which doesn't hold for all languages (e.g. Arabic, Hebrew). Could consider using "start" and "end" instead, or adding these as aliases.

lpil commented 2 months ago

Sounds good, but we'll need to think of some other name as the ones used in the code example don't fit with the convention used in this repo.

Changing left and right for start and end sounds good.

richard-viney commented 2 months ago

What names would you suggest?

Exposing the existing Direction type would mean only one public function is added:


@target(erlang)
pub fn trim_codepoints(
  string: String,
  direction: Direction,
  codepoints: String,
) -> String {
  let codepoint_ints =
    codepoints
    |> to_utf_codepoints
    |> list.map(utf_codepoint_to_int)

  do_trim_codepoints(string, direction, codepoint_ints)
}

@target(erlang)
@external(erlang, "string", "trim")
pub fn do_trim_codepoints(
  string: String,
  direction: Direction,
  codepoints: List(Int),
) -> String

@target(javascript)
pub fn do_trim_codepoints(
  string: String,
  direction: Direction,
  codepoints: List(Int),
) -> String {
  todo
}

lpil commented 2 months ago

We will not expose the direction type, it needs to be different functions.

inoas commented 1 week ago

maybe one of these

pub fn trim_codepoints(
  string: String
) { todo }

// and any pair:

pub fn trim_start_codepoints(
  string: String,
  by: String,
) { todo }

pub fn trim_end_codepoints(
  string: String,
  by: String,
) { todo }

pub fn trim_head_codepoints(
  string: String,
  by: String,
) { todo }

pub fn trim_tail_codepoints(
  string: String,
  by: String,
) { todo }

pub fn trim_beginning_codepoints(
  string: String,
  by: String,
) { todo }

pub fn trim_ending_codepoints(
  string: String,
  by: String,
) { todo }

gleam-lang / stdlib

More string trimming capabilities #587