Anders429 / substring

A substring method for string types.
Apache License 2.0
11 stars 1 forks source link

Negative indexing #6

Closed giraffekey closed 3 years ago

giraffekey commented 4 years ago

Support for negative indexing would be helpful.

Example:

string.substring(-5, -3)

On the string "hello world" this would return "wo".

Anders429 commented 3 years ago

Thanks for the suggestion!

Unfortunately, this would require changing the substring() to accept isizes instead of usizes. This would be a breaking change in the API. If we, say, released a 2.0.0 version with the method signature fn substring(&self, start_index: isize, end_index: isize) -> &str instead of what it is now, then everyone who wants to upgrade from version 1 to version 2 would have to replace every instance of s.substring(x, y) with s.substring(x as isize, y as isize), which is less than ideal. To make a breaking change to the API like that, I would want to have a really good reason, and I just don't think that a shortcut for s.substring(s.len() - i, s.len() - j) is a good enough motivation.

On that same note, changing the indices from usize to isize would cause some issues with working with large strings, since the max length of a String is usize::MAX, since String is built on Vec which stores its length as a usize. Where we now can extract a substring from any part of any theoretical string, making this API change would cause us to lose coverage. That's a regression that I just don't find worth it.

However, this could instead be resolved by introducing a new, separate method that simply indexes from the back. Something with a signature like fn substring_from_end(&self, start_index: usize, end_index: usize) -> &str. Then your example of string.substring(-5, -3) would just become string.substring_from_end(5, 3), and there would be no loss of coverage in the possible range of the substring. But, the implementation of that would essentially be:

pub trait Substring {
    fn substring(&self, start_index: usize, end_index: usize) -> &str;

    fn substring_from_end(&self, start_index: usize, end_index: usize) -> &str {
        self.substring(self.len() - end_index, self.len() - start_index)
    }
}

I have a few issues with it:

  1. Notice the switched reversed order of self.len() - end_index and self.len() - start_index. This presents, I think, the greater problem with negative indexing and negative slicing: it's confusing. There has already been significant discussion elsewhere on the possibility of introducing negative indexing to the Rust standard library, and as far as I can tell, the consensus seems to be that it ends up being a common source of bugs in languages like Python. I tend to agree there: I can already see someone complaining about using s.substring_from_end(3, 5) and complaining that it returns the empty string instead of a string slice of length 2 like they expect. The same thing would happen if we used isizes: which is correct, s.substring(-5, -3) or s.substring(-3, -5)?
  2. This extra method is nothing more than a "shortcut" method for calling substring(). I just don't think it is worth it to extend the API for such an arbitrary function, when s.substring(s.len() - x, s.len() - y) is more verbose, causes less confusion, and only requires a few more keystrokes.
  3. Even if we do go with this new method, what about cases where perhaps someone wants to mix positive and negative indices together? Then we either have to add a new method to cover that use case, which is, again, another wrapper around substring(), or we have to concede to using isizes instead. For the former, I worry about bloating the API unnecessarily for these shortcuts that only make things less verbose. The latter has already been discussed above.

So I guess that's a really long way to say that I am going to punt on this one. I don't think there is enough potential benefit to warrant the changes. s.substring(s.len() - x, s.len() - y) is sufficient, in my opinion :)