cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

Behaviour of "char_to_line" at index one-past-the-end #11

Closed nepp2 closed 6 years ago

nepp2 commented 6 years ago

Firstly, thanks for the excellent library! I have run into one bit of confusing behaviour though, and I couldn't see an issue about it so I thought I'd raise it as a question.

Quoting the documentation:

If char_idx is one-past-the-end, then one-past-the-end line index is returned. This is mainly unintuitive for empty ropes, which will return a line index of 1 for a char_idx of zero. Otherwise it behaves as expected.

https://docs.rs/ropey/0.6.3/ropey/struct.Rope.html#method.char_to_line

The code works as described, but the behaviour seems strange to me, whether the rope is empty or not. It is causing me to add quite a few special cases to my code. Why does it not return the index of the last line, rather than a hypothetical new line?

cessen commented 6 years ago

Hi nepp2! Sorry for the delayed response.

The rationale behind this is:

  1. To be consistent with line_to_char, which returns one-past-the-end char for one-past-the-end line.
  2. To be consistent with the byte_to_char and char_to_byte operations as well (which aren't public in the current release, but will be in the next one).
  3. To be consistent with how most other indexing in general works.

However, lines are indeed a bit of a weird case, and it's not clear to me that this behavior is what's best. I'm definitely open to suggestions!

However, consider that the current behavior makes most round-tripping of index conversions work correctly. It also seems a bit weird to me if the line we return doesn't contain char index given. However, the strange special-case that comes up with empty strings does suggest to me that I may not have made the right choice here. So, again, I'm open to suggestions! But I want to make it clear that this isn't necessarily an obvious problem, because different (and in my mind equally strange) special cases come up if we do it differently, such as the way you're currently suggesting.

In the end, what probably makes the most sense is to implement the behavior that is most useful to real client code.

It is causing me to add quite a few special cases to my code.

What are you trying to do, and what special cases are you having to write? Knowing this would really help me in deciding what the behavior ought to be.

Thanks!

cessen commented 6 years ago

Thinking about it more, I think the current release behavior is indeed sub-optimal. It's both unintuitive and isn't any more useful that way in practice.

In 1abaf99102386facf0f61ad622165e5319097cea I've changed the behavior to be equivalent to simply counting the number of line endings before the given index. This will be in the next release (v0.8.0).