cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

Chunks iterator get's confused when you change direction twice. #63

Closed smohekey closed 2 years ago

smohekey commented 2 years ago

If you call prev() once next() has returned None, then call prev() followed by next() again. The last call to next() should return None a second time, instead the last two calls (prev() and next()) both return the same chunk.

I've created a reproduction repository here: https://github.com/smohekey/test-ropey.

cessen commented 2 years ago

If I'm understanding you correctly, that's actually the intended and documented behavior. Basically, the iterator is like a cursor that sits between elements. When it moves forward or backward it returns the element it jumps over.

In your case, you have a list of chunks (represented with numbers here): [1 2 3 4 5]

If the cursor (represented with a vertical bar here) is already at the end and you call next() you'll get None:

[1 2 3 4 5 | ]

If you then call prev() -> next() it jumps back over the 5th chunk and then forward over the 5th chunk again:

[1 2 3 4 | 5] [1 2 3 4 5 | ]

Returning the 5th chunk both times.

But maybe I'm misunderstanding your description of the issue?

smohekey commented 2 years ago

Hmm, that explains the behavior I'm seeing. However it doesn't really match with my understanding of a cursor ( https://en.wikipedia.org/wiki/Cursor_(databases) ) where the cursor points at a given row (chunk in this case), and calling next or previous moves the cursor forward or backward to the next or previous row (chunk).

Could you give an example of where the intended behavior is useful? I can't really think of any.

cessen commented 2 years ago

It may very well be that this wasn't the best choice of behavior. But at the time it made sense to me. There's a good chance I'll change it in a future Ropey 2.0 (though when that might happen, I'm not sure).

However it doesn't really match with my understanding of a cursor

When I said cursor, I meant like a text cursor. My thinking when I decided on that behavior was:

  1. Ropey is a text representation, so having e.g. the char iterator follow common text cursor conventions has a certain familiarity to it.
  2. For things like DFA regex, where after you find the end of a match you need to scan backwards again to find its start, this is exactly the behavior you need from e.g. a byte or char iterator.
  3. If the byte and char iterator are going to behave that way, I might as well make all the iterators behave that way for consistency.

But I think you're right that, in retrospect, following more typical conventions where the iterator is "on" an item and returns that item when it jumps onto it would probably have been better. And for DFAs it's pretty easy to adapt to that behavior with a bit of minor internal logic.

I take long-term backwards compatibility really seriously, though, and don't want to release major breaking versions regularly for individual/minor API changes. So I expect this behavior won't change until a larger overhaul of Ropey (also fixing other warts in the API) in a future 2.0. But I really appreciate the feedback! Hopefully the current API isn't too much of a pain to work with.

smohekey commented 2 years ago

I was able to work around it very easily by keeping a note of which direction I had moved last, and if I change direction, I know to call the associated function twice to skip over the duplicate.

It's interesting that you mention DFA regex as this is exactly what I'm using ropey (and regex_automata) for. However, I'm using them both to implement a token reader, which means I'm never searching for a match in the middle of rope, I'm always searching for a match from the current position in the rope, so I don't need to go backwards to find the start of the match.

cessen commented 2 years ago

However, I'm using them both to implement a token reader, which means I'm never searching for a match in the middle of rope

Yeah. And moreover, for the chunks iterator in particular the "cursor between items" approach really doesn't make any sense. And in retrospect, it's more likely to pass chunks to something like a regex engine rather than a char or byte iterator, I suspect, since that way the underlying engine can do more advanced optimizations.

Anyway, I'm going to close this issue. Not because I'm rejecting the feedback (on the contrary, I think you've convinced me to change the behavior in the next major version, whenever that happens). Just because it's not actually a bug.

Thanks a bunch!