cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

`ropey` doesn't agree with `wc` #60

Closed evanrelf closed 2 years ago

evanrelf commented 2 years ago

I found it confusing how ropey's line count doesn't agree with BSD or GNU wc's line count.

I wrote a simple reproduction of this difference here: https://github.com/evanrelf/ropey-confusion

The TL;DR is that ropey considers an empty file (e.g. created with touch) to have one line, whereas wc considers it to have zero lines.

I don't necessarily think there's anything wrong with this behavior. But I found it confusing and I thought it might be worth mentioning in the documentation.

Other than this little thing, ropey has been a joy to use while working on building my own text editor. Thanks for making such a great library! 🙂

cessen commented 2 years ago

Thanks for taking the time to submit this issue!

I think the documentation is already pretty clear about what Ropey considers to be a line. For example, in the A Note About Line Endings section and the byte_to_line documentation.

Though I suppose it doesn't hurt to make an explicit mention that an empty buffer is considered to have a single empty line. It might also help to rename the aforementioned section to "A Note About Line Breaks", since Ropey actually treats them as breaks rather than endings per se—unlike unix conventions, Ropey does not require lines to end in a break to be considered a valid line (my reasoning for that is partially outlined here).

I'll leave this open until I get around to the documentation tweaks.

evanrelf commented 2 years ago

I think the documentation is already pretty clear about what Ropey considers to be a line. For example, in the A Note About Line Endings section and the byte_to_line documentation.

Ah, my bad! I think the documentation you have right now explains how things work well enough. I just didn't connect the dots on that.

cessen commented 2 years ago

No worries! It never hurts to make things a little more clear. :-)

evanrelf commented 2 years ago

👍

cessen commented 2 years ago

I'm actually going to leave this open, because I think the documentation around this can indeed be improved a bit. But thanks!

cessen commented 2 years ago

I've made some documentation tweaks in 249902178800c0ccbc20a37afe4aa8bd78d948da that hopefully make things clearer.

evanrelf commented 1 year ago

If anyone wants the len_lines behavior I was expecting, I think I've narrowed it down to this:

use ropey::Rope;

pub trait RopeExt {
    fn len_lines_indigo(&self) -> usize;
}

impl RopeExt for Rope {
    fn len_lines_indigo(&self) -> usize {
        if self.len_chars() == 0 {
            return 0;
        }
        let last_char = self.char(self.len_chars() - 1);
        self.len_lines() - if last_char == '\n' { 1 } else { 0 }
    }
}
And here are some tests I wrote that show the difference in behavior ```rust #[cfg(test)] mod tests { use super::*; #[test] fn rope_length() { let rope = Rope::default(); assert_eq!(rope.len_chars(), 0); assert_eq!(rope.len_lines(), 1); assert_eq!(rope.len_lines_indigo(), 0); let rope = Rope::from_str("x"); assert_eq!(rope.len_chars(), 1); assert_eq!(rope.len_lines(), 1); assert_eq!(rope.len_lines_indigo(), 1); let rope = Rope::from_str("\n"); assert_eq!(rope.len_chars(), 1); assert_eq!(rope.len_lines(), 2); assert_eq!(rope.len_lines_indigo(), 1); let rope = Rope::from_str("x\n"); assert_eq!(rope.len_chars(), 2); assert_eq!(rope.len_lines(), 2); assert_eq!(rope.len_lines_indigo(), 1); let rope = Rope::from_str("x\ny\nz"); assert_eq!(rope.len_chars(), 5); assert_eq!(rope.len_lines(), 3); assert_eq!(rope.len_lines_indigo(), 3); let rope = Rope::from_str("x\ny\nz\n"); assert_eq!(rope.len_chars(), 6); assert_eq!(rope.len_lines(), 4); assert_eq!(rope.len_lines_indigo(), 3); } } ```