cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

When to use from_reader() vs RopeBuilder #40

Closed pickfire closed 3 years ago

pickfire commented 3 years ago
let file = File::open(&path).context(format!("unable to open {:?}", path))?;   
Rope::from_reader(BufReader::new(file))?                                      

Should we re-buffer from_reader, does it already buffer it? Or we don't need to use BufReader? If yes, maybe the docs might want to mention it. I was looking at the code and it seemed like it does some buffering. I just ponder upon this when working on helix.

cessen commented 3 years ago

Hi @pickfire , thanks for the question!

The from_reader() convenience method does have some internal buffering, which I imagine may mitigate the need for BufReader to a large extent. However, the actual reason for the buffering is different: it's not to make IO faster, it's to make building the Rope faster.

If buffered IO is important for your use-case, I would recommend one of two things:

  1. Still use a BufReader in combination with from_reader(). Even though the current implementation of from_reader() uses an internal buffer, that is not an API/functionality promise, it's just an implementation detail in making non-IO related things more efficient.
  2. Instead of depending on the from_reader() convenience method, directly use the RopeBuilder so that you can handle IO yourself and make everything work exactly how you want it to. Internally, all from_reader() really does is collect chunks of text data to feed to RopeBuilder--it's really just a convenience wrapper.

For Helix in particular, I would definitely recommend approach 2, directly using RopeBuilder. The from_reader() method is mainly intended for more casual use of Ropey. I suspect Helix will, at least at some point, want more low-level control over how reading works, and that's exactly what RopeBuilder is for. And then Helix can control exactly how buffering is handled.

I hope that answers your question!

cessen commented 3 years ago

Also, the Ropey repo has an example of directly using the RopeBuilder, in case you want to take a look: https://github.com/cessen/ropey/blob/master/examples/read_latin_1.rs

In that example, it's for the purpose of transcoding from latin-1 to utf8 while loading the file in a streaming fashion. But it can be used for all kinds of things:

  1. Transcoding, as in the example.
  2. Handling invalid text data (e.g. loading corrupted files, replacing invalid data with stand-in characters so that the user can manually fix).
  3. Low-level control over how you handle IO (e.g. buffering).
  4. Providing a progress bar when loading large files.
  5. Making the loading of large files cancellable by the user.
  6. Etc.

So again, I think something like Helix probably shouldn't be using from_reader() anyway, at least eventually. There is a lot of functionality you probably want Helix to have when loading files in order to be a really robust editor, and you just can't get that out of from_reader(). That's what RopeBuilder is for.

Hope this helps!

pickfire commented 3 years ago

At least right now I believe we are expecting UTF-8, is it still necessary if we don't have to do any transcoding?

cessen commented 3 years ago

Basically, if you're at the point where you care about the internals of from_reader() (such as whether it's internally doing any buffering or not), then you should be using RopeBuilder directly to get the precise behavior you want. The implementation details of from_reader() are not guaranteed, and may change in the future.

cessen commented 3 years ago

Maybe this is a bit more clear. You basically have two options:

  1. If you want convenience: use from_reader() + BufReader. This doesn't guarantee optimal efficiency, but should always perform reasonably well, and is convenient when you don't care about any of the concerns I listed up-thread.
  2. If you want precise control: directly use RopeBuilder. This is less convenient, but it gives you the control to make sure things work exactly how you want.

Hope that helps clarify things!

cessen commented 3 years ago

Reopened because I think this should be clearer in the documentation.

cessen commented 3 years ago

Updated the docs to hopefully be clearer about this in 912ea78b7b1ffdad2009f0c1fed8953d90732442.