Closed — pascalkuthe closed this 1 year ago
This is definitely the approach I wanted to take with this. Indeed, it's subtle to get right, which is why I put it off for so long, ha ha. Thanks for taking this on! I'll take a look at the actual code when my brain is functioning properly again.
Ah... maybe I'm forgetting how github works, but it looks like the commits from your other PRs are rolled in with this one? Could you keep your PRs on separate branches so it's easier to review? (And also easier to accept some PRs without having to accept others too early.)
Hey @cessen, thanks for all the feedback. Please take the time to recover, I hope you get well soon! Regarding the extra commits, you are totally right about that. This PR is not ready for review yet (hence marked as draft) and not at all in a state where I would usually post it. I just wanted to post my state for the discussion in #69. There are still comments to add, a few edge cases to fix, benchmarks to perform, and indeed the other commits to remove. I was just starting off of my local master, sorry about that.
Closing the PR was a misclick. Sorry about that.
Should we also include https://github.com/cessen/ropey/pull/69/commits/159459652c2e33c2423e7f6da39d248beec13c85?
I would hope that this optimizes to the same code but I will check the assembly later to make sure. If the compiler can indeed not optimize this then including it might add a small performance win
BTW the current implementation does not yet use SIMD for finding the next newline within a chunk, because the functions available in str_indices do not do what I need. I already have a SIMD version of `find_next_line_break` implemented (same as `line_to_byte_idx(text, 1)`, but it returns `None` if no line break is found, to disambiguate a newline char at the very end from no newline char at all), and the reverse iteration function should be decently easy as well. From testing I did locally, this has the potential to improve performance even further, as most time is actually spent on finding the newlines within chunks, and the SIMD version of those functions is at least 2x faster (for lines longer than 16 bytes).
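For reference, here is a minimal scalar sketch of the `find_next_line_break` semantics described above. The name and return type are taken from the discussion, but the body is an assumption; it only handles `\n`, while ropey and str_indices recognize several other line-break characters.

```rust
/// Returns the byte index just past the first line break in `text`,
/// or `None` if `text` contains no line break at all. This distinguishes
/// "newline at the very end" (`Some(text.len())`) from "no newline" (`None`).
/// Scalar fallback only; the discussion above proposes a SIMD version of this scan.
fn find_next_line_break(text: &str) -> Option<usize> {
    text.as_bytes().iter().position(|&b| b == b'\n').map(|i| i + 1)
}

fn main() {
    assert_eq!(find_next_line_break("abc\ndef"), Some(4)); // break mid-string
    assert_eq!(find_next_line_break("abc\n"), Some(4));    // break at the very end
    assert_eq!(find_next_line_break("abc"), None);         // no break at all
    println!("ok");
}
```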
Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?
Maybe open a draft PR targeting this PR. GitHub will automatically switch it to target master if this one lands first.
Those results look great! Awesome work.
I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already.
Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?
I don't think any changes to str_indices are necessary. I made an in-line review comment about how to implement your function in terms of `lines::to_byte_idx()`.
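The suggested wrapper might look like the sketch below. The `to_byte_idx_line1` stand-in mimics the contract of `str_indices::lines::to_byte_idx(text, 1)` (restricted to `\n` for brevity) so the example is self-contained; the condition and names are assumptions, not the code from the review comment.

```rust
// Stand-in for `str_indices::lines::to_byte_idx(text, 1)`: the byte index
// where line 1 starts, which is `text.len()` both when the only line break
// sits at the very end of `text` and when there is no line break at all.
fn to_byte_idx_line1(text: &str) -> usize {
    match text.as_bytes().iter().position(|&b| b == b'\n') {
        Some(i) => i + 1,
        None => text.len(),
    }
}

// Hypothetical wrapper: resolve the ambiguity by checking whether the
// text actually ends with a line break.
fn find_next_line_break(text: &str) -> Option<usize> {
    let idx = to_byte_idx_line1(text);
    if idx == text.len() && !text.ends_with('\n') {
        None
    } else {
        Some(idx)
    }
}

fn main() {
    assert_eq!(find_next_line_break("abc\ndef"), Some(4));
    assert_eq!(find_next_line_break("abc\n"), Some(4));
    assert_eq!(find_next_line_break("abc"), None);
    println!("ok");
}
```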
Thanks for your comments! I replied to these; both of those are on me. I anticipated these exact review comments (they were my first thoughts as well) and wanted to add review comments myself to preempt them, but I ended up forgetting about it. Sorry about that! Please take your time to recover, I hope you feel better soon! Health always comes first.
@cessen After rebasing this PR on master, the tests you added actually caught a couple of bugs I hadn't considered in my implementation (and then some more were caught by the proptests, because the changes exercised some new code paths). The implementation really is extremely subtle in places.
In the process I have also switched back to `line_to_byte_idx` as you suggested. I did not end up implementing that as a separate function because I only needed it in one place, and when considering the additional edge cases the condition actually ended up being a bit more complex than you suggested.
The good news is that now all tests pass so this implementation is hopefully correct.
Because this PR is already quite big (and a large performance win in every situation), I would like to leave the SIMD implementation of the reverse iteration out of scope for now. I think it would definitely be good to have (and I know how to implement it), but it seems reasonable to do this in a follow-up PR (especially because I noticed that the current function needed to be adjusted to fix an edge case, and I am not 100% sure what the required API will be).
@cessen I am not sure why the three tests are failing in the miri CI. They work fine in the normal CI and locally. Maybe some SIMD-free fallback function used by miri behaves differently than the normal function? It doesn't seem related to this PR.
Edit: It seems the same miri tests fail on master too, so it's probably not related to this PR.
Because this PR is already quite big (and a large performance win in every situation) I would like to outscope the SIMD implementation for the reverse iteration for now.
Yeah, I think that's a good call. Honestly, it's not totally clear to me that it's worth bothering with at all. Because although reverse `Char` and `Byte` iterators have some common performance-critical use cases, I struggle to think of similar cases for reverse `Lines` iterators. So it might be one of those "wait until someone has a use case" things.
It seems the same miri tests fail on master too, so it's probably not related to this PR.
Yeah, I doubt it's related to this PR. Feel free to ignore them. I'll look into and fix them before the next release.
There was another problem with CI where `medium.txt` was removed, which made the test I added for #67 fail.
I think I might have just forgotten to stage that file in that PR. I added a commit with that file, but feel free to just fix that straight in master; it doesn't warrant another PR, I think (I will drop the commit then).
Sorry, I meant to fix your comments today but totally forgot about it. It's pretty late here now, but I will do it tomorrow morning!
I fixed the mistakes in the comments and added a ton more. In the process I went through the entire code again and found two unnecessary branches in the `new_with_range_at` function. These branches simply had no effect anymore (they were only needed in previous versions of the implementation). This simplified that function quite a bit (although it did not remove the const generic).
I think/hope that the added comments make this PR easier to review.
I would recommend starting with the `next_impl` function. The `prev_impl` function is a bit more tricky, and I think the context of the constructor isn't really required for reviewing the `next_impl` function.
Thanks @pascalkuthe! I'll try to get to this as soon as I can. A somewhat more urgent task came up in another project, but I think that will probably wrap up within a week. Let me know if this gets urgent, and I'll bump up the priority.
I think @archseer was planning to do a new helix release at the end of this month and wanted to include my two PRs that are blocked on a new ropey version (helix-editor/helix#3890 and helix-editor/helix#4457). It would be awesome if that would work out. I think we could still manage that if we continue with the review next week and nothing major turns up, but I of course understand that other things have to take priority @cessen.
Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further.
I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:
1. Do a cursory review, just to make sure nothing obviously wrong jumps out.
2. Merge the PR.
3. Publish a pre-release to crates.io that you guys can use for the upcoming Helix release.
4. When I get the time, review the code properly and just make any changes I'd like to see myself. And then make a proper release.
Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first.
I fully understand that other things have to take priority.
Your proposal sounds great, thank you for offering that!
I have spoken with @archseer, and a crates.io prerelease (like `0.16.0-rc1`) with a cursorily reviewed version of this PR would unblock the relevant PRs for helix.
I hope that the comments I added will allow the cursory review to be quick so it doesn't take too much of your time. Once you have more time again and want to dive fully into the details of this PR, I will still be happy to answer any questions about the implementation (or implement improvements if you don't want to do it yourself).
@cessen sorry for the ping. Do you think you could take a look at this soon? The end of November is only a week away, so there is not much time left to land the PRs blocked on this for the next helix release. Quite a few people have been daily driving these changes in helix (by using my PR based on this as their daily driver). In that PR a line iteration is performed on every keypress (and incorrect iteration is immediately apparent), so it's been fuzzed quite a bit. That, together with the very detailed test suite, gives me a lot of confidence in the implementation.
Hi @pascalkuthe,
Sorry for the silence. I'll make sure to get to this by at least end of day Sunday (possibly sooner).
Thanks for your patience! November ended up being super busy for me, so I wasn't able to get to things as promptly as I prefer.
I've done a cursory review, and although there are a few things I'd like to look into a bit deeper, there's nothing that jumps out to me as incorrect. Combined with the testing and real-world usage you discussed, I'm more than happy to make an alpha release out of this for Helix to depend on. And then I can dive deeper a bit more at my leisure over the next month or so to get things ready for a proper release.
Thanks a bunch for your work on this! It's been a wart in Ropey for a while, so it's awesome to finally get it addressed.
~~Note: This is a very early draft and still contains known issues. CI failures are known and expected (multiple tests relating to the reverse iterator still fail). It's just posted here to allow comparison with the approach in #69 .~~
This PR is now mostly ready for review, although I still want to add some more comments and clean up the code a bit. But it works now, so reviewing makes sense. I have dropped the unrelated commits and removed the debug prints.
This implements a line iterator using a custom chunks iterator that tracks the lowest common parent across line boundaries to avoid a binary search in `RopeSlice::new`. The results are pretty spectacular: on synthetic benchmarks it outperforms the old implementation by a factor between 3x and 9x. See the run of the builtin benchmarks below (note the comparison is the wrong way around; I benchmarked the new implementation first and then the old one):
I have tested the iterator in helix with helix-editor/helix#4457: reloading a 1GB file improves the total reload time (excluding IO) from 14 seconds to 7 seconds.
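To illustrate the idea from the description (a cursor carried across line boundaries instead of a fresh search per line), here is a deliberately simplified toy model. Real ropey stores chunks in a B-tree and the iterator tracks the lowest common parent node; this sketch flattens that to a slice of chunks, which is an assumption made purely for brevity.

```rust
// Toy model: the rope's leaves as a flat list of chunks. In the real tree,
// locating a line from the root costs a binary search per level; this
// iterator instead resumes from where the previous line ended.
struct LineIter<'a> {
    chunks: &'a [&'a str],
    chunk_idx: usize, // cursor carried across calls to `next`
    offset: usize,    // byte offset into the current chunk
}

impl<'a> Iterator for LineIter<'a> {
    type Item = String;
    // Yields lines including their trailing '\n', assembling lines that
    // span chunk boundaries without ever searching from the start.
    fn next(&mut self) -> Option<String> {
        if self.chunk_idx >= self.chunks.len() {
            return None;
        }
        let mut line = String::new();
        while self.chunk_idx < self.chunks.len() {
            let rest = &self.chunks[self.chunk_idx][self.offset..];
            match rest.find('\n') {
                Some(i) => {
                    line.push_str(&rest[..=i]);
                    self.offset += i + 1;
                    if self.offset == self.chunks[self.chunk_idx].len() {
                        self.chunk_idx += 1;
                        self.offset = 0;
                    }
                    return Some(line);
                }
                None => {
                    // Line continues into the next chunk.
                    line.push_str(rest);
                    self.chunk_idx += 1;
                    self.offset = 0;
                }
            }
        }
        Some(line) // final line without a trailing line break
    }
}

fn main() {
    let chunks = ["hel", "lo\nwor", "ld\n", "tail"];
    let lines: Vec<String> = LineIter { chunks: &chunks, chunk_idx: 0, offset: 0 }.collect();
    assert_eq!(lines, ["hello\n", "world\n", "tail"]);
    println!("ok");
}
```

Each call to `next` resumes at `(chunk_idx, offset)`, so iterating all lines touches each byte once; locating every line from scratch would instead pay a root-to-leaf search per line, which is roughly the overhead the PR removes.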