Closed — pascalkuthe closed this 1 year ago
This is definitely the approach I wanted to take with this. Indeed, it's subtle to get right, which is why I put it off for so long, ha ha. Thanks for taking this on! I'll take a look at the actual code when my brain is functioning properly again.
Ah... maybe I'm forgetting how github works, but it looks like the commits from your other PRs are rolled in with this one? Could you keep your PRs on separate branches so it's easier to review? (And also easier to accept some PRs without having to accept others too early.)
Hey @cessen, thanks for all the feedback. Please take the time to recover, I hope you get well soon! Regarding the extra commits, you are totally right about that. This PR is not ready for review yet (hence marked as draft) and not at all in a state where I would usually post it. I just wanted to post my state for the discussion in #69. There are still comments to add, a few edge cases to fix, benchmarks to perform, and indeed the other commits to remove. I was just starting off of my local master, sorry about that.
Closing the PR was a misclick. Sorry about that.
Should we also include https://github.com/cessen/ropey/pull/69/commits/159459652c2e33c2423e7f6da39d248beec13c85?
I would hope that this optimizes to the same code but I will check the assembly later to make sure. If the compiler can indeed not optimize this then including it might add a small performance win
BTW the current implementation does not yet use SIMD for finding the next newline within a chunk, because the functions available in str_indices do not do what I need. I already have a SIMD version of `find_next_line_break` implemented (same as `line_to_byte_idx(text, 1)`, but it returns `None` if no line break is found, to disambiguate a newline char at the very end from no newline char at all), and the reverse iteration function should be decently easy as well. From testing I did locally, this has the potential to improve performance even further, as most time is actually spent on finding the newlines within chunks, and the SIMD version of those functions is at least 2x faster (for lines longer than 16 bytes).
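For reference, here is a minimal scalar sketch of the `find_next_line_break` semantics described above. The name and return type are taken from the discussion, but the body is an assumption; it only handles `\n`, while ropey and str_indices recognize several other line-break characters.

```rust
/// Returns the byte index just past the first line break in `text`,
/// or `None` if `text` contains no line break at all. This distinguishes
/// "newline at the very end" (`Some(text.len())`) from "no newline" (`None`).
/// Scalar fallback only; the discussion above proposes a SIMD version of this scan.
fn find_next_line_break(text: &str) -> Option<usize> {
    text.as_bytes().iter().position(|&b| b == b'\n').map(|i| i + 1)
}

fn main() {
    assert_eq!(find_next_line_break("abc\ndef"), Some(4)); // break mid-string
    assert_eq!(find_next_line_break("abc\n"), Some(4));    // break at the very end
    assert_eq!(find_next_line_break("abc"), None);         // no break at all
    println!("ok");
}
```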
Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?
Maybe open a draft PR targeting this PR. GitHub will automatically switch it to target master if this one lands first.
Those results look great! Awesome work.
I still don't have the energy for a full code review yet, but I left a couple of notes that I think can be addressed already.
Should I prepare a PR to str_indices for that and block this PR on that or should that be a follow-up PR?
I don't think any changes to str_indices are necessary. I made an in-line review comment about how to implement your function in terms of `lines::to_byte_idx()`.
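The suggested wrapper might look like the sketch below. The `to_byte_idx_line1` stand-in mimics the contract of `str_indices::lines::to_byte_idx(text, 1)` (restricted to `\n` for brevity) so the example is self-contained; the condition and names are assumptions, not the code from the review comment.

```rust
// Stand-in for `str_indices::lines::to_byte_idx(text, 1)`: the byte index
// where line 1 starts, which is `text.len()` both when the only line break
// sits at the very end of `text` and when there is no line break at all.
fn to_byte_idx_line1(text: &str) -> usize {
    match text.as_bytes().iter().position(|&b| b == b'\n') {
        Some(i) => i + 1,
        None => text.len(),
    }
}

// Hypothetical wrapper: resolve the ambiguity by checking whether the
// text actually ends with a line break.
fn find_next_line_break(text: &str) -> Option<usize> {
    let idx = to_byte_idx_line1(text);
    if idx == text.len() && !text.ends_with('\n') {
        None
    } else {
        Some(idx)
    }
}

fn main() {
    assert_eq!(find_next_line_break("abc\ndef"), Some(4));
    assert_eq!(find_next_line_break("abc\n"), Some(4));
    assert_eq!(find_next_line_break("abc"), None);
    println!("ok");
}
```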
Thanks for your comments! I replied to these; both of those are on me. I anticipated these exact review comments (they were my first thoughts as well) and wanted to add review comments myself to preempt them, but I ended up forgetting about it. Sorry about that! Please take your time to recover, I hope you feel better soon! Health always comes first.
@cessen After rebasing this PR on master, the tests you added actually caught a couple of bugs I hadn't considered in my implementation (and then some more were caught by the proptests, because the changes exercised some new code paths). The implementation really is extremely subtle in places.
In the process I have also switched back to `line_to_byte_idx` as you suggested. I did not end up implementing that as a separate function because I only needed it in one place, and when considering the additional edge cases the condition actually ended up being a bit more complex than you suggested.
The good news is that now all tests pass so this implementation is hopefully correct.
Because this PR is already quite big (and a large performance win in every situation), I would like to leave the SIMD implementation of the reverse iteration out of scope for now. I think it would definitely be good to have (and I know how to implement it), but it seems reasonable to do this in a follow-up PR (especially because I noticed that the current function needed to be adjusted to fix an edge case, and I am not 100% sure what the required API will be).
@cessen I am not sure why the three tests are failing in the miri CI. They work fine in the normal CI and locally. Maybe some SIMD-free fallback function used by miri behaves differently than the normal function? It doesn't seem related to this PR.
Edit: It seems the same miri tests fail on master too, so it's probably not related to this PR.
Because this PR is already quite big (and a large performance win in every situation) I would like to outscope the SIMD implementation for the reverse iteration for now.
Yeah, I think that's a good call. Honestly, it's not totally clear to me that it's worth bothering with at all. Because although reverse `Char` and `Byte` iterators have some common performance-critical use cases, I struggle to think of similar cases for reverse `Lines` iterators. So it might be one of those "wait until someone has a use case" things.
It seems the same miri tests fail on master too, so it's probably not related to this PR.
Yeah, I doubt it's related to this PR. Feel free to ignore them. I'll look into and fix them before the next release.
There was another problem with CI where `medium.txt` was removed, which made the test I added for #67 fail.
I think I might have just forgotten to stage that file in that PR. I added a commit with that file, but feel free to just fix that straight in master; it doesn't warrant another PR, I think (I will drop the commit then).
Sorry, I meant to fix your comments today but totally forgot about it. It's pretty late here now, but I will do it tomorrow morning!
I fixed the mistakes in the comments and added a ton more. In the process I went through the entire code again and found two unnecessary branches in the `new_with_range_at` function. These branches simply had no effect anymore (they were only needed in previous versions of the implementation). This simplified that function quite a bit (although it did not remove the const generic).
I think/hope that the added comments make this PR easier to review.
I would recommend starting with the `next_impl` function. The `prev_impl` function is a bit more tricky, and I think the context of the constructor isn't really required for reviewing the `next_impl` function.
Thanks @pascalkuthe! I'll try to get to this as soon as I can. A somewhat more urgent task came up in another project, but I think that will probably wrap up within a week. Let me know if this gets urgent, and I'll bump up the priority.
I think @archseer was planning to do a new helix release at the end of this month and wanted to include my two PRs that are blocked on a new ropey version (helix-editor/helix#3890 and helix-editor/helix#4457). It would be awesome if that would work out. I think we could still manage that if we continue with the review next week and nothing major turns up, but I of course understand that other things have to take priority @cessen.
Okay, so things have gotten even more hectic. Long story short: I'm working as the VFX supervisor on a film shoot this week (Mon-Fri). I thought I was going to have time in the evenings to work on the other task I mentioned, but that mostly doesn't seem to be the case. So that other task (which also has a deadline of end of the month) is also getting postponed, which would push out the full review of this PR even further.
I don't think it makes sense to keep you guys waiting on me, particularly if you want to make a release by the end of the month. So I think what I might do is this:
1. Do a cursory review, just to make sure nothing obviously wrong jumps out.
2. Merge the PR.
3. Publish a pre-release to crates.io that you guys can use for the upcoming Helix release.
4. When I get the time, review the code properly and just make any changes I'd like to see myself. And then make a proper release.
Does that sound reasonable? Basically, I don't want to block you guys, and I definitely want to merge this PR. But I also don't want to make a "proper" release without thoroughly reviewing the code first.
I fully understand that other things have to take priority.
Your proposal sounds great, thank you for offering that!
I have spoken with @archseer, and a crates.io prerelease (like `0.16.0-rc1`) with a cursorily reviewed version of this PR would unblock the relevant PRs for helix.
I hope that the comments I added will allow the cursory review to be quick so it doesn't take too much of your time. Once you have more time again and want to dive fully into the details of this PR, I will still be happy to answer any questions about the implementation (or implement improvements if you don't want to do it yourself).
@cessen sorry for the ping. Do you think you could take a look at this soon? The end of November is only a week away, so there is not much time left to land the PRs blocked on this for the next helix release. Quite a few people have been daily driving these changes in helix (by using my PR based on this as their daily driver). In that PR a line iteration is performed on every keypress (and incorrect iteration is immediately apparent), so it's been fuzzed quite a bit. That, together with the very detailed test suite, gives me a lot of confidence in the implementation.
Hi @pascalkuthe,
Sorry for the silence. I'll make sure to get to this by at least end of day Sunday (possibly sooner).
Thanks for your patience! November ended up being super busy for me, so I wasn't able to get to things as promptly as I prefer.
I've done a cursory review, and although there are a few things I'd like to look into a bit deeper, there's nothing that jumps out to me as incorrect. Combined with the testing and real-world usage you discussed, I'm more than happy to make an alpha release out of this for Helix to depend on. And then I can dive deeper a bit more at my leisure over the next month or so to get things ready for a proper release.
Thanks a bunch for your work on this! It's been a wart in Ropey for a while, so it's awesome to finally get it addressed.
~~Note: This is a very early draft and still contains known issues. CI failures are known and expected (multiple tests relating to the reverse iterator still fail). It's just posted here to allow comparison with the approach in #69 .~~
This PR is now mostly ready for review, although I still want to add some more comments and clean up the code a bit. But it works now, so reviewing makes sense. I have dropped the unrelated commits and removed the debug prints.
This implements a line iterator using a custom chunks iterator that tracks the lowest common parent across line boundaries to avoid a binary search in `RopeSlice::new`. The results are pretty spectacular: on synthetic benchmarks it outperforms the old implementation by a factor between 3x and 9x. See the run of the builtin benchmarks below (note the comparison is the wrong way around; I benchmarked the new implementation first and then the old one):
I have tested the iterator in helix with helix-editor/helix#4457: reloading a 1GB file improves the total reload time (excluding IO) from 14 seconds to 7 seconds.
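To illustrate the idea from the description (a cursor carried across line boundaries instead of a fresh search per line), here is a deliberately simplified toy model. Real ropey stores chunks in a B-tree and the iterator tracks the lowest common parent node; this sketch flattens that to a slice of chunks, which is an assumption made purely for brevity.

```rust
// Toy model: the rope's leaves as a flat list of chunks. In the real tree,
// locating a line from the root costs a binary search per level; this
// iterator instead resumes from where the previous line ended.
struct LineIter<'a> {
    chunks: &'a [&'a str],
    chunk_idx: usize, // cursor carried across calls to `next`
    offset: usize,    // byte offset into the current chunk
}

impl<'a> Iterator for LineIter<'a> {
    type Item = String;
    // Yields lines including their trailing '\n', assembling lines that
    // span chunk boundaries without ever searching from the start.
    fn next(&mut self) -> Option<String> {
        if self.chunk_idx >= self.chunks.len() {
            return None;
        }
        let mut line = String::new();
        while self.chunk_idx < self.chunks.len() {
            let rest = &self.chunks[self.chunk_idx][self.offset..];
            match rest.find('\n') {
                Some(i) => {
                    line.push_str(&rest[..=i]);
                    self.offset += i + 1;
                    if self.offset == self.chunks[self.chunk_idx].len() {
                        self.chunk_idx += 1;
                        self.offset = 0;
                    }
                    return Some(line);
                }
                None => {
                    // Line continues into the next chunk.
                    line.push_str(rest);
                    self.chunk_idx += 1;
                    self.offset = 0;
                }
            }
        }
        Some(line) // final line without a trailing line break
    }
}

fn main() {
    let chunks = ["hel", "lo\nwor", "ld\n", "tail"];
    let lines: Vec<String> = LineIter { chunks: &chunks, chunk_idx: 0, offset: 0 }.collect();
    assert_eq!(lines, ["hello\n", "world\n", "tail"]);
    println!("ok");
}
```

Each call to `next` resumes at `(chunk_idx, offset)`, so iterating all lines touches each byte once; locating every line from scratch would instead pay a root-to-leaf search per line, which is roughly the overhead the PR removes.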