Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License
20.59k stars 333 forks source link

fix: out of bounds panic when diffing large number of words #695

Open JonathanxD opened 5 months ago

JonathanxD commented 5 months ago

I dropped all the old text because it does not make sense anymore.

The problem is basically that Rust's str::lines() will return the same number of lines regardless of the last character being a new line or not, and the assumptions made on inline and side-by-side display implementations depends on having a clear distinction of those cases.

~Those changes introduces a function that will chain an additional element when the string ends with a newline. I'm not aware of any function from stdlib that does what we need (and I couldn't find one by looking at the documentation), the closest one was split, but it has obvious implications (\r\n, for example).~

This does solves #688, #694, #682 and #681, I tested all the provided files and they're all working without any panics.

Update 2024-07-08:

I reworked on this patch and now it does not add a new line to the end of the string, therefore, all tests are passing, including the regression ones (which was failing before). In this new revision it fixes #682 again, which involves --display=inline and was not working anymore when I initially merged the changes from master into my branch.

I also tested another case that I found out and it fixes #656.

The new approach is more hacky: in one case it just removes the additional line range when the line is empty, and in another case it just checks if the index is within range before trying to access the index. However, unlike the previous solution, this one does not fail the regression tests.

JonathanxD commented 5 months ago

I ran a small script to compare the output of db281c682cb1e19f68392bf2bbffb27e48cba1be with my branch to find the cause of regression, it's all the trailing newlines that were not being printed before.

Wilfred commented 5 months ago

Thanks for the pull request, and the clear explanation! This sounds like a regression from 8c004be87b5553809d1d865fc76adb754c2c3b55.

JonathanxD commented 5 months ago

Thanks for the pull request, and the clear explanation!

You're welcome.

This sounds like a regression from 8c004be.

I'm not sure if this is the cause of the regression because unless I'm missing something, the changes in this commit are just inlining the content of split_on_newlines on the call-site, which was already using lines(). I thought it could be different but after trying on playground, both versions yielded the same results.

@Wilfred Do you have any plans to merge my changes in the near future? If so, I can update the regression file so the pipeline succeeds. If you don't want to merge, I can use the examples provided in the reported issues to bisect which commit caused the regression and bring it to the discussion.

fschoenm commented 4 months ago

What's the status of this PR? I'm running into this issue too.

JonathanxD commented 4 months ago

What's the status of this PR? I'm running into this issue too.

Idk, everything is fine from my side. It depends on how @Wilfred wants to go from there.

I've been personally just building difftastic with my patches on top since without them difft would crash 90% of the time for my use-case.

JonathanxD commented 2 months ago

I noticed that #731 does the same thing as I did (but slightly differently), which is to ignore the last empty line, and it will fix most of the out-of-bounds panics except for the yaml one, which is caused by the same problem, but in a different location (mine "fixes" this one by skipping out of bounds indexes).