Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License
20.59k stars 333 forks source link

Panic index out of bounds in side_by_side.rs on go.sum #688

Closed ofavre closed 2 months ago

ofavre commented 6 months ago

While running git show --ext-diff I noticed the following at the end of the diff:

[…]
go.sum --- 1/2 --- Text
  1 cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
  2 cloud.google.com/go v0.110.0 h1:Zc8gqp3+a9/Eyph2KDmcGaPtbKRIoqq4YTlL4NMD0Ys=
  3 cloud.google.com/go v0.110.0/go.mod h1:SJnCLqQ0FCFGSZMUNUf84MV3Aia54kn7pi8st7tMzaY=
  4 cloud.google.com/go/compute v1.19.1 h1:am86mquDUgjGNWxiGn+5PGLbmgiWXlE/yNWpIpNvuXY=

go.sum --- 2/2 --- Text
.                                                                                                                     64 github.com/googleapis/enterprise-certificate-proxy v0.2.3/go.mod h1:AwSRAtLfXpU5Nm3pW+v7rGDHp09LsPtGY9MduiEsR9k=
.                                                                                                                     65 github.com/googleapis/gax-go/v2 v2.8.0 h1:UBtEZqx1bjXtOQ5BVTkuYghXrr3N4V123VKJK67vJZc=
.                                                                                                                     66 github.com/googleapis/gax-go/v2 v2.8.0/go.mod h1:4orTrqY6hXxxaUL4LHIPl6lGo8vAE38/qKbhSAKP6QI=
1 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=                         67 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
2 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks=                              68 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks=
3 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0=                       69 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0=
4 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8=                                       70 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8=
5 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E=                                71 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E=
.                                                                                                                     72 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
.                                                                                                                     73 github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtY
.                                                                                                                     .. GdfGttqA=
6 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY=        74 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY=
7 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY= 75 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY=
thread 'main' panicked at src/display/side_by_side.rs:505:34:
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal: external diff died, stopping at go.sum

And sure enough, the diff output was not complete for the go.sum file and there should also be other files after it.

I reproduced the issue by running difft go.sum.old.txt go.sum.new.txt on the following two files: go.sum.old.txt and go.sum.new.txt

Here is the output with complete stacktrace:

RUST_LOG=info RUST_BACKTRACE=1 RUST_BACKTRACE=full difft go.sum.old.txt go.sum.new.txt | cat
go.sum.new.txt --- 1/2 --- Text
File permissions changed from 100664 to 100644.
  1 cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
  2 cloud.google.com/go v0.110.0 h1:Zc8gqp3+a9/Eyph2KDmcGaPtbKRIoqq4YTlL4NMD0Ys=
  3 cloud.google.com/go v0.110.0/go.mod h1:SJnCLqQ0FCFGSZMUNUf84MV3Aia54kn7pi8st7tMzaY=
  4 cloud.google.com/go/compute v1.19.1 h1:am86mquDUgjGNWxiGn+5PGLbmgiWXlE/yNWpIpNvuXY=

go.sum.new.txt --- 2/2 --- Text
.                                                                                                                     64 github.com/googleapis/enterprise-certificate-proxy v0.2.3/go.mod h1:AwSRAtLfXpU5Nm3pW+v7rGDHp09LsPtGY9MduiEsR9k=
.                                                                                                                     65 github.com/googleapis/gax-go/v2 v2.8.0 h1:UBtEZqx1bjXtOQ5BVTkuYghXrr3N4V123VKJK67vJZc=
.                                                                                                                     66 github.com/googleapis/gax-go/v2 v2.8.0/go.mod h1:4orTrqY6hXxxaUL4LHIPl6lGo8vAE38/qKbhSAKP6QI=
1 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=                         67 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
2 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks=                              68 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks=
3 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0=                       69 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0=
4 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8=                                       70 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8=
5 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E=                                71 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E=
.                                                                                                                     72 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
.                                                                                                                     73 github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtY
.                                                                                                                     .. GdfGttqA=
6 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY=        74 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY=
7 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY= 75 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY=
thread 'main' panicked at src/display/side_by_side.rs:505:34:
index out of bounds: the len is 7 but the index is 7
stack backtrace:
   0:     0x558cbfe55137 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb2d11bf794d51a44
   1:     0x558cbfc8e800 - core::fmt::write::hc9c39fd060f58be5
   2:     0x558cbfe474ce - std::io::Write::write_fmt::h0f12046d62ab82d9
   3:     0x558cbfe54ea4 - std::sys_common::backtrace::print::had20f5e369bd8b8b
   4:     0x558cbfe32a4a - std::panicking::default_hook::{{closure}}::h5f274992636625c8
   5:     0x558cbfe32709 - std::panicking::default_hook::hda96c8e22c1e561c
   6:     0x558cbfe32dce - std::panicking::rust_panic_with_hook::hb9ee0580a171cf05
   7:     0x558cbfe556ca - std::panicking::begin_panic_handler::{{closure}}::h6a91c522b14e319c
   8:     0x558cbfe55386 - std::sys_common::backtrace::__rust_end_short_backtrace::h0c1987780043709b
   9:     0x558cbfe32b20 - rust_begin_unwind
  10:     0x558cbfbe7765 - core::panicking::panic_fmt::haffa8b258e06944a
  11:     0x558cbfbe7972 - core::panicking::panic_bounds_check::h7ece44e9b2991f77
  12:     0x558cbfcc7804 - difft::display::side_by_side::print::h395062f7207a1a24
  13:     0x558cbfcebc10 - difft::print_diff_result::hf2d672fb3044ffd4
  14:     0x558cbfce76d5 - difft::main::h33c7d715c916808b
  15:     0x558cbfcd10c3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9f85ef8d11c91a26
  16:     0x558cbfd18649 - std::rt::lang_start::{{closure}}::hedbbc4005407dc2d
  17:     0x558cbfe39155 - std::rt::lang_start_internal::h18187536d7e524f6
  18:     0x558cbfcebff5 - main
  19:     0x7fc187c28150 - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  20:     0x7fc187c28209 - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:360:3
  21:     0x558cbfc08d45 - _start
  22:                0x0 - <unknown>

image

My difft version:

$ difft --version
Difftastic 0.56.1 (built with rustc 1.76.0)

By the way, great tool! :muscle:

amavlyanov commented 6 months ago

Have similar error on the same difftastic version (using macOS 14.4.1 on M2 MacBook). My backtrace looks like:

thread 'main' panicked at src/display/side_by_side.rs:515:34:
index out of bounds: the len is 24 but the index is 24
stack backtrace:
   0:        0x102c470a0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc44f794219052ac6
   1:        0x102acea18 - core::fmt::write::h3be8b1c9593e07b4
   2:        0x102c5d380 - std::io::Write::write_fmt::h95a1e6c5537a96b5
   3:        0x102c46eb8 - std::sys_common::backtrace::print::h7b809dd1f18ea674
   4:        0x102c5f2a8 - std::panicking::default_hook::{{closure}}::he339b0dd23f9052e
   5:        0x102c5efbc - std::panicking::default_hook::he9ee42fa25c512f5
   6:        0x102c5f758 - std::panicking::rust_panic_with_hook::hd4ada112570991e5
   7:        0x102c47724 - std::panicking::begin_panic_handler::{{closure}}::h3f2b6a80c2b7e617
   8:        0x102c472e4 - std::sys_common::backtrace::__rust_end_short_backtrace::hc0b94e2951df3fae
   9:        0x102c5f4fc - _rust_begin_unwind
  10:        0x103427a74 - core::panicking::panic_fmt::h0ed3af01737be6a9
  11:        0x103427bcc - core::panicking::panic_bounds_check::hd8547d2ac1a831c3
  12:        0x102b2e45c - difft::display::side_by_side::print::h828c049d1c24d463
  13:        0x102b252f0 - difft::print_diff_result::h698bc527a35012a9
  14:        0x102b21f28 - difft::main::h70547995380bd13e
  15:        0x102b3ac48 - std::sys_common::backtrace::__rust_begin_short_backtrace::he6ac8584ea647aa6
  16:        0x102b486c8 - std::rt::lang_start::{{closure}}::h73c9888aa910514a
  17:        0x102c5f3f8 - std::panicking::try::h3128bce4d588bc41
  18:        0x102c49408 - std::rt::lang_start_internal::h57a49fa80dd8b904
  19:        0x102b25650 - _main
fatal: external diff died, stopping at README.md
ofavre commented 6 months ago

FYI, the crash is still here with Difftastic 0.57.0 (built with rustc 1.77.1).

JonathanxD commented 6 months ago

I've just bisected the problem to this commit: 3be8e80fe73d10bc92944eb55d822d27a380c5f8.

To trigger this issue you only need to have enough words to cross the MAX_WORDS_IN_LINE. I was consistently getting this problem with some log files, so I replaced everything with : (which counts as word) and added them to a gist.

To reproduce the issue just use those files and run difft against them:

$ curl -o bad_left 'https://gist.githubusercontent.com/JonathanxD/450595f7a25cca43076ca5083ee73e33/raw/85ac535c67fb40c30006e5622b015c387c06cbe4/bad_left'
$ curl -o bad_right 'https://gist.githubusercontent.com/JonathanxD/450595f7a25cca43076ca5083ee73e33/raw/85ac535c67fb40c30006e5622b015c387c06cbe4/bad_right'
$ difft bad_left bad_right

And I just found out that the order does not really matter (you can swap left and right and it still panics).

After further investigation, and looking around the code a bit, I found out that this line should actually be:

index d54731d59..562d327d1 100644
--- a/src/line_parser.rs
+++ b/src/line_parser.rs
@@ -152,7 +152,7 @@ pub(crate) fn change_positions(lhs_src: &str, rhs_src: &str) -> Vec<MatchedPos>
                 // have a very large number of words, don't diff
                 // individual words.
                 if lhs_words.len() > MAX_WORDS_IN_LINE || rhs_words.len() > MAX_WORDS_IN_LINE {
-                    for lhs_pos in lhs_lp.from_region(lhs_offset, lhs_offset + lhs_part.len()) {
+                    for lhs_pos in lhs_lp.from_region(lhs_offset, lhs_offset + line_len_in_bytes(&lhs_part)) {
                         mps.push(MatchedPos {
                             kind: MatchKind::NovelWord {
                                 highlight: TokenKind::Atom(AtomKind::Normal),

I ran all the tests with this patch and every seems fine. The only thing that is unclear to me is why those calls use this function, but those don't: 1, 2. (it's because those are word-based, not line-based).

JonathanxD commented 2 months ago

Today I was updating difft and applying my patches and I thought: “Why not run a bisect against pretty much all of those cases again?” So, that's what I did.

Results

1st

The comment https://github.com/Wilfred/difftastic/issues/694#issuecomment-2066249883 talks about a yaml that triggers an out-of-bounds. This was the very first one that started failing during the bisect:

The bad commit is¹: ffd49d523a3ac2d3cfdec7dc463f8b20eb524c3a The last commit that it worked without panics was (which is just parent commit): f86ba13abfdc247c7963f2757917891bf05ca722 The diff: f86ba13abfdc247c7963f2757917891bf05ca722...ffd49d523a3ac2d3cfdec7dc463f8b20eb524c3a

Note that the only case that starts breaking at this point is the yaml one, and it stays that way during the entire bisect session.

2nd (already fixed)

The JQuery case (https://github.com/Wilfred/difftastic/issues/694#issue-2226773984) was broken for quite some time, but was fixed at some point:

The bad commit was: 1e7866b64ec083385a51affceb31a2571f9e3539 The last good commit: 2b9aca85f38624a31ca6f6a22285d0b710c9b70d The diff: 2b9aca85f38624a31ca6f6a22285d0b710c9b70d...1e7866b64ec083385a51affceb31a2571f9e3539

This one is not broken anymore and I didn't bisect the commit that actually fixed it.

3rd

And finally, those ones: https://github.com/Wilfred/difftastic/issues/688#issue-2212713540, https://github.com/Wilfred/difftastic/issues/681#issue-2206981128 and https://github.com/Wilfred/difftastic/issues/682#issue-2207630735.

The bad commit is: 53298e42404bd4b389e74597e6dba26518eb82b9 The last good commit: 7e8f928926cd46603d27f37bdfc38717fc52bb3d The diff: 7e8f928926cd46603d27f37bdfc38717fc52bb3d...53298e42404bd4b389e74597e6dba26518eb82b9


I hope this helps.

elldritch commented 2 months ago

I think @JonathanxD is right here. I'm running into this issue as well, and running difft under cargo rr shows that I can reliably trigger the bug when lhs_words.len() > 1000 with a breakpoint at line_parser.rs:155.

Here's how the crash happens:

  1. When calculating DiffResults in main, line_parser::change_positions is called in order to calculate the value for the lhs_positions field (diff_file calls diff_file_content calls line_parser::change_positions).
  2. Because of the bug in line_parser.rs:155, change_positions actually calculates one extra SingleLineSpan because it does not correctly strip the trailing newline at the end of all files before calling lhs_lp.from_region, and therefore one extra MatchedPos gets pushed to the result in line_parser.rs:156.
  3. When this is consumed downstream in difft::display::side_by_side::print (passed in as lhs_mps, which comes from the constructed DiffResult), it causes an off-by-one error in display/side_by_side.rs:492 because there is an extra lhs_line_num in aligned_lines, since aligned_lines is calculated using the lhs_mps from (2), but the actual lines to print in lhs_colored_lines is calculated using lhs_src, which does not contain an extra line.

I think the patch suggested in https://github.com/Wilfred/difftastic/issues/688#issuecomment-2036847084, and applying it does prevent the bug from occurring in my test case.

Wilfred commented 2 months ago

Hmm, the pull request fixes the bad_left/right case with colons, but not the go.sum.txt case.

Wilfred commented 2 months ago

OK, so AFAICS the underlying issue is that difftastic assumes that it can split a string on lines, join them later, and get the same string. This isn't true in .lines() in Rust's stdlib, which treats foo\nbar and foo\nbar\n the same. This used to have custom line splitting logic in difftastic, but now just uses the stdlib (changed in 8c004be87b5553809d1d865fc76adb754c2c3b55, see also cb9367c1298d253b4c2a0ec2536326abd388d043 and 8b842387a16e5ae4a79aaf6befb0bac1894882d0).

As a result, difftastic crashes when you have a large (>1000 words) change between two text files, where the change ends with a newline. There are probably other places in the code that rely on this invariant.