afnanenayet / diffsitter

A tree-sitter based AST difftool to get meaningful semantic diffs
MIT License
1.59k stars, 29 forks

[BUG] core dump comparing two C files #763

Closed (0-wiz-0 closed 7 months ago)

0-wiz-0 commented 9 months ago

Describe the bug: I wanted to try out diffsitter, so I tried diffing two C files with diffsitter 0.8.1 built from pkgsrc on NetBSD/amd64, but it dumped core.

To Reproduce: Check out https://github.com/nih-at/libzip/ and run:

diffsitter lib/zip_source_winzip_aes_*
zsh: segmentation fault (core dumped)  diffsitter lib/zip_source_winzip_aes_*

Expected behavior: I wanted to see a diff instead.

Log output/screenshots

wiz@exadelic:~/Projects/nih/libzip> diffsitter --debug lib/zip_source_winzip_aes_*
 2023-11-17T21:59:31.121Z DEBUG diffsitter > Checking if lib/zip_source_winzip_aes_decode.c can be parsed
 2023-11-17T21:59:31.122Z INFO  libdiffsitter::parse > Deduced language "c" from extension "c" from default mappings
 2023-11-17T21:59:31.122Z INFO  libdiffsitter::parse > Loading dynamic library from libtree-sitter-c.so
 2023-11-17T21:59:31.122Z DEBUG libdiffsitter::parse > Using name tree_sitter_c for dynamic function
 2023-11-17T21:59:31.123Z INFO  libdiffsitter::parse > Succeeded loading grammar for c
 2023-11-17T21:59:31.123Z DEBUG diffsitter           > Checking if lib/zip_source_winzip_aes_encode.c can be parsed
 2023-11-17T21:59:31.123Z INFO  libdiffsitter::parse > Deduced language "c" from extension "c" from default mappings
 2023-11-17T21:59:31.123Z INFO  libdiffsitter::parse > Loading dynamic library from libtree-sitter-c.so
 2023-11-17T21:59:31.123Z DEBUG libdiffsitter::parse > Using name tree_sitter_c for dynamic function
 2023-11-17T21:59:31.123Z INFO  libdiffsitter::parse > Succeeded loading grammar for c
 2023-11-17T21:59:31.123Z DEBUG diffsitter           > Extensions for both input files are supported
 2023-11-17T21:59:31.123Z DEBUG libdiffsitter        > Reading lib/zip_source_winzip_aes_decode.c to string
 2023-11-17T21:59:31.123Z INFO  libdiffsitter        > Will deduce filetype from file extension
 2023-11-17T21:59:31.123Z INFO  libdiffsitter::parse > Deduced language "c" from extension "c" from default mappings
 2023-11-17T21:59:31.123Z INFO  libdiffsitter::parse > Loading dynamic library from libtree-sitter-c.so
 2023-11-17T21:59:31.123Z DEBUG libdiffsitter::parse > Using name tree_sitter_c for dynamic function
 2023-11-17T21:59:31.124Z INFO  libdiffsitter::parse > Succeeded loading grammar for c
zsh: segmentation fault (core dumped)  diffsitter --debug lib/zip_source_winzip_aes_*

Here's the backtrace:

(gdb) bt
#0  ts_language_version (self=0x74406e1fed20) at src/./language.c:11
#1  0x00000000009eb2f4 in tree_sitter::Parser::set_language::h2cb43d5a19f629d1 ()
#2  0x00000000009c551f in libdiffsitter::parse::parse_file::h673ec514b7a95acb ()
#3  0x00000000009d4c18 in libdiffsitter::generate_ast_vector_data::hee8e02d6202e3ba7 ()
#4  0x000000000099a07f in diffsitter::main::h80217524e615fc00 ()
#5  0x00000000009a78c3 in std::sys_common::backtrace::__rust_begin_short_backtrace::h71ea8733c7c5e234 ()
#6  0x00000000009abddd in std::rt::lang_start::{{closure}}::hffa38383312f03fc ()
#7  0x0000000000afdf87 in std::panicking::try::h0d6cfe1f8828421a ()
#8  0x0000000000ada19b in std::rt::lang_start_internal::h927544304a5690c7 ()
#9  0x000000000099b675 in main ()

Platform: NetBSD-10.99.10/amd64

Additional context: I do have tree-sitter-c 0.20.5 installed in my default search path. I didn't configure diffsitter in any way to look for it, though.
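
For readers following along: the debug log above shows diffsitter looking for a library named libtree-sitter-c.so and a function named tree_sitter_c, which is why a grammar in the default search path gets picked up without any configuration. A minimal sketch of how such names could be derived from the language name follows. The helper names are hypothetical and are not taken from diffsitter's source, the `.so` suffix is specific to ELF platforms, and the hyphen-to-underscore replacement is an assumption based on C identifiers not allowing hyphens.

```rust
/// Hypothetical helper (not diffsitter's actual API): builds the
/// shared-library file name for a language, following the pattern seen in
/// the log ("libtree-sitter-c.so"). The ".so" suffix is an assumption that
/// only holds on ELF platforms such as NetBSD or Linux.
fn grammar_lib_name(lang: &str) -> String {
    format!("libtree-sitter-{}.so", lang)
}

/// Hypothetical helper: builds the name of the grammar's constructor
/// function, following the pattern seen in the log ("tree_sitter_c").
/// Hyphens are replaced with underscores because a C symbol name cannot
/// contain a hyphen (assumption about how multi-word names are handled).
fn grammar_symbol_name(lang: &str) -> String {
    format!("tree_sitter_{}", lang.replace('-', "_"))
}
```

With these helpers, the language `c` maps to the library and symbol names that appear in the log.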

afnanenayet commented 9 months ago

Well, this is a first! I've never seen diffsitter segfault before. I appreciate the detailed bug report.

afnanenayet commented 8 months ago

Hey, this should be handled by #783. Would you mind trying out the nightly that gets released tonight to see if you get a better error message rather than a segfault? Based on the stack trace, I suspect the issue might be due to a tree-sitter ABI incompatibility between diffsitter and the grammar being loaded from the shared library.
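
To illustrate the kind of guard this suggests, here is a minimal sketch of an ABI range check that returns an error instead of crashing. It is not the code from #783. The constants are assumed values for illustration (a real build would take the supported range from the tree-sitter crate it links against), and `check_grammar_abi` and `AbiMismatch` are hypothetical names. Such a check also relies on the grammar's version being readable in the first place.

```rust
use std::fmt;

// Assumed values, for illustration only. The real supported range is defined
// by the tree-sitter library that the binary was built against.
const MIN_COMPATIBLE_ABI: usize = 13;
const CURRENT_ABI: usize = 14;

/// Hypothetical error type describing a grammar whose ABI version falls
/// outside the range the host binary supports.
#[derive(Debug, PartialEq)]
struct AbiMismatch {
    found: usize,
}

impl fmt::Display for AbiMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "grammar ABI version {} is outside the supported range {}..={}",
            self.found, MIN_COMPATIBLE_ABI, CURRENT_ABI
        )
    }
}

/// Hypothetical check: accepts a grammar only if its reported ABI version
/// lies within the supported range, so the caller can print an error
/// message instead of handing an incompatible grammar to the parser.
fn check_grammar_abi(found: usize) -> Result<(), AbiMismatch> {
    if (MIN_COMPATIBLE_ABI..=CURRENT_ABI).contains(&found) {
        Ok(())
    } else {
        Err(AbiMismatch { found })
    }
}
```

A caller would run this check on the version reported by the loaded grammar before configuring the parser, and print the error if it fails.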

0-wiz-0 commented 8 months ago

There sadly is no NetBSD nightly binary; but I built one myself, and have to report that the core dump is still there:

Program received signal SIGSEGV, Segmentation fault.
ts_language_version (self=0x7755964cad20) at src/./language.c:11
11      src/./language.c: No such file or directory.
(gdb) bt
#0  ts_language_version (self=0x7755964cad20) at src/./language.c:11
#1  0x0000000000414a7a in tree_sitter::Language::version::h67be3fd5209e76a7 ()
#2  0x00000000003edb90 in libdiffsitter::parse::ts_parser_for_language::h895546d8c18a2441 ()
#3  0x00000000003ee091 in libdiffsitter::parse::parse_file::hf26d7de1eb370d23 ()
#4  0x00000000003f53aa in libdiffsitter::generate_ast_vector_data::h642771b6eca102bf ()
#5  0x00000000003c1fe0 in diffsitter::main::hacbc4337717a7241 ()
#6  0x00000000003d5773 in std::sys_common::backtrace::__rust_begin_short_backtrace::h0a39bdcddde6b702 ()
#7  0x00000000003d7bcd in std::rt::lang_start::{{closure}}::h265f4f89e813b193 ()
#8  0x000000000052d4d7 in std::panicking::try::h0d6cfe1f8828421a ()
#9  0x00000000005094eb in std::rt::lang_start_internal::h927544304a5690c7 ()

0-wiz-0 commented 8 months ago

Here's the --debug output for that run:

Starting program: /usr/pkg/bin/diffsitter --debug lib/zip_source_winzip_aes_*
 2023-12-24T16:36:09.333Z DEBUG diffsitter > Checking if lib/zip_source_winzip_aes_decode.c can be parsed
 2023-12-24T16:36:09.333Z INFO  libdiffsitter::parse > Deduced language "c" from extension "c" from default mappings
 2023-12-24T16:36:09.333Z DEBUG diffsitter           > Deduced language c for path lib/zip_source_winzip_aes_decode.c
 2023-12-24T16:36:09.333Z DEBUG diffsitter           > Checking if lib/zip_source_winzip_aes_encode.c can be parsed
 2023-12-24T16:36:09.333Z INFO  libdiffsitter::parse > Deduced language "c" from extension "c" from default mappings
 2023-12-24T16:36:09.333Z DEBUG diffsitter           > Deduced language c for path lib/zip_source_winzip_aes_encode.c
 2023-12-24T16:36:09.333Z DEBUG libdiffsitter        > Reading lib/zip_source_winzip_aes_decode.c to string
 2023-12-24T16:36:09.333Z INFO  libdiffsitter        > Will deduce filetype from file extension
 2023-12-24T16:36:09.333Z INFO  libdiffsitter::parse > Deduced language "c" from extension "c" from default mappings
 2023-12-24T16:36:09.333Z INFO  libdiffsitter::parse > Loading dynamic library from libtree-sitter-c.so
 2023-12-24T16:36:09.333Z DEBUG libdiffsitter::parse > Using name tree_sitter_c for dynamic function
 2023-12-24T16:36:09.380Z INFO  libdiffsitter::parse > Succeeded loading grammar for c

I have tree-sitter-c-0.20.5 installed.

afnanenayet commented 8 months ago

Thanks for trying that out! I'll keep digging

afnanenayet commented 7 months ago

I was able to replicate the segfault and tried fixing it with #786. Do you still see the error with the latest nightly?

0-wiz-0 commented 7 months ago

I can't reproduce the core dump any longer with diffsitter compiled from git HEAD. Thank you!

afnanenayet commented 7 months ago

I appreciate the patience!