UserNobody14 / tree-sitter-dart

Attempt to make a tree-sitter grammar for dart
MIT License
57 stars 33 forks source link

Segmentation Fault and crashed neovim #29

Open Mofiqul opened 2 years ago

Mofiqul commented 2 years ago

Crashing neovim completely, Segmentation Fault (Core dumped).

rlch commented 2 years ago

+1, I can replicate when attempting to type at the bottom of the file.

genesistms commented 2 years ago

Related open issue in neovim

https://github.com/nvim-treesitter/nvim-treesitter/issues/2972

Workaround For now add this commit

f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97

to lockfile.json in nvim-treesitter plugin. And TSInstall dart

Kavantix commented 2 years ago

The issue is in scanner.c somewhere, reverting it to the version at f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97 fixes the crash

RobertBrunhage commented 2 years ago

I made a quick fork where I changed the version (feel free to use it until it's fixed) https://github.com/RobertBrunhage/nvim-treesitter

CarlosNihelton commented 2 years ago

Just creating an empty dart file and adding one non-white-space character without a trailing semicolon is enough to cause the segmentation fault. If we write the semicolon first and then insert anything before it the parser won't crash.

As simple as this:

printf "p" > konstant.dart && nvim konstant.dart => [SIGSEV]

printf "p;" > konstant.dart && nvim konstant.dart => [works just fine]

ErikReider commented 2 years ago

Same issue here: Reverting to the older revision works :)

Kavantix commented 2 years ago

@UserNobody14 I've played around a bit with the scanner and the only way so far I have been able to prevent a segfault without a semi colon at the end is by changing the last return false; in tree_sitter_dart_external_scanner_scan to return lexer->lookahead == '\0'. This however really sounds like a bug in tree sitter itself we stumbled upon, or it could even be in the grammar.

CarlosNihelton commented 2 years ago

@UserNobody14 I've played around a bit with the scanner and the only way so far I have been able to prevent a segfault without a semi colon at the end is by changing the last return false; in tree_sitter_dart_external_scanner_scan to return lexer->lookahead == '\0'. This however really sounds like a bug in tree sitter itself we stumbled upon, or it could even be in the grammar.

Hey! I don't have idea of why, but it worked for me as well! @UserNobody14 is this a real fix or would it be just masking some other issue? Unfortunately the commit that introduced the bug is very big to allow attempts to fix it without a deep understanding of tree-sitter and Dart language specification.

UserNobody14 commented 2 years ago

Let me try adding that fix and running the grammar on a large list of files again, as far as I know it shouldn't cause a big issue. If it works, I'll merge it

UserNobody14 commented 2 years ago

Hmmm. After looking into the problem I can't seem to either replicate it or get the solution you posted to work with our tests. I don't have neovim setup with dart, so I tried running all of the tests I traditionally use, like parsing files and such. Sadly I was not able to recreate any of these issues, which I believe indicates it only happens on reruns of the grammer. Unfortunately without an easy way to replicate it I just don't know what I should do.

Near as I can tell the "lookahead == '\0'" line causes it to return true while the string has not yet ended, but this causes nearly all of our tests to fail, and on other projects' scanner.c files, the return false line is still present, so I don't see why it should be causing such an issue.

I found that this configuration below successfully runs the tests, but without an easy way to see if it segfaults, i have no way of knowing if it actually fixes the problem. Does anyone have some easy steps for reproducing the issue?


                                                  const bool *valid_symbols) {
  // bool ret = false;
  // if (lexer->lookahead == '/') {
  //   return scan_multiline_comments(lexer);
  // }
  if (lexer->lookahead == '\0') {
    return false;
  }
  if (
      valid_symbols[TEMPLATE_CHARS_DOUBLE] ||
      valid_symbols[TEMPLATE_CHARS_SINGLE] ||
      valid_symbols[TEMPLATE_CHARS_DOUBLE_SINGLE] ||
      valid_symbols[TEMPLATE_CHARS_SINGLE_SINGLE]
  ) {
    return scan_templates(lexer, valid_symbols);
  }
  if (valid_symbols[AUTOMATIC_SEMICOLON]) {
      bool ret = scan_automatic_semicolon(lexer);
      return ret;
  }
  while (iswspace(lexer->lookahead)) lexer->advance(lexer, true);

  if (lexer->lookahead == '/') {
    return scan_multiline_comments(lexer);
  }
  return false;
}```
theHamsta commented 2 years ago

You don't need a Neovim setup. Just let the tree-sitter parser run. Instructions https://github.com/ikatyang/tree-sitter-markdown/issues/14 (CC needs to be clang and python in the script python2). I've provided sample inputs that let the parser crash in the original issue, but I would recommend running the fuzzer (maybe even on CI)

theHamsta commented 2 years ago

If this is really a bug in tree-sitter, you could have a look at our open issue with tree-sitter-prisma which crashes without a custom parser @Kavantix . There is even a closed issue for tree-sitter-pug which hd the same issue that could be fixed by a check for \0

Kavantix commented 2 years ago

@theHamsta could you link the issues you mentioned?

@UserNobody14 I reproduced it by running tree-sitter highlight on a file containing a few random characters and no semicolon at the end. E.g. dhdh. So no need for using vim while testing

theHamsta commented 2 years ago

@Kavantix here you go: https://github.com/nvim-treesitter/nvim-treesitter/issues/2972#issuecomment-1141491120 As you say, no need to use vim. I bisected using git rebase -exec '...' using the setup in https://github.com/ikatyang/tree-sitter-markdown/issues/14 . There are some problems with the queries in this repo, so I had to modify the fuzzing main a bit to skip checking the queries. (basically skip all usages of https://github.com/tree-sitter/tree-sitter/blob/b729029a403046375a77a8e320211da64da00629/test/fuzz/fuzzer.cc#L7 in this file

I should really open a PR to document how to setup fuzzing in upstream tree-sitter. We found quite a lot parser issue with tree-sitter's fuzzing setup.

ghost commented 1 year ago

Is the current workaround is the one offered by @GenesisTMS , to revert to SHA f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97 of this repo?

This is a manual step so all new users of Neovim + TS + Dart will have to perform it on their own. Would it be possible to revert the changes done on this repo to that SHA instead?

https://github.com/nvim-treesitter/nvim-treesitter/issues/2972 which was referenced earlier as the matching TS issue is now marked as closed.

vigoux commented 1 year ago

Coming to this from nvim-treesitter.

Adding the following to test/corpus/dart.txt replicates the issue with tree-sitter test:

=====================
Weird file
=====================

d

---------------

This should at least help debugging the test.

The reason of the segfault is still a mistery to me, and the scanner.c is cryptic to me, especially scan_templates which does some kind of weird loop things.

Kavantix commented 1 year ago

Changing the scanner to always return false immediately gives the same crash, so it is not because of the scanner

vigoux commented 1 year ago

Now that's interesting...

rphuber commented 1 year ago

Any chance to revert back to the working commit until the problem is solved?

theHamsta commented 1 year ago

nvim-treesitter now uses an older working version for now: https://github.com/nvim-treesitter/nvim-treesitter/commit/d99b4cdd7591ab2e3ff3f962f7cc518908375805

ahelwer commented 1 year ago

@theHamsta a fuzzing setup guide would be tremendously appreciated, I'd love to fuzz my grammar.

theHamsta commented 1 year ago

@ahelwer Can't count how often I linked this post https://github.com/ikatyang/tree-sitter-markdown/issues/14#issue-725850717

kyazdani42 commented 1 year ago

@theHamsta maybe we could host a wiki page for this, although the comment is good, debugging the error is a bit more complicated than just running the fuzzer :)

theHamsta commented 1 year ago

Ideally such a documentation is contributed to tree-sitter upstream. On how to create safe, fuzzed parsers.

kyazdani42 commented 1 year ago

I guess so. I will look at this next week

UserNobody14 commented 1 year ago

Alright I got a commit working with @Kavantix 's highlight dhdh file as well as @vigoux 's "weird file" test (thanks btw!). Unfortunately it's still hard for me to be sure it won't crash again. Let me know if it keeps happening with the latest changes.

I'm not a genius user of scanner.c, sadly, I just kinda copied and pasted from various other tree sitter repos until I got it working. If anyone knowledgeable of C and tree-sitter wants to take a crack at it, I'd be happy to assist/merge your commits.

theHamsta commented 1 year ago

Seems to be ok now. Fuzzed for 55mins. Can the others please confirm that this is gone now also in Neovim?

fitrh commented 1 year ago

I can confirm that this issue has been fixed on neovim, although it seems that the speed for the first load is back to normal

ayoubelmhamdi commented 1 year ago

also, the flutter project does not crash now for me

Kavantix commented 1 year ago

For me the crash still happens on 53485a8f301254e19c518aa20c80f1bcf7cf5c62

johnstef99 commented 1 year ago

yeap still happening :(

UserNobody14 commented 1 year ago

Can anyone provide a test or a specific file that crashes? I've tested it with empty files and files with a single variable but no dice.

theHamsta commented 1 year ago

This is for querying (it uses the highlight queries in this project repo). Let me check whether it also crashes for parsing.

=================================================================
==18396==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a1549d7674 at pc 0x55a1549141c2 bp 0x7fff81d95910 sp 0x7fff81d95908
READ of size 2 at 0x55a1549d7674 thread T0
    #0 0x55a1549141c1 in ts_tree_cursor_current_status (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x4741c1) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #1 0x55a154816478 in ts_query_cursor__advance query.c
    #2 0x55a154811196 in ts_query_cursor_next_match (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x371196) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #3 0x55a1546fecad in LLVMFuzzerTestOneInput (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x25ecad) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #4 0x55a15460d250 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16d250) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #5 0x55a15460c9b5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16c9b5) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #6 0x55a15460e1a5 in fuzzer::Fuzzer::MutateAndTestOne() (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16e1a5) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #7 0x55a15460eda5 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16eda5) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #8 0x55a1545fcef0 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x15cef0) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #9 0x55a154626412 in main (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x186412) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
    #10 0x7f3b6302350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #11 0x7f3b630235c8 in __libc_start_main csu/../csu/libc-start.c:381:3
    #12 0x55a1545f1ae4 in _start (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x151ae4) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)

0x55a1549d7674 is located 0 bytes after global variable 'ts_field_map_entries' defined in '/home/stephan/projects/tree-sitter/test/fixtures/grammars/dart/src/parser.c:3390' (0x55a1549d7360) of size 788
SUMMARY: AddressSanitizer: global-buffer-overflow (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x4741c1) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe) in ts_tree_cursor_current_status
Shadow bytes around the buggy address:
  0x55a1549d7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x55a1549d7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x55a1549d7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x55a1549d7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x55a1549d7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x55a1549d7600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[04]f9
  0x55a1549d7680: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x55a1549d7700: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x55a1549d7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x55a1549d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x55a1549d7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18396==ABORTING
MS: 2 CrossOver-InsertByte-; base unit: 5c10b5b2cd673a0616d529aa5234b12ee7153
[crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808.txt](https://github.com/UserNobody14/tree-sitter-dart/files/9768311/crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808.txt)
808
0xac,0x2c,
\254,
artifact_prefix='./'; Test unit written to ./crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808
Base64: rCw=

@maxbrunsfeld do you have an idea why the fuzzer could be crashing when querying but not on parsing?

theHamsta commented 1 year ago

crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808.txt

theHamsta commented 1 year ago

It does not seem to crash when just using the fuzzer to parse. Just when the fuzzer script also tries to query using highlight.scm

maxbrunsfeld commented 1 year ago

If you can find a minimal query and source file that reproduces the error (via the tree-sitter query CLI command), could you open a Tree-sitter issue?

removed-usr commented 1 year ago

It just happened to me

TimWhiting commented 1 year ago

I'm assuming people are still seeing this. I am working on updating this library for newer dart syntax, and have been trying to clean up the grammar slightly as I go. However, it seems like the issue stems not from the grammar but rather from the highlights file. The highlights currently only have a few test cases. I'll see what I can do to get a more comprehensive testing setup there, which will help diagnose this issue. In the meantime if someone wants to try the latest commits in this repo and determine whether the issue is no longer present that would be great. (I do not personally use NeoVim or the highlights currently). Additionally anyone who would like to contribute to the test cases for highlighting I'd be happy to review PRs.

CC @theHamsta

mjcurciojr commented 9 months ago

@TimWhiting Can confirm, I am using the latest commit and still experiencing crashes when using with neovim and dartls.