Open Mofiqul opened 2 years ago
+1, I can replicate when attempting to type at the bottom of the file.
Related open issue in neovim
https://github.com/nvim-treesitter/nvim-treesitter/issues/2972
Workaround For now add this commit
f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97
to lockfile.json
in nvim-treesitter plugin. And TSInstall dart
The issue is in scanner.c
somewhere, reverting it to the version at f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97
fixes the crash
I made a quick fork where I changed the version (feel free to use it until it's fixed) https://github.com/RobertBrunhage/nvim-treesitter
Just creating an empty dart file and adding one non-white-space character without a trailing semicolon is enough to cause the segmentation fault. If we write the semicolon first and then insert anything before it the parser won't crash.
As simple as this:
printf "p" > konstant.dart && nvim konstant.dart
=> [SIGSEV]
printf "p;" > konstant.dart && nvim konstant.dart
=> [works just fine]
Same issue here: Reverting to the older revision works :)
@UserNobody14 I've played around a bit with the scanner and the only way so far I have been able to prevent a segfault without a semi colon at the end is by changing the last return false;
in tree_sitter_dart_external_scanner_scan
to return lexer->lookahead == '\0'
.
This however really sounds like a bug in tree sitter itself we stumbled upon, or it could even be in the grammar.
@UserNobody14 I've played around a bit with the scanner and the only way so far I have been able to prevent a segfault without a semi colon at the end is by changing the last
return false;
intree_sitter_dart_external_scanner_scan
toreturn lexer->lookahead == '\0'
. This however really sounds like a bug in tree sitter itself we stumbled upon, or it could even be in the grammar.
Hey! I don't have idea of why, but it worked for me as well! @UserNobody14 is this a real fix or would it be just masking some other issue? Unfortunately the commit that introduced the bug is very big to allow attempts to fix it without a deep understanding of tree-sitter and Dart language specification.
Let me try adding that fix and running the grammar on a large list of files again, as far as I know it shouldn't cause a big issue. If it works, I'll merge it
Hmmm. After looking into the problem I can't seem to either replicate it or get the solution you posted to work with our tests. I don't have neovim setup with dart, so I tried running all of the tests I traditionally use, like parsing files and such. Sadly I was not able to recreate any of these issues, which I believe indicates it only happens on reruns of the grammer. Unfortunately without an easy way to replicate it I just don't know what I should do.
Near as I can tell the "lookahead == '\0'" line causes it to return true while the string has not yet ended, but this causes nearly all of our tests to fail, and on other projects' scanner.c files, the return false line is still present, so I don't see why it should be causing such an issue.
I found that this configuration below successfully runs the tests, but without an easy way to see if it segfaults, i have no way of knowing if it actually fixes the problem. Does anyone have some easy steps for reproducing the issue?
const bool *valid_symbols) {
// bool ret = false;
// if (lexer->lookahead == '/') {
// return scan_multiline_comments(lexer);
// }
if (lexer->lookahead == '\0') {
return false;
}
if (
valid_symbols[TEMPLATE_CHARS_DOUBLE] ||
valid_symbols[TEMPLATE_CHARS_SINGLE] ||
valid_symbols[TEMPLATE_CHARS_DOUBLE_SINGLE] ||
valid_symbols[TEMPLATE_CHARS_SINGLE_SINGLE]
) {
return scan_templates(lexer, valid_symbols);
}
if (valid_symbols[AUTOMATIC_SEMICOLON]) {
bool ret = scan_automatic_semicolon(lexer);
return ret;
}
while (iswspace(lexer->lookahead)) lexer->advance(lexer, true);
if (lexer->lookahead == '/') {
return scan_multiline_comments(lexer);
}
return false;
}```
You don't need a Neovim setup. Just let the tree-sitter parser run. Instructions https://github.com/ikatyang/tree-sitter-markdown/issues/14 (CC needs to be clang and python in the script python2). I've provided sample inputs that let the parser crash in the original issue, but I would recommend running the fuzzer (maybe even on CI)
If this is really a bug in tree-sitter, you could have a look at our open issue with tree-sitter-prisma which crashes without a custom parser @Kavantix . There is even a closed issue for tree-sitter-pug which hd the same issue that could be fixed by a check for \0
@theHamsta could you link the issues you mentioned?
@UserNobody14 I reproduced it by running tree-sitter highlight on a file containing a few random characters and no semicolon at the end. E.g. dhdh
. So no need for using vim while testing
@Kavantix here you go: https://github.com/nvim-treesitter/nvim-treesitter/issues/2972#issuecomment-1141491120 As you say, no need to use vim. I bisected using git rebase -exec '...'
using the setup in https://github.com/ikatyang/tree-sitter-markdown/issues/14 . There are some problems with the queries in this repo, so I had to modify the fuzzing main a bit to skip checking the queries. (basically skip all usages of https://github.com/tree-sitter/tree-sitter/blob/b729029a403046375a77a8e320211da64da00629/test/fuzz/fuzzer.cc#L7 in this file
I should really open a PR to document how to setup fuzzing in upstream tree-sitter. We found quite a lot parser issue with tree-sitter's fuzzing setup.
Is the current workaround is the one offered by @GenesisTMS , to revert to SHA f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97 of this repo?
This is a manual step so all new users of Neovim + TS + Dart will have to perform it on their own. Would it be possible to revert the changes done on this repo to that SHA instead?
https://github.com/nvim-treesitter/nvim-treesitter/issues/2972 which was referenced earlier as the matching TS issue is now marked as closed.
Coming to this from nvim-treesitter.
Adding the following to test/corpus/dart.txt
replicates the issue with tree-sitter test
:
=====================
Weird file
=====================
d
---------------
This should at least help debugging the test.
The reason of the segfault is still a mistery to me, and the scanner.c
is cryptic to me, especially scan_templates
which does some kind of weird loop things.
Changing the scanner to always return false immediately gives the same crash, so it is not because of the scanner
Now that's interesting...
Any chance to revert back to the working commit until the problem is solved?
nvim-treesitter now uses an older working version for now: https://github.com/nvim-treesitter/nvim-treesitter/commit/d99b4cdd7591ab2e3ff3f962f7cc518908375805
@theHamsta a fuzzing setup guide would be tremendously appreciated, I'd love to fuzz my grammar.
@ahelwer Can't count how often I linked this post https://github.com/ikatyang/tree-sitter-markdown/issues/14#issue-725850717
@theHamsta maybe we could host a wiki page for this, although the comment is good, debugging the error is a bit more complicated than just running the fuzzer :)
Ideally such a documentation is contributed to tree-sitter upstream. On how to create safe, fuzzed parsers.
I guess so. I will look at this next week
Alright I got a commit working with @Kavantix 's highlight dhdh file as well as @vigoux 's "weird file" test (thanks btw!). Unfortunately it's still hard for me to be sure it won't crash again. Let me know if it keeps happening with the latest changes.
I'm not a genius user of scanner.c, sadly, I just kinda copied and pasted from various other tree sitter repos until I got it working. If anyone knowledgeable of C and tree-sitter wants to take a crack at it, I'd be happy to assist/merge your commits.
Seems to be ok now. Fuzzed for 55mins. Can the others please confirm that this is gone now also in Neovim?
I can confirm that this issue has been fixed on neovim, although it seems that the speed for the first load is back to normal
also, the flutter project does not crash now for me
For me the crash still happens on 53485a8f301254e19c518aa20c80f1bcf7cf5c62
yeap still happening :(
Can anyone provide a test or a specific file that crashes? I've tested it with empty files and files with a single variable but no dice.
This is for querying (it uses the highlight queries in this project repo). Let me check whether it also crashes for parsing.
=================================================================
==18396==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a1549d7674 at pc 0x55a1549141c2 bp 0x7fff81d95910 sp 0x7fff81d95908
READ of size 2 at 0x55a1549d7674 thread T0
#0 0x55a1549141c1 in ts_tree_cursor_current_status (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x4741c1) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#1 0x55a154816478 in ts_query_cursor__advance query.c
#2 0x55a154811196 in ts_query_cursor_next_match (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x371196) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#3 0x55a1546fecad in LLVMFuzzerTestOneInput (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x25ecad) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#4 0x55a15460d250 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16d250) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#5 0x55a15460c9b5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16c9b5) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#6 0x55a15460e1a5 in fuzzer::Fuzzer::MutateAndTestOne() (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16e1a5) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#7 0x55a15460eda5 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x16eda5) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#8 0x55a1545fcef0 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x15cef0) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#9 0x55a154626412 in main (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x186412) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
#10 0x7f3b6302350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#11 0x7f3b630235c8 in __libc_start_main csu/../csu/libc-start.c:381:3
#12 0x55a1545f1ae4 in _start (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x151ae4) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe)
0x55a1549d7674 is located 0 bytes after global variable 'ts_field_map_entries' defined in '/home/stephan/projects/tree-sitter/test/fixtures/grammars/dart/src/parser.c:3390' (0x55a1549d7360) of size 788
SUMMARY: AddressSanitizer: global-buffer-overflow (/home/stephan/projects/tree-sitter/out/dart_fuzzer+0x4741c1) (BuildId: dc3168fc09ef0bd840db576e17453b5329cf40fe) in ts_tree_cursor_current_status
Shadow bytes around the buggy address:
0x55a1549d7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x55a1549d7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x55a1549d7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x55a1549d7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x55a1549d7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x55a1549d7600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[04]f9
0x55a1549d7680: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x55a1549d7700: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
0x55a1549d7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x55a1549d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x55a1549d7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==18396==ABORTING
MS: 2 CrossOver-InsertByte-; base unit: 5c10b5b2cd673a0616d529aa5234b12ee7153
[crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808.txt](https://github.com/UserNobody14/tree-sitter-dart/files/9768311/crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808.txt)
808
0xac,0x2c,
\254,
artifact_prefix='./'; Test unit written to ./crash-6b5a6e5f767759c97ee0cb4ff08b35c32f40b808
Base64: rCw=
@maxbrunsfeld do you have an idea why the fuzzer could be crashing when querying but not on parsing?
It does not seem to crash when just using the fuzzer to parse. Just when the fuzzer script also tries to query using highlight.scm
If you can find a minimal query and source file that reproduces the error (via the tree-sitter query
CLI command), could you open a Tree-sitter issue?
It just happened to me
I'm assuming people are still seeing this. I am working on updating this library for newer dart syntax, and have been trying to clean up the grammar slightly as I go. However, it seems like the issue stems not from the grammar but rather from the highlights file. The highlights currently only have a few test cases. I'll see what I can do to get a more comprehensive testing setup there, which will help diagnose this issue. In the meantime if someone wants to try the latest commits in this repo and determine whether the issue is no longer present that would be great. (I do not personally use NeoVim or the highlights currently). Additionally anyone who would like to contribute to the test cases for highlighting I'd be happy to review PRs.
CC @theHamsta
@TimWhiting Can confirm, I am using the latest commit and still experiencing crashes when using with neovim and dartls.
Crashing neovim completely, Segmentation Fault (Core dumped).