Closed kirillrdy closed 1 year ago
Hi, thanks for reporting. I think the best way is, that you try to run it locally. Install the tree-sitter-cli (either via npm or cargo), clone this repo and run tree-sitter generate
and then tree-sitter parse structure.sql
We have extended some regex patterns for capturing numbers, but I am not sure if this is the root cause. Would be nice if you could try to generate a simple SQL statement where this seg fault is triggered.
The scanner was added in that range, I suspect that it could be the root cause.
@kirillrdy would that file happen to use dollar quoted strings? It would be helpful if you could reproduce it with a similar SQL snippet that is shareable.
The scanner was added before https://github.com/DerekStride/tree-sitter-sql/commit/44e36038f032eff2f36522642a8cc4351e133f83 But I also suspect that is has something to do with the scanner.c
hi, running tree-sitter parse structure.sql
doesn't cause segfault. I need to reproduce inside nixpkgs infrastructure, I believe they apply hardening flags eg fortify3
using state of art binary search, here is minimal snipplet that causes segfault ( within nixpkgs), but doesn't segfault run tree-sitter parse
CREATE FUNCTION public.foo() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
END;
$$;
the issue is as $$
, when I change it to as bar
it works
Hmm interesting. For me your snippet does not through a segfault, but it is also not parsed correctly.
Same for the one reported in https://github.com/nvim-treesitter/nvim-treesitter/issues/5646
I have little to no experience with C, but I have run scanner.c
through static code analysis (cppcheck) and it reports the following:
src/scanner.c:29:3: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
}
^
src/scanner.c:63:5: error: Memory leak: text_size [memleak]
return NULL;
^
Active checkers: 40/565
The second error is in the function scan_dollar_string_tag()
. @DerekStride @dmfay any ideas how to fix that?
[EDIT]: maybe a free(text_size);
in line 63 before the return NULL;
is enough?
Here is patch:
diff --git a/src/scanner.c b/src/scanner.c
index 68807eb..885b613 100644
--- a/src/scanner.c
+++ b/src/scanner.c
@@ -28,6 +28,7 @@ void *tree_sitter_sql_external_scanner_destroy(void *payload) {
state->start_tag = NULL;
}
free(payload);
+ return NULL;
}
char* add_char(char* text, int* text_size, char c, int index) {
@@ -60,6 +61,7 @@ char* scan_dollar_string_tag(TSLexer *lexer) {
tag = add_char(tag, text_size, '$', index);
lexer->advance(lexer, false);
} else {
+ free(text_size);
return NULL;
}
Would be nice if someone could check if this works. I cannot reproduce the segfault on my machine.
I can reproduce it locally and it seems to be related to fortify3
$ rust-gdb tree-sitter
Reading symbols from tree-sitter...
Reading symbols from /usr/lib/debug//usr/bin/tree-sitter.debug...
(gdb) run parse db_create.sql
Starting program: /usr/bin/tree-sitter parse db_create.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7d826c0 (LWP 11265)]
*** buffer overflow detected ***: terminated
Thread 1 "tree-sitter" received signal SIGABRT, Aborted.
0x00007ffff7e0e5fc in ?? () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff7e0e5fc in ?? () from /lib64/libc.so.6
#1 0x00007ffff7dbf516 in raise () from /lib64/libc.so.6
#2 0x00007ffff7da8877 in abort () from /lib64/libc.so.6
#3 0x00007ffff7da9804 in ?? () from /lib64/libc.so.6
#4 0x00007ffff7e9fc1b in __fortify_fail () from /lib64/libc.so.6
#5 0x00007ffff7e9e3f6 in __chk_fail () from /lib64/libc.so.6
#6 0x00007ffff7e9dc66 in __strcpy_chk () from /lib64/libc.so.6
#7 0x00007ffff7f9555b in ts_parser_parse () from /usr/lib64/libtree-sitter.so.0
#8 0x00005555558f1503 in tree_sitter::Parser::parse_with<&[u8], tree_sitter::{impl#2}::parse::{closure_env#0}<&alloc::vec::Vec<u8, alloc::alloc::Global>>> (self=0x7ffffffdc7a8,
callback=0x7ffffffdbfc0, old_tree=...) at lib/binding_rust/lib.rs:537
#9 0x00005555558f17ce in tree_sitter::Parser::parse<&alloc::vec::Vec<u8, alloc::alloc::Global>> (self=0x7ffffffdc7a8, text=0x7ffffffdc7c0, old_tree=...) at lib/binding_rust/lib.rs:465
#10 0x00005555558723a5 in tree_sitter_cli::parse::parse_file_at_path (language=..., path=..., edits=0x7fffffffc400, max_path_length=13, output=tree_sitter_cli::parse::ParseOutput::Normal,
print_time=false, timeout=0, debug=false, debug_graph=false, cancellation_flag=...) at cli/src/parse.rs:81
#11 0x000055555561cd9c in tree_sitter::run () at cli/src/main.rs:434
#12 0x000055555561711c in tree_sitter::main () at cli/src/main.rs:20
for your reference parsed this file:
DO
$$
BEGIN
IF NOT EXISTS (
SELECT
FROM pg_catalog.pg_roles
WHERE rolname = 'store_user') THEN
CREATE ROLE store_user WITH LOGIN;
END IF;
END
$$;
@mortezadadgar does my patch fix the issue?
No your patch is unrelated to this issue, according to gdb it's related to one of strcpy
sadly it's doesn't tell which line is causing overflow
btw do you know why make generate
takes forever to be done? it's really hard to find the source of issue this way
@mortezadadgar @kirillrdy I was able to reproduce it. Seems like using -D_FORTIFY_SOURCE=2
works. If am reading the nix cc wrapper scripts correctly, you can compile it with fortify
instead of fortify3
.
https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/cc-wrapper/add-hardening.sh
On the flip side, I would really like to know whats causing this. My patch however has no effect.
make generate
takes a long time because this parser has a really high state count.
I investigated around this issue a little bit, i'm quit sure about it but it might be tree-sitter issue as it turns out returning '$$'
if we add a return "$$"
before this line:
https://github.com/DerekStride/tree-sitter-sql/blob/b24e6759c0c05875b4929cc868bb6494ff797b83/src/scanner.c#L71
buffer overflow can be reproduced with fortify3 without involvement of any strcpy usage.
and gdb references the overflow from tree-sitter itself so the chances that it's a not related to scanner is pretty high
I took a look at the add_char
function and there are some bounds checks missing. I have tried to add them but it does not solve the issue:
diff --git a/src/scanner.c b/src/scanner.c
index 68807eb..8237ddb 100644
--- a/src/scanner.c
+++ b/src/scanner.c
@@ -30,18 +30,17 @@ void *tree_sitter_sql_external_scanner_destroy(void *payload) {
free(payload);
}
-char* add_char(char* text, int* text_size, char c, int index) {
+char* add_char(char* text, size_t* text_size, char c, int index) {
if (text == NULL) {
text = malloc(sizeof(char) * MALLOC_STRING_SIZE);
*text_size = MALLOC_STRING_SIZE;
- text[index] = c;
- text[index + 1] = '\0';
}
- if (index + 1 == *text_size) {
+ // will break when indexes advances more than MALLOC_STRING_SIZE
+ if (index + 1 >= *text_size) {
*text_size += MALLOC_STRING_SIZE;
char* tmp = malloc(*text_size * sizeof(char));
- strcpy(tmp, text);
+ strncpy(tmp, text, *text_size);
free(text);
text = tmp;
}
@@ -54,12 +53,13 @@ char* add_char(char* text, int* text_size, char c, int index) {
char* scan_dollar_string_tag(TSLexer *lexer) {
char* tag = NULL;
int index = 0;
- int* text_size = malloc(sizeof(int));
+ size_t* text_size = malloc(sizeof(size_t));
*text_size = 0;
if (lexer->lookahead == '$') {
tag = add_char(tag, text_size, '$', index);
lexer->advance(lexer, false);
} else {
+ free(text_size);
return NULL;
}
I suspect that the compiler adds code around these unchecked code parts and they crash when -D_FORTIFY_SOURCE=3
is set.
@kirillrdy @mortezadadgar can you check the code changes I submitted in #226 On my machine, this does not segfault anymore.
Hi
somewhere between 44e36038f032eff2f36522642a8cc4351e133f83 ( good ) and 764c1bd5ea856dd7db5171783e442321dc3e041f (bad) segfault when opening sql file.
unfortunately I can not post sql file here ( prod code )
I am not familiar enough with tree-sitter, whats the easiest way to run parser though any sql file ?
I've tried adding offending sql file to
test/highlight
but that did't do anything