apache / datafusion-sqlparser-rs

Extensible SQL Lexer and Parser for Rust
Apache License 2.0
2.79k stars 537 forks source link

Detect stack overflow and reduce stack usage on debug build #1465

Open Eason0729 opened 1 month ago

Eason0729 commented 1 month ago

related to https://github.com/apache/datafusion/issues/12731

The issue is that unoptimized(debug) build consume more stack memory, which known to cause stack overflow if stack size is less than 1MiB.

Expected change:

  1. Return error on stack overflow
  2. Reduce stack usage (or grow stack
Eason0729 commented 1 month ago

I would like to take stacker approach, that is:

  1. implement stack guard at some point
  2. if red zone is reached, try grow stack
  3. if unable to grow stack, return an error

If those words aren't enough to explain the approach I would like to take, I can submit a draft PR to make it clear with code.

alamb commented 1 month ago

Related discussions about recursion limits:

https://docs.rs/sqlparser/latest/sqlparser/parser/struct.Parser.html#method.with_recursion_limit

Eason0729 commented 1 month ago

https://github.com/apache/datafusion-sqlparser-rs/pull/1468

Eason0729 commented 3 weeks ago

I spent some time manually run reg read sp fp, and write down rbp-rsp (in debug build.

Detail > export RUSTFLAGS="-C force-frame-pointers=yes" ``` parse_query 12592 parse_boxed_query_body 12592 parse_select 17024 parse_optional_alias 256 parse_comma_separated 1056 parse_select 17024 parse_table_and_joins 10016 parse_derived_table_factor 2384 ``` And here is some large struct(in byte): ``` Expr 296 Statement 3528 ``` ``` let parser = Parser::new(&GenericDialect {}); let sql = r#" SELECT seq FROM ( SELECT seq, LEAD(seq) OVER (ORDER BY seq) AS next_seq FROM parquet_table ) AS subquery WHERE next_seq - seq > 1; "#; let stmt = parser.try_with_sql(sql).unwrap().parse_statement().unwrap(); print!("{}", stmt.to_string()); ```

In this example, sqlparser consume about 140KB stack at peak, so I think there is little to optimize in sqlparser. I would try, but we still need to reduce stack usage on datafusion.

Eason0729 commented 6 days ago

I am busy doing some school works recently (would probably stay busy until the end of this semester), so if there is anyone interested in this issue, feel free to work on it.

Eason0729 commented 1 day ago

I will try to reproduce it with https://github.com/apache/datafusion/pull/13177 next week, chances are that it's related.