GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.36k stars, 315 forks

Have antlr4 syntax file? #1672

Closed: melin closed this issue 8 months ago

melin commented 1 year ago

What problem does the new feature solve?

no

What does the feature do?

I want to parse the SQL syntax from Java to obtain table information.

Implementation challenges

No response

evenyag commented 1 year ago

We use sqlparser-rs, a hand-written parser library, so we can't provide an ANTLR4 grammar file. Do you want to get the table name from SQL? I'm not sure whether a generic SQL parser can do this. @fengjiachun

waynexia commented 1 year ago

I'm afraid a generic SQL parser cannot handle our extended grammar.
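For the narrow case of pulling out table names only, a rough Java sketch that sidesteps a full grammar might look like the following. It is a regex heuristic, not a parser: the class name `TableNameSketch` and the keyword list are assumptions made for illustration, and it will misfire on string literals, comments, quoted identifiers, and CTE names. Its one advantage here is that it ignores clauses it does not recognize, so dialect extensions do not make it fail outright.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableNameSketch {
    // Keywords that are usually followed by a table name (an assumed, incomplete list).
    // An optional IF [NOT] EXISTS after the keyword is skipped.
    private static final Pattern TABLE_REF = Pattern.compile(
        "\\b(?:FROM|JOIN|INTO|UPDATE|TABLE)\\s+(?:IF\\s+(?:NOT\\s+)?EXISTS\\s+)?"
            + "([A-Za-z_][A-Za-z0-9_.]*)",
        Pattern.CASE_INSENSITIVE);

    /** Collects candidate table names in order of first appearance. */
    public static Set<String> extractTables(String sql) {
        Set<String> tables = new LinkedHashSet<>();
        Matcher m = TABLE_REF.matcher(sql);
        while (m.find()) {
            tables.add(m.group(1));
        }
        return tables;
    }

    public static void main(String[] args) {
        System.out.println(extractTables("SELECT * FROM cpu JOIN mem ON cpu.ts = mem.ts"));
    }
}
```

Anything beyond table names, such as column-level lineage or permission checks on DDL, would need a real parser rather than this kind of pattern matching.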

melin commented 1 year ago

I think so. ANTLR4 grammar files for the major SQL databases are available here: https://github.com/antlr/grammars-v4/tree/master/sql

https://github.com/rrevenantt/antlr4rust

melin commented 1 year ago

cratedb: https://github.com/crate/crate/blob/7f4ccfe32b56cac0bbcde5b39db3661462d188f4/libs/sql-parser/src/main/antlr/SqlBaseParser.g4

@evenyag

fengjiachun commented 1 year ago

Are you only parsing the SQL to get the table name? Maybe you can take a look at this.

melin commented 1 year ago

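We need to parse DDL, DML, and other statements, not just table names. We are building a big data platform that connects to various data sources and provides unified permission checks and data lineage. If an ANTLR4 grammar file were available, we would consider adding it to https://github.com/melin/superior-sql-parser.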

fengjiachun commented 1 year ago

Sorry, we don't have antlr4 syntax files

killme2008 commented 1 year ago

Our SQL dialect only modifies the CREATE TABLE syntax. I can fork an ANTLR4 grammar from another project and make it work for GreptimeDB. Let me try it when I am free.

melin commented 1 year ago

Spark SQL is based on the Presto ANTLR grammar file, and Doris is based on the Spark ANTLR4 grammar file.

killme2008 commented 1 year ago

Spark SQL is based on the Presto ANTLR grammar file, and Doris is based on the Spark ANTLR4 grammar file.

Cool, I'll try them. Thank you.

tisonkun commented 8 months ago

No.

The most significant challenge is that we don't use such grammar files in our code, so compatibility cannot be guaranteed.

ANTLR doesn't have a proper Rust implementation.

Two ideas we can follow:

  1. Extract the Rust parser so that it can be used as a standalone library.
  2. Write a g4 file anyway and maintain a parsing test suite, like the Postgres g4 grammar.
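As a rough illustration of the second idea, a parsing test suite can be as simple as a corpus of statements that the reference parser accepts, replayed against the grammar-generated parser. The Java sketch below is only an assumption about how such a harness could look: `ParsingSuiteSketch` and `rejected` are hypothetical names, and the `Predicate<String>` stands in for whatever parser a g4 file would generate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ParsingSuiteSketch {
    /** Returns the statements from the corpus that the given parser rejects. */
    public static List<String> rejected(List<String> corpus, Predicate<String> accepts) {
        List<String> failures = new ArrayList<>();
        for (String sql : corpus) {
            boolean ok;
            try {
                ok = accepts.test(sql);
            } catch (RuntimeException e) {
                ok = false; // a parser that throws counts as a rejection
            }
            if (!ok) {
                failures.add(sql);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // In practice the corpus would be collected from statements the Rust parser accepts
        // (an assumption; these two strings are placeholders).
        List<String> corpus = List.of("SELECT 1", "CREATE TABLE t (ts TIMESTAMP)");
        // Stand-in parser (hypothetical): accepts only statements that start with SELECT.
        Predicate<String> standIn = sql -> sql.trim().toUpperCase().startsWith("SELECT");
        System.out.println(rejected(corpus, standIn));
    }
}
```

An empty result would mean the grammar keeps up with the corpus; any listed statement marks a place where the g4 file has drifted from the real parser.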

This can be an open-ended discussion and I don't think we'd maintain such a file in the main repo.