aglo / MySQLParser

A Parser for MySQL's SQL.
3 stars 6 forks source link

JSQLParser VS Mysql.g(JAVACC & ANTLR) #10

Open wenzong opened 11 years ago

wenzong commented 11 years ago

As mentioned in exist_tools.md, JSqlParser and ZQL are using JAVACC for parse sql statements, while mysql.g is also a tool for parse mysql sql statements(Using ANTLR).

After trying to add mysql supported feature into JSqlParser, I found that JSQLParser is not so clear for changing grammar tree.

Let's take a look at the two tools.


1. Mysql Select Syntax (dev doc)

SELECT
    [ALL | DISTINCT | DISTINCTROW ]
      [HIGH_PRIORITY]
      [STRAIGHT_JOIN]
      [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
      [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    select_expr [, select_expr ...]
    [FROM table_references
    [WHERE where_condition]
    [GROUP BY {col_name | expr | position}
      [ASC | DESC], ... [WITH ROLLUP]]
    [HAVING where_condition]
    [ORDER BY {col_name | expr | position}
      [ASC | DESC], ...]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}]
    [PROCEDURE procedure_name(argument_list)]
    [INTO OUTFILE 'file_name'
        [CHARACTER SET charset_name]
        export_options
      | INTO DUMPFILE 'file_name'
      | INTO var_name [, var_name]]
    [FOR UPDATE | LOCK IN SHARE MODE]]

2. JSQLParser

PlainSelect PlainSelect():
{ 
    PlainSelect plainSelect = new PlainSelect(); 
    List<SelectItem> selectItems = null;
    FromItem fromItem = null;
    List<Join> joins = null;
    List<SelectItem> distinctOn = null;
    Expression where = null;
    List<OrderByElement> orderByElements;
    List<Expression> groupByColumnReferences = null;
    Expression having = null;
    Limit limit = null;
    Top top = null;
}
{
    <K_SELECT> 

        [ 
        <K_ALL> 
        | 
        (
         <K_DISTINCT> 
         { Distinct distinct = new Distinct(); 
         plainSelect.setDistinct(distinct); } 

         [ "ON" "(" distinctOn=SelectItemsList()  
         { plainSelect.getDistinct().setOnSelectItems(distinctOn); } ")" ]
        )
        ] 

        [top = Top() { plainSelect.setTop(top); } ]

        selectItems=SelectItemsList()

        // TODO
        [IntoClause()]
        [ <K_FROM>
        fromItem=FromItem()
        joins=JoinsList() ]

        [ where=WhereClause() { plainSelect.setWhere(where); }]
        [ groupByColumnReferences=GroupByColumnReferences() { 
            plainSelect.setGroupByColumnReferences(groupByColumnReferences);}]
            [ having=Having() { plainSelect.setHaving(having); }]
            [LOOKAHEAD(2) orderByElements = OrderByElements()   {        
                plainSelect.setOrderByElements(orderByElements);    }   ]
                [LOOKAHEAD(2) limit = Limit() { plainSelect.setLimit(limit);}]

                { 
                    plainSelect.setSelectItems(selectItems);
                    plainSelect.setFromItem(fromItem);
                    if (joins != null && joins.size() > 0)
                        plainSelect.setJoins(joins);
                    return plainSelect; 
                }
}

3. mysql.g

select_expression:
    SELECT 

    ( ALL | DISTINCT | DISTINCTROW )? 
    (HIGH_PRIORITY)?
    (STRAIGHT_JOIN)?
    (SQL_SMALL_RESULT)? (SQL_BIG_RESULT)? (SQL_BUFFER_RESULT)?
    (SQL_CACHE_SYM | SQL_NO_CACHE_SYM)? (SQL_CALC_FOUND_ROWS)?

    select_list

    ( 
        FROM table_references 
        ( partition_clause )?
        ( where_clause )? 
        ( groupby_clause )?
        ( having_clause )?
    ) ?

    ( orderby_clause )?
    ( limit_clause )?
    ( ( FOR_SYM UPDATE) | (LOCK IN_SYM SHARE_SYM MODE_SYM) )? 
;

4. Style

The mysql.g grammar is more clear for mysql sql statement.

To be honest, the mysql.g(ANTLR) is only a grammar file and it can't work directly. And the JSqlParser(JAVACC) hava many java code like:

Distinct distinct = new Distinct(); 
plainSelect.setDistinct(distinct);

Sometimes, you may spend many time for distinguish between grammar code from java code.

Reg Expr is almost the same

As @wumpz point out in the comment, the simple grammar in JAVACC is also very clear.


PlainSelect PlainSelect():{}
{
    <K_SELECT> 

    [ 
        <K_ALL> 
        | 
            (
                <K_DISTINCT>  
                    [ "ON" "(" SelectItemsList() ")" ]
            )
    ] 

    [Top()]

    SelectItemsList()

     // TODO
    [IntoClause()]
    [ <K_FROM>
      FromItem()
      JoinsList() ]

    [ WhereClause() ]
    [ GroupByColumnReferences() ]
    [ Having() ]
    [LOOKAHEAD(2) OrderByElements()    ]
    [LOOKAHEAD(2) Limit()  ]

}

5. For Developer

JSQLParser is a ripe project

wumpz commented 11 years ago

I don't know anlr very much, but as I understand it, the example does result in an ast tree. But what you posted for JavaCC is not only the grammar but included additional parsing actions to collect the results of the parse process in a specific way, not a syntax tree. The simple grammar with all "code injections" removed is like:

void PlainSelect(): { } {

``` [ | [ "ON" "(" SelectItemsList() ")" ] ] [Top() ] SelectItemsList() [ IntoClause() ] [ FromItem() JoinsList() ] [ WhereClause() ] [ GroupByColumnReferences() ] [ Having() ] [ OrderByElements() ] [ Limit() ] ``` } As you can see this one is much smaller.
wenzong commented 11 years ago

@wumpz Thanks for your advice and sorry for cause the misunderstanding to you. What you said is exactly what I mean by saying this:

To be honest, the mysql.g(ANTLR) is only a grammar file and it can't work directly. And the JSqlParser(JAVACC) hava many java code like:

Distinct distinct = new Distinct(); 
plainSelect.setDistinct(distinct);

Sometimes, you may spend many time for distinguish between grammar and java code

And I find that I give this issue a wrong name for what I really comparing is the JsqlParser 's jj file with mysql.g file. I will correct this wrong logic.

What I really mean is that newbies like me would be confused by the mix of grammar and parsing code.

And I think keep a copy of grammar file of JAVACC without "code injections" is a good idea. I will try it. :)