github / semantic

Parsing, analyzing, and comparing source code across many languages

grammar extras are ignored #717

Open flip111 opened 4 months ago

flip111 commented 4 months ago

https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl

extras - an array of tokens that may appear anywhere in the language. This is often used for whitespace and comments. The default value of extras is to accept whitespace. To control whitespace explicitly, specify extras: $ => [] in your grammar.
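To make the quoted definition concrete, here is a toy model in Python of what "may appear anywhere" means. It is not tree-sitter's algorithm; `parse_sequence`, the token names, and the sample rule are all hypothetical and exist only for illustration. What it shows is that a token listed in extras is accepted between rule elements and still ends up as a node in the result.

```python
def parse_sequence(rule, tokens, extras):
    """Match tokens against a fixed sequence of token kinds.

    Any token whose kind is in `extras` is accepted at any position and is
    kept as a node in the result. Returns the list of nodes, or None if the
    tokens do not match the rule.
    """
    expected = list(rule)
    nodes = []
    for kind, text in tokens:
        if expected and kind == expected[0]:
            # The token is the next element the rule asks for.
            expected.pop(0)
            nodes.append((kind, text))
        elif kind in extras:
            # Not part of the rule, but allowed anywhere: keep it as a node.
            nodes.append((kind, text))
        else:
            return None
    # Succeed only if every rule element was consumed.
    return nodes if not expected else None


# Hypothetical rule and token stream for an echo statement with a comment.
RULE = ["echo", "string", ";"]
TOKENS = [
    ("echo", "echo"),
    ("comment", "/* hi */"),
    ("string", "'bar'"),
    (";", ";"),
]

with_extras = parse_sequence(RULE, TOKENS, extras={"comment"})
without_extras = parse_sequence(RULE, TOKENS, extras=set())
```

With `comment` listed as an extra, the match succeeds and the comment is one of the four returned nodes. With an empty extras set, the same input is rejected.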

input source file

foo
<?php
echo 'bar';
?>
baz
<?php
echo 'qux';
?>
quux

Applicable grammar rule: https://github.com/tree-sitter/tree-sitter-php/blob/41a408d5b996ef54d8b9e1b9a2469fad00c1b52b/src/grammar.json#L6241

output from tree-sitter

(program [0, 0] - [8, 4]
  (text [0, 0] - [1, 0])
  (php_tag [1, 0] - [1, 5])
  (echo_statement [2, 0] - [2, 11]
    (string [2, 5] - [2, 10]
      (string_value [2, 6] - [2, 9])))
  (text_interpolation [3, 0] - [5, 5]
    (text [4, 0] - [5, 0])
    (php_tag [5, 0] - [5, 5]))
  (echo_statement [6, 0] - [6, 11]
    (string [6, 5] - [6, 10]
      (string_value [6, 6] - [6, 9])))
  (text_interpolation [7, 0] - [8, 4]
    (text [8, 0] - [8, 4])))

output from semantic

Right
    ( Term
        { getTerm = Program
            { ann = Loc
                { byteRange = Range
                    { start = 0
                    , end = 505
                    }
                , span = Span
                    { start = Pos
                        { line = 0
                        , column = 0
                        }
                    , end = Pos
                        { line = 26
                        , column = 0
                        }
                    }
                }
            , extraChildren =
                [ R1
                    ( R1
                        ( Text
                            { ann = Loc
                                { byteRange = Range
                                    { start = 0
                                    , end = 4
                                    }
                                , span = Span
                                    { start = Pos
                                        { line = 0
                                        , column = 0
                                        }
                                    , end = Pos
                                        { line = 1
                                        , column = 0
                                        }
                                    }
                                }
                            , text = "foo
                              "
                            }
                        )
                    )
                , R1
                    ( L1
                        ( PhpTag
                            { ann = Loc
                                { byteRange = Range
                                    { start = 4
                                    , end = 9
                                    }
                                , span = Span
                                    { start = Pos
                                        { line = 1
                                        , column = 0
                                        }
                                    , end = Pos
                                        { line = 1
                                        , column = 5
                                        }
                                    }
                                }
                            , text = "<?php"
                            }
                        )
                    )
                , L1
                    ( Statement
                        { getStatement = L1
                            ( R1
                                ( L1
                                    ( L1
                                        ( EchoStatement
                                            { ann = Loc
                                                { byteRange = Range
                                                    { start = 10
                                                    , end = 21
                                                    }
                                                , span = Span
                                                    { start = Pos
                                                        { line = 2
                                                        , column = 0
                                                        }
                                                    , end = Pos
                                                        { line = 2
                                                        , column = 11
                                                        }
                                                    }
                                                }
                                            , extraChildren = L1
                                                ( Expression
                                                    { getExpression = L1
                                                        ( L1
                                                            ( L1
                                                                ( PrimaryExpression
                                                                    { getPrimaryExpression = L1
                                                                        ( L1
                                                                            ( L1
                                                                                ( L1
                                                                                    ( Literal
                                                                                        { getLiteral = R1
                                                                                            ( R1
                                                                                                ( R1
                                                                                                    ( String
                                                                                                        { ann = Loc
                                                                                                            { byteRange = Range
                                                                                                                { start = 15
                                                                                                                , end = 20
                                                                                                                }
                                                                                                            , span = Span
                                                                                                                { start = Pos
                                                                                                                    { line = 2
                                                                                                                    , column = 5
                                                                                                                    }
                                                                                                                , end = Pos
                                                                                                                    { line = 2
                                                                                                                    , column = 10
                                                                                                                    }
                                                                                                                }
                                                                                                            }
                                                                                                        , text = "'bar'"
                                                                                                        }
                                                                                                    )
                                                                                                )
                                                                                            )
                                                                                        }
                                                                                    )
                                                                                )
                                                                            )
                                                                        )
                                                                    }
                                                                )
                                                            )
                                                        )
                                                    }
                                                )
                                            }
                                        )
                                    )
                                )
                            )
                        }
                    )
                , L1
                    ( Statement
                        { getStatement = L1
                            ( R1
                                ( L1
                                    ( L1
                                        ( EchoStatement
                                            { ann = Loc
                                                { byteRange = Range
                                                    { start = 35
                                                    , end = 46
                                                    }
                                                , span = Span
                                                    { start = Pos
                                                        { line = 6
                                                        , column = 0
                                                        }
                                                    , end = Pos
                                                        { line = 6
                                                        , column = 11
                                                        }
                                                    }
                                                }
                                            , extraChildren = L1
                                                ( Expression
                                                    { getExpression = L1
                                                        ( L1
                                                            ( L1
                                                                ( PrimaryExpression
                                                                    { getPrimaryExpression = L1
                                                                        ( L1
                                                                            ( L1
                                                                                ( L1
                                                                                    ( Literal
                                                                                        { getLiteral = R1
                                                                                            ( R1
                                                                                                ( R1
                                                                                                    ( String
                                                                                                        { ann = Loc
                                                                                                            { byteRange = Range
                                                                                                                { start = 40
                                                                                                                , end = 45
                                                                                                                }
                                                                                                            , span = Span
                                                                                                                { start = Pos
                                                                                                                    { line = 6
                                                                                                                    , column = 5
                                                                                                                    }
                                                                                                                , end = Pos
                                                                                                                    { line = 6
                                                                                                                    , column = 10
                                                                                                                    }
                                                                                                                }
                                                                                                            }
                                                                                                        , text = "'qux'"
                                                                                                        }
                                                                                                    )
                                                                                                )
                                                                                            )
                                                                                        }
                                                                                    )
                                                                                )
                                                                            )
                                                                        )
                                                                    }
                                                                )
                                                            )
                                                        )
                                                    }
                                                )
                                            }
                                        )
                                    )
                                )
                            )
                        }
                    )
                ]
            }
        }
    )
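A rough way to compare the two outputs is to list the direct children of the root node in each and diff the counts. The sketch below is Python and rests on several assumptions. The tree-sitter S-expression is copied from the output above. The semantic child list is transcribed by hand from the `extraChildren` of `Program`. The snake_case to CamelCase mapping is a guess at how the names correspond. The helper names are hypothetical.

```python
import re
from collections import Counter

# tree-sitter output, copied from the issue text.
TREE_SITTER_OUTPUT = """\
(program [0, 0] - [8, 4]
  (text [0, 0] - [1, 0])
  (php_tag [1, 0] - [1, 5])
  (echo_statement [2, 0] - [2, 11]
    (string [2, 5] - [2, 10]
      (string_value [2, 6] - [2, 9])))
  (text_interpolation [3, 0] - [5, 5]
    (text [4, 0] - [5, 0])
    (php_tag [5, 0] - [5, 5]))
  (echo_statement [6, 0] - [6, 11]
    (string [6, 5] - [6, 10]
      (string_value [6, 6] - [6, 9])))
  (text_interpolation [7, 0] - [8, 4]
    (text [8, 0] - [8, 4])))
"""

# Direct children of Program in the semantic output, transcribed by hand
# (wrappers such as R1/L1/Statement are skipped).
SEMANTIC_TOP_LEVEL = ["Text", "PhpTag", "EchoStatement", "EchoStatement"]


def top_level_children(sexp):
    """Return node names indented by exactly two spaces (children of the root)."""
    return re.findall(r"^  \((\w+)", sexp, flags=re.MULTILINE)


def to_constructor(name):
    """Assumed naming convention: text_interpolation -> TextInterpolation."""
    return "".join(part.capitalize() for part in name.split("_"))


tree_sitter_children = [to_constructor(n) for n in top_level_children(TREE_SITTER_OUTPUT)]

# Counter subtraction keeps only nodes that tree-sitter reports more often.
missing = dict(Counter(tree_sitter_children) - Counter(SEMANTIC_TOP_LEVEL))
print(missing)
```

On the outputs pasted in this issue, the only top-level nodes tree-sitter reports that have no counterpart in the semantic term are the two `text_interpolation` nodes.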

Problems:

flip111 commented 4 months ago

Possibly a problem with upstream. I will cross-post there.