codeschool / sqlite-parser

JavaScript implentation of SQLite 3 query parser
MIT License
331 stars 57 forks source link

Feature Request: Add comment nodes to AST #30

Open jdrew1303 opened 8 years ago

jdrew1303 commented 8 years ago

Adding in nodes to handle comments is needed to support other tooling that uses the parser to apply transformations to the entire codebase.

If we were to apply a refactoring using the following pipeline parse current codebase -> kebab-case all table and column names -> generate code from AST then the user would lose all comments that they had on their code.

It would also allow documentation generators to leverage the parser.

(If the comment nodes were to be based off of the ES6 ast, allowing it to use something like Estraverse, then that would be perfect 😉 👍 )

nwronski commented 8 years ago

I thought of a problem if we wanted to add comments to the AST:

/* comments outside of a statement might work alright */
SELECT pants -- But what about 
FROM -- these kinds of comments
  laundry -- what would the AST look like?
WHERE
  clean = true;

Any ideas? Could we put these in the AST in a way that they wouldn't be disruptive and could also be used to reconstruct the original query with the comments in the right places?

jdrew1303 commented 8 years ago

I've been thinking about this for a little while now and looking around to see what others are doing. I think esprima has a reasonably good way of handling this. They tie the comment to the nearest node as either a leadingComment (before the node) or trailingComment (after the node). They are handled as follows:

// Life, Universe, and Everything
var answer /* this is still legal*/ = 6 * 7; // this is trailing
{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "answer",
                        "trailingComments": [
                            {
                                "type": "Block",
                                "value": " this is still legal",
                                "range": [
                                    45,
                                    69
                                ]
                            }
                        ]
                    },
                    "init": {
                        "type": "BinaryExpression",
                        "operator": "*",
                        "left": {
                            "type": "Literal",
                            "value": 6,
                            "raw": "6"
                        },
                        "right": {
                            "type": "Literal",
                            "value": 7,
                            "raw": "7"
                        },
                        "leadingComments": [
                            {
                                "type": "Block",
                                "value": " this is still legal",
                                "range": [
                                    45,
                                    69
                                ]
                            }
                        ]
                    }
                }
            ],
            "kind": "var",
            "leadingComments": [
                {
                    "type": "Line",
                    "value": " Life, Universe, and Everything",
                    "range": [
                        0,
                        33
                    ]
                }
            ],
            "trailingComments": [
                {
                    "type": "Line",
                    "value": " this is trailing",
                    "range": [
                        79,
                        98
                    ]
                }
            ]
        }
    ],
    "sourceType": "script"
}

The area where it gets a bit weird is inside a function declaration with no args:

function noArg(/* I have no node to attach to*/){};

This creates an innerComment like so:

{
    "type": "Program",
    "body": [
        {
            "type": "FunctionDeclaration",
            "id": {
                "type": "Identifier",
                "name": "noArg"
            },
            "params": [],
            "body": {
                "type": "BlockStatement",
                "body": [],
                "innerComments": [
                    {
                        "type": "Block",
                        "value": " I have no node to attach to",
                        "range": [
                            15,
                            47
                        ]
                    }
                ]
            },
            "generator": false,
            "expression": false
        },
        {
            "type": "EmptyStatement"
        }
    ],
    "sourceType": "script"
}

I'm not sure if this would be needed (or worth the effort initially)? It would still be valid to do this in SQL though. What do you think?

nwronski commented 8 years ago

I want to do this, but it would take huge changes to the entire grammar for the parser. I started experimenting with how this might be done, and came up with something that worked for comments that are between statements, but to get all the inline comments would take grammar changes every time the o rule is used in the grammar, which is ~230 times currently.