Pash-Project / Pash

An Open Source reimplementation of Windows PowerShell, for Mono.
https://groups.google.com/group/pash-project
BSD 3-Clause "New" or "Revised" License
514 stars 54 forks source link

Optional Newline Issue: Current state and possible approaches? #244

Closed sburnicki closed 9 years ago

sburnicki commented 10 years ago

Hi,

what really bothers me currently is the issue with optional newlines. For example in functions the param() block isn't recognized in the following example:

function foo {
param($a, $b)
...
}

Instead, it would need to be directly after the opening brace. I know that we are getting shift/reduce errors from Irony when we insert the newlines, but I don't know the deatils. As I already found posts in the Irony forum from Jay: Jay, can you give a quick summary about what you found out about the concrete issue with newlines, line ends, etc? Maybe it's easier for me to understand the details of the problem then.

And are there any ideas or approaches, yet?

sburnicki commented 9 years ago

@JayBazuzi, do you have some info for me on this? I think you tried to tackle this once? I would really like to solve this newline related stuff.

JayBazuzi commented 9 years ago

I think the grammar as described in the comments is broken:

        ////        function_statement:
        ////            function   new_lines_opt   function_name   function_parameter_declaration_opt   {   script_block   }
        ////            filter   new_lines_opt   function_name   function_parameter_declaration_opt   {   script_block   }

Note that it doesn't say you can include a newline after the open brace. Compare to:

        ////        script_block_expression:
        ////            {   new_lines_opt   script_block   new_lines_opt   }

But since we got rid of significant newlines in the grammar (see commit#6d1be66a), that shouldn't matter. So I'm still confused.

sburnicki commented 9 years ago

Theoretically it can include the newline, because of the definition of param_block which is used first in script_block:

            script_block.Rule =
                param_block_opt + statement_terminators_opt + script_block_body_opt;

            ////        param_block:
            ////            new_lines_opt   attribute_list_opt   new_lines_opt   param   new_lines_opt
            ////                    (   parameter_list_opt   new_lines_opt   )

I think the problem is that param_block is optional and followed by a statement_terminators_opt. Since the SkippedToken rule handles unexpected newlines and checks if there is a way to consume them, it works: The parser can consume the newline by ignoring to consume a param_block and proceeding to consume a statement_terminators_opt instead. Therefore the newline is not skipped, but makes the param block being parsed as the first statement in the script_body_opt instead.

Was the removal of new_line_opt a choice of style (i.e. avoiding so many new_lines_opts, or did it make trouble with the parser (some kind of lookahead errors)?

JayBazuzi commented 9 years ago

Hey, I've been thinking about this, but I don't really know what we should do about it.

sburnicki commented 9 years ago

Okay, I think we basically have three options:

At the moment I would favor option 1, the workaround. But if there are more exceptions like this that we have to manually add as workarounds, I would like to at least try option 3.

JayBazuzi commented 9 years ago

I think each of these options is worth exploring.

I am glad we're using a parser library because I think it lets us do a little less work, but I suspect that the original PowerShell parser is hand-written. I believe the grammar in the language spec was written after the fact, and never used to generate a parser before Pash. This all guesswork on my part.

So I'm not surprised that we're running in to these difficulties, and we should feel free to change strategies as needed to get the desired outcome.

sburnicki commented 9 years ago

Alright. Since I'm not aware of any similar issues right now, I'll close this.