ProtonMail / libsieve-php

libsieve-php is a library to manage and modify sieve (RFC5228) scripts. It contains a parser for the sieve language (including extensions) and a client for the managesieve protocol. It is written entirely in PHP 8+.
https://packagist.org/packages/protonlabs/libsieve-php
GNU General Public License v3.0
21 stars 7 forks source link

Comments not getting placed as children of Root Node #26

Closed Sn0wCrack closed 4 years ago

Sn0wCrack commented 4 years ago

It appears that comments on entirely separate lines are being added as children of whatever the previous Identifier token is.

Here's an example file I'm currently working with:

require ["reject", "replace", "fileinto", "envelope", "copy", "forward", "vacation", "tagsubject", "body", "relational", "axidate", "addsbook", "restoremessage", "markmessage", "feature", "iConfirmation"];

##Filter id=1 name="sales12@axigen.d4rk.com.au" enabled=1
        forward  "sales@axigen.d4rk.com.au";
        stop;

##Filter id=2 name="salesother22@axigen.d4rk.com.au" enabled=1
        forward  "salesother@axigen.d4rk.com.au";
        stop;

##Filter id=3 name="salesnew32@axigen.d4rk.com.au" enabled=1
        forward  "salesnew@axigen.d4rk.com.au";
        stop;

This ends up getting parsed as such:

tree
 |--- <require> type:identifier line:1 (id:1)
               --- trimmed ---
 |  |  `--- <\n\n> type:whitespace line:1 (id:52)
 |  `--- <##Filter id=1 name="sales12@axigen.d4rk.com.au" enabled=1\n> type:comment line:3 (id:53)
 |     `--- <        > type:whitespace line:4 (id:54)
 |--- <forward> type:identifier line:4 (id:55)
 |  |--- <  > type:whitespace line:4 (id:56)
 |  |--- <"sales@axigen.d4rk.com.au"> type:quoted string line:4 (id:57)
 |  `--- <;> type:semicolon line:4 (id:58)
 |     `--- <\n        > type:whitespace line:4 (id:59)
 |--- <stop> type:identifier line:5 (id:60)
 |  |--- <;> type:semicolon line:5 (id:61)
 |  |  `--- <\n\n> type:whitespace line:5 (id:62)
          --- trimmed ---

I'm just wondering if this is intentional that they're not children of the Root Node and instead they're children of the previous Identifier Node.

Another thing I've noticed as well is that newlines at the end of comments isn't parsed as whitespace unlike other tokens where the newlines are interpreted as whitespace when at the end, I'm assuming this is because the whole line ends up getting interpreted as the 'text' of the node and no further parsing occurs on the line.

ThHareau commented 4 years ago

Yes, comments are attached to the last identifier and not to the last block. The reason is that comments can be inserted inside commands. For example: this is a valid filter:

if
    header # test 
    "x-priority" #header name
    "3" # value
{
    stop;
}

In that case you wouldn't want the comment to be attached to the last block, because it would change the location of the comments.

For this reason we have to attach them to the last identifier (# test is attached to header, #header name to x-priority, ...).

ThHareau commented 4 years ago

And yes, single line comments begins with a # and ends with a new line. As new line in that case has a meaning, they are included in the comment and not interpreted as part of a whitespace.

Sn0wCrack commented 4 years ago

Thanks for the clarification on both points, @ThHareau

That definitely makes sense for why that's happening then, especially with your provided example.

However in my example the comments exist solely on separate lines not attached to anything else, but are getting parsed like they belong to the previous identifier as if they existed on the same line, I'm assuming then this is still intended behavior?

ThHareau commented 4 years ago

Yes it is. As define here, comments are parsed exactly as if they were whitespaces.

It's indeed weird to have the comments on new lines being child of the previous command. However, moving them to the root node would need us to have different conditions for different type of comments.

That would make the library more complex (and less performant), and as comments are meant to be ignored, it was decided to choose the simplest implementation.

Sn0wCrack commented 4 years ago

Alright, no issues then, thanks for clearing that up, I'll close this issue now as it's intended then.