alemuller / tree-sitter-make

Statement after a rule's recipe gets appended to the recipe #7

Open guy4261 opened 2 years ago

guy4261 commented 2 years ago

In a Makefile containing a rule with a recipe, followed by anything:

target: prerequisite
<TAB>recipe_line_1
<TAB>recipe_line_2
# whatever

Once recipe_line lines are being collected, the first line that does not start with <TAB> (aka \t) should mark the end of the rule's recipe and the beginning of a new node.
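As a rough sketch of the rule proposed above (plain Python, no tree-sitter involved; `split_recipe` is a hypothetical helper name, and this models only the proposed behaviour, not everything GNU make does with blank or comment lines in a rule context):

```python
def split_recipe(body_lines):
    """Split the lines that follow a rule header into (recipe, rest).

    The recipe ends at the first line that does not start with a tab.
    """
    recipe = []
    for i, line in enumerate(body_lines):
        if not line.startswith("\t"):
            # First non-tab line: the recipe is over, the rest is new nodes.
            return recipe, body_lines[i:]
        recipe.append(line)
    return recipe, []


lines = ["\trecipe_line_1", "\trecipe_line_2", "# whatever"]
recipe, rest = split_recipe(lines)
```

With the example above, `recipe` holds the two tab-prefixed lines and `rest` starts at the comment.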

Here's a really small repro:

from tree_sitter import Language
from tree_sitter import Parser

Language.build_library(
    # Store the library in the `build` directory
    "build/my-languages.so",
    [
        ##################
        "tree-sitter-make"
        ##################
    ],
)

MAKE_LANGUAGE = Language("build/my-languages.so", "make")
parser = Parser()
parser.set_language(MAKE_LANGUAGE)

# This is a repro, I'll try to minimize it later:
repro = """$(A): $(B) $(C)
\techo egg

# some comment
ifneq (1, 0)
$(D) $(E): $(F)
\techo soup
endif
"""

And now

>>> parser.parse(repro.encode("utf-8")).walk().node.children
[<Node kind=rule, start_point=(0, 0), end_point=(8, 0)>]

I'm putting this here - might solve later 👍🏼

guy4261 commented 2 years ago

To be more precise -

# if this is my makefile's contents
repro = b"""$(A): $(B) $(C)
\techo egg

# some comment
ifneq (1, 0)
$(D) $(E): $(F)
\techo soup
endif
"""

# this gets parsed into a single rule
>>> parsed = parser.parse(repro).walk().node.children
>>> parsed
[<Node kind=rule, start_point=(0, 0), end_point=(8, 0)>]

# the rule has a recipe
>>> rule = parsed[0]
>>> rule.children
[<Node kind=targets, start_point=(0, 0), end_point=(0, 4)>,
 <Node kind=":", start_point=(0, 4), end_point=(0, 5)>,
 <Node kind=prerequisites, start_point=(0, 6), end_point=(0, 15)>,
 <Node kind=recipe, start_point=(0, 15), end_point=(8, 0)>]

# the comment and conditional were appended to the recipe
>>> recipe = rule.children[-1]
>>> recipe.children
[<Node kind=recipe_line, start_point=(1, 1), end_point=(1, 9)>,
 <Node kind=comment, start_point=(3, 0), end_point=(3, 14)>,
 <Node kind=conditional, start_point=(4, 0), end_point=(8, 0)>]

But what I expected to see was this:

>>> parser.parse(repro).walk().node.children
[<Node kind=rule, start_point=(0, 0), end_point=(1, 9)>,
 <Node kind=comment, start_point=(3, 0), end_point=(3, 14)>,
 <Node kind=conditional, start_point=(4, 0), end_point=(8, 0)>]

guy4261 commented 2 years ago

I could go on and on :)

If I split the repro into three separate statements and parse each one on its own, the parsing works flawlessly:

repro_rule = b"""$(A): $(B) $(C)
\techo egg
\techo dog"""

repro_comment = b"""# some comment"""

repro_conditional = b"""ifneq ($(DOOM),0)
$(D) $(E): $(F)
\techo soup
endif
"""

print(parser.parse(repro_rule).walk().node.children)
print(parser.parse(repro_comment).walk().node.children)
print(parser.parse(repro_conditional).walk().node.children)

Results in

[<Node kind=rule, start_point=(0, 0), end_point=(2, 9)>]
[<Node kind=comment, start_point=(0, 0), end_point=(0, 14)>]
[<Node kind=conditional, start_point=(0, 0), end_point=(4, 0)>]

guy4261 commented 2 years ago

I couldn't figure out how to solve it: how to make the recipe end at its last recipe_line node, i.e. at the first following line that is not a recipe line (one that does not start with a tab).

Any advice please? 🙇

guy4261 commented 2 years ago

Here goes - from https://www.gnu.org/software/make/manual/html_node/Recipe-Syntax.html :

A conditional expression (ifdef, ifeq, etc. see Syntax of Conditionals) in a “rule context” which is indented by a tab as the first character on the line, will be considered part of a recipe and be passed to the shell.

So, like a recipe line, a conditional inside a recipe must be prefixed with a <tab>. I'll try adding this fix to grammar.js.
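A minimal sketch of that distinction in plain Python, assuming the caller already knows whether it is in a rule context. The name `classify_conditional` and the `in_rule_context` flag are made up for illustration, and treating a tab-indented conditional outside a rule context as a directive is an assumption of this sketch:

```python
import re

# Conditional keywords, optionally preceded by whitespace (including a tab).
CONDITIONAL = re.compile(r"^\s*(ifdef|ifndef|ifeq|ifneq|else|endif)\b")


def classify_conditional(line, in_rule_context):
    """Return "recipe", "directive", or None if the line is no conditional."""
    if not CONDITIONAL.match(line):
        return None
    if in_rule_context and line.startswith("\t"):
        # Tab-indented inside a rule context: part of the recipe,
        # passed to the shell rather than interpreted by make.
        return "recipe"
    # Assumption: anything else is handled as a make directive.
    return "directive"
```

For the repro, `ifneq (1, 0)` has no leading tab, so under this reading it is a directive and should not be attached to the recipe.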

alemuller commented 2 years ago

Hi, @guy4261. Thanks for the issue (and sorry for taking so long to reply).

Makefile syntax isn't well suited to tree-sitter. Directives in make (such as if-then-else) are similar to preprocessor directives in C. Parsing them correctly would need multiple passes, which can't be done with tree-sitter. tree-sitter-c shares those issues.


The manual defines "rule context" as:

“rule context” (that is, after a rule has been started until another rule or variable definition)

When the conditional directive is in the recipe context, it is easy to parse:

foo:
ifdef x
    echo
endif

Your case can be summarized as:

foo:
ifdef x
bar:
endif

I remember trying to fix this while writing the revised grammar (on master instead of main), but I don't remember why I didn't implement it. I'll have a look at it again.

However, there are cases like this:

foo:
ifdef x
    echo
bar:
endif

Should this be in the top-level context or the recipe context? It can't be in both.
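To make the problematic shape concrete, here is a deliberately naive line-based sketch in plain Python (not tree-sitter, and not how the grammar works). It flags conditionals that are opened inside one rule's context and contain the start of another rule. The function name and both regexes are illustrative assumptions and ignore many Makefile details (line continuations, `define` blocks, pattern-specific variables, and so on):

```python
import re

COND_OPEN = re.compile(r"^(ifdef|ifndef|ifeq|ifneq)\b")
# A non-tab, non-comment line with a ':' that is not part of ':=' or '::='.
RULE_HEAD = re.compile(r"^[^\t#][^=]*:(?![:=])")


def find_straddling_conditionals(text):
    """Return 1-based line numbers of conditionals that are opened in a
    rule's context and contain the header of another rule."""
    current_rule = None  # line number of the rule whose context we are in
    stack = []           # open conditionals: (line number, rule context)
    flagged = []
    for n, line in enumerate(text.splitlines(), 1):
        if COND_OPEN.match(line):
            stack.append((n, current_rule))
        elif line.startswith("endif"):
            if stack:
                stack.pop()
        elif RULE_HEAD.match(line):
            for open_line, ctx in stack:
                if ctx is not None and open_line not in flagged:
                    flagged.append(open_line)
            current_rule = n
    return flagged


ambiguous = "foo:\nifdef x\n\techo\nbar:\nendif\n"
simple = "foo:\nifdef x\n\techo\nendif\n"
```

Here `find_straddling_conditionals(ambiguous)` reports the `ifdef` on line 2, while the `simple` case reports nothing. A single-pass grammar would have to commit to one context for that `ifdef` before seeing `bar:`.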

I suggest using the version in master, not the one in main. I haven't changed the default branch, to avoid breaking nvim-tree-sitter.