camdencheek / tree-sitter-dockerfile

A tree-sitter grammar for Dockerfile
MIT License

Parsing success is dependent on rule ordering #18

Closed mjambon closed 2 years ago

mjambon commented 2 years ago

Correct output with the current grammar:

$ tree-sitter generate && tree-sitter parse <(echo 'from a as b')
(source_file [0, 0] - [1, 0]
  (from_instruction [0, 0] - [0, 11]
    (image_spec [0, 5] - [0, 6]
      name: (image_name [0, 5] - [0, 6]))
    as: (image_alias [0, 10] - [0, 11])))

Incorrect output if we place the image_name rule before the from_instruction rule:

$ tree-sitter generate && tree-sitter parse <(echo 'from a as b')
(source_file [0, 0] - [1, 0]
  (from_instruction [0, 0] - [0, 11]
    (image_spec [0, 5] - [0, 11]
      name: (image_name [0, 5] - [0, 11]))))

This is due to fallback rule 5 for resolving conflicting tokens, which specifies:

  5. Rule Order - If none of the above criteria can be used to select one token over another, Tree-sitter will prefer the token that appears earlier in the grammar.

In this case, the conflict is between /[^@:\s\$]+/ and /[aA][sS]/. We want to identify AS as a keyword following the image name rather than as a fragment of the image name.
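
To make the tie-break concrete, here is a rough sketch of how rule order can decide between the two patterns. It is a simplified model, not Tree-sitter's actual lexer: it only applies "longest match wins, ties go to the earlier token" and ignores the other criteria (context-aware lexing, lexical precedence, match specificity). The names `pick_token`, `as_keyword`, and `image_name_fragment` are made up for illustration.

```python
import re

def pick_token(tokens, text, pos=0):
    """Return the name of the winning token at `pos`, or None.

    Simplified model: the longest match wins, and on a tie the token
    listed earlier in `tokens` is kept (mimicking the rule-order fallback).
    """
    best = None  # (match length, token name)
    for name, pattern in tokens:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' means a later token never displaces an equal-length match.
        if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
            best = (m.end() - pos, name)
    return best[1] if best else None

# Hypothetical token names; the patterns are the two from this issue.
AS = ("as_keyword", r"[aA][sS]")
NAME = ("image_name_fragment", r"[^@:\s\$]+")

keyword_first = [AS, NAME]  # keyword pattern appears earlier
name_first = [NAME, AS]     # image name pattern appears earlier

print(pick_token(keyword_first, "as b"))
print(pick_token(name_first, "as b"))
```

On the input "as b" both patterns match exactly two characters, so under this model the winner depends only on which pattern is listed first. That is the same kind of flip seen in the two parse trees above.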

Unfortunately, we run into this issue because ocaml-tree-sitter transforms a grammar in a few ways, which results in some anonymous rules being factored out and given a name. This changes the relative placement of the patterns that match tokens (e.g. /[^@:\s\$]+/ and /[aA][sS]/), causing incorrect parsing as shown above.

I'm not sure what the best solution should be. I'm looking into it.

camdencheek commented 2 years ago

It sounds like this was fixed by your PR #20, but let me know if there are any other open questions.