Himujjal / tree-sitter-svelte

Tree sitter grammar for Svelte
MIT License

failed to parse certain components #11

Closed elianiva closed 3 years ago

elianiva commented 3 years ago

I noticed that certain components don't get parsed correctly. See this scenario for a better explanation.

<Foo>
  <span>bar</span>
</Foo>

<Link>
  <span>bar</span>
</Link>

The first element is parsed correctly and looks like this:

(document [0, 0] - [3, 0]
  (element [0, 0] - [2, 6]
    (start_tag [0, 0] - [0, 5]
      (tag_name [0, 1] - [0, 4]))
    (text [0, 5] - [1, 2])
    (element [1, 2] - [1, 18]
      (start_tag [1, 2] - [1, 8]
        (tag_name [1, 3] - [1, 7]))
      (text [1, 8] - [1, 11])
      (end_tag [1, 11] - [1, 18]
        (tag_name [1, 13] - [1, 17])))
    (text [1, 18] - [2, 0])
    (end_tag [2, 0] - [2, 6]
      (tag_name [2, 2] - [2, 5])))
  (text [2, 6] - [3, 0]))

but the second element gets parsed as this:

(document [0, 0] - [3, 0]
  (element [0, 0] - [1, 2]
    (start_tag [0, 0] - [0, 6]
      (tag_name [0, 1] - [0, 5])))
  (element [1, 2] - [1, 18]
    (start_tag [1, 2] - [1, 8]
      (tag_name [1, 3] - [1, 7]))
    (text [1, 8] - [1, 11])
    (end_tag [1, 11] - [1, 18]
      (tag_name [1, 13] - [1, 17])))
  (text [1, 18] - [2, 0])
  (ERROR [2, 0] - [2, 7]
    (ERROR [2, 2] - [2, 6]))
  (text [2, 7] - [3, 0]))
test.svelte 0 ms    (ERROR [2, 0] - [2, 7])

Himujjal commented 3 years ago

Will look into it later. Looks like something from the C side of the parser. That part is a bit complicated, and I am busy with tree-sitter-zig for now. But sure, I will try to fix this within the week.

Himujjal commented 3 years ago

Here's the thing. In normal HTML, DIV and div are basically the same. That also applies to the native LINK and link. In the case of the custom parser, it was treating Link, Input, etc. as link and input respectively, which are self-closing tags. It saw that your Link has child elements and thus spat out errors. That being said, in HTML, SCRIPT and script are both allowed. This is even the case with Svelte (go, try!).
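To illustrate the failure described above, here is a minimal Python sketch. It is not the actual scanner code (which lives on the C side of the parser); the `VOID_ELEMENTS` set and the function name `is_void_case_insensitive` are hypothetical, and the set lists only a few of HTML's self-closing tags.

```python
# Hypothetical sketch of the behaviour described in the comment above.
# It is NOT the real tree-sitter-svelte scanner; all names are illustrative.

# A few of HTML's self-closing (void) tags, stored in lowercase.
VOID_ELEMENTS = frozenset({"br", "hr", "img", "input", "link", "meta"})


def is_void_case_insensitive(tag_name: str) -> bool:
    """Fold the tag name to lowercase before the lookup.

    This matches HTML, where LINK and link are the same tag, but it also
    turns a Svelte component named Link into the void element link.
    """
    return tag_name.lower() in VOID_ELEMENTS


if __name__ == "__main__":
    for name in ("Foo", "Link", "Input", "link"):
        print(name, is_void_case_insensitive(name))
    # Foo False
    # Link True
    # Input True
    # link True
```

Under this assumption, `<Link>` is closed as soon as its start tag ends, so the later `</Link>` has nothing to close and shows up as an ERROR node, while `<Foo>` is unaffected.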

I fixed this issue. But I swear, HTML is one of the most difficult formats to parse!

BTW. What to do with your PR?

elianiva commented 3 years ago

Ah, I see. I'm just going to rename my component then. I can see that an HTML parser is a PITA :laughing:

Thanks!

Himujjal commented 3 years ago

Actually, you don't have to rename your component. Link will work just fine. I removed the case-insensitive tag matching, so people who write SCRIPT instead of script will have to rename the tag to lowercase.
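The trade-off in that fix can be sketched in Python as follows. This is an illustration only, assuming the fix amounts to an exact, case-sensitive comparison of tag names; the sets and function names are hypothetical and are not taken from the grammar's source.

```python
# Hypothetical sketch of case-sensitive tag matching; names are illustrative
# and do not come from the tree-sitter-svelte source.

# A few of HTML's self-closing (void) tags, stored in lowercase.
VOID_ELEMENTS = frozenset({"br", "hr", "img", "input", "link", "meta"})
# Tags whose content is treated specially rather than as markup.
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})


def is_void(tag_name: str) -> bool:
    """Exact match only: Link is a component, link is a void element."""
    return tag_name in VOID_ELEMENTS


def is_raw_text(tag_name: str) -> bool:
    """Exact match only: SCRIPT is no longer recognised as script."""
    return tag_name in RAW_TEXT_ELEMENTS


if __name__ == "__main__":
    print(is_void("Link"), is_void("link"))            # False True
    print(is_raw_text("SCRIPT"), is_raw_text("script"))  # False True
```

With exact matching, capitalised component names such as Link and Input can have children, at the cost of no longer accepting uppercase spellings of native tags.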

I was writing an HTML/Svelte parser in Zig recently because tree-sitter is a heavy dependency (2 MB) and yes, it's a PITA.