elixir-makeup / makeup_html

HTML lexer for Makeup
MIT License

`HTMLLexer.element_stringify/4` assumes `HTMLLexer.merge_string/2` returns a token rather than a list of tokens, and wraps the result in a list. #2

Closed lkarthee closed 2 years ago

lkarthee commented 2 years ago

HTMLLexer.merge_string/2 takes two parameters, both lists of tokens, and returns a list of tokens:

@spec merge_string([token()], [token()]) :: [token()]
defp merge_string(tokens,  result)

https://github.com/elixir-makeup/makeup_html/blob/09322079741a5858c0ce7bcad00a43cb8ac14939/lib/makeup/lexers/html_lexer.ex#L158

HTMLLexer.element_stringify/4 assumes HTMLLexer.merge_string/2 returns a token rather than a list of tokens, and wraps the result in a list:

defp element_stringify(
         [{:punctuation, _, ">"} = punctuation | tokens],
         _,
         queue,
         result
       ),
       do: element_stringify(tokens, true, [], result ++ [merge_string(queue), punctuation])
# In the clause above, result should be: result ++ merge_string(queue) ++ [punctuation]

  # We respect the comments
  defp element_stringify(
         [{:comment, _, _} = comment | tokens],
         _,
         queue,
         result
       ),
       do: element_stringify(tokens, true, [], result ++ [merge_string(queue), comment])
# In the clause above, result should be: result ++ merge_string(queue) ++ [comment]

  defp element_stringify(
         [{:punctuation, _, "</"} = punctuation | tokens],
         true,
         queue,
         result
       ),
       do: element_stringify(tokens, false, [], result ++ [merge_string(queue), punctuation])
# In the clause above, result should be: result ++ merge_string(queue) ++ [punctuation]
  defp element_stringify(
         [{:punctuation, _, "<"} = punctuation | tokens],
         true,
         queue,
         result
       ),
       do: element_stringify(tokens, false, [], result ++ [merge_string(queue), punctuation])
# In the clause above, result should be: result ++ merge_string(queue) ++ [punctuation]
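The difference is easy to see in isolation. Below is a minimal sketch (with a hypothetical stand-in for merge_string that simply returns its token list, matching the @spec above) showing how `result ++ [merge_string(queue), punctuation]` nests the returned list inside result, while the suggested form splices it in flat:

```elixir
# Stand-in for merge_string: per the @spec, it returns a LIST of tokens.
merge_string = fn queue -> queue end

queue = [{:string, %{language: :html}, "This is a heading"}]
punctuation = {:punctuation, %{language: :html}, ">"}
result = []

# Current form: the list returned by merge_string becomes a nested element.
buggy = result ++ [merge_string.(queue), punctuation]
# => [[{:string, ..., "This is a heading"}], {:punctuation, ..., ">"}]

# Suggested form: the returned list is concatenated, so the stream stays flat.
fixed = result ++ merge_string.(queue) ++ [punctuation]
# => [{:string, ..., "This is a heading"}, {:punctuation, ..., ">"}]
```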

For example, consider the following call to lex/2:

Makeup.Lexers.HTMLLexer.lex("<h1 alt=\"blah\">This is a heading</h1>")
defmodule Makeup.Lexers.HTMLLexer do
... truncated
  # Finally, the public API for the lexer
  @impl Makeup.Lexer
  def lex(text, opts \\ []) do
    group_prefix = Keyword.get(opts, :group_prefix, random_prefix(10))
    {:ok, tokens, "", _, _, _} = root(text)

    tokens
    |> postprocess()
    |> match_groups(group_prefix)
  end
end

As a result, HTMLLexer.postprocess/1 returns nested lists mixed in with the tokens after HTMLLexer.element_stringify/4 is called:

[
  {:punctuation, %{language: :html}, "<"},
  {:keyword, %{language: :html}, "h1"},
  {:whitespace, %{language: :html}, " "},
  {:name_attribute, %{language: :html}, "alt"},
  {:operator, %{language: :html}, "="},
  {:string, %{language: :html}, [34, "b", 108, "a", 104, 34]},
  [],
  {:punctuation, %{language: :html}, ">"},
  [
    {:string, %{language: :html},
     [84, 104, "is", " ", "is", " ", "a", " ", "head", "i", 110, 103]}
  ],
  {:punctuation, %{language: :html}, "</"},
  {:keyword, %{language: :html}, "h1"},
  [],
  {:punctuation, %{language: :html}, ">"}
]
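The stray `[]` entries in that output come from the same wrapping: when the queue is empty, merging it yields an empty list, and the current form inserts that empty list as an element. A sketch, again using a hypothetical stand-in for merge_string:

```elixir
# Stand-in for merge_string: returns its token list unchanged.
merge_string = fn queue -> queue end
punctuation = {:punctuation, %{language: :html}, ">"}

# With an empty queue, the current form leaves a literal [] in the stream,
# matching the [] entries visible in the postprocess/1 output above.
[] ++ [merge_string.([]), punctuation]
# => [[], {:punctuation, %{language: :html}, ">"}]
```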

This happens to be cleaned up by HTMLLexer.match_groups/2:

[
  {:punctuation, %{group_id: "3249811804-1", language: :html}, "<"},
  {:keyword, %{language: :html}, "h1"},
  {:whitespace, %{language: :html}, " "},
  {:name_attribute, %{language: :html}, "alt"},
  {:operator, %{language: :html}, "="},
  {:string, %{language: :html}, [34, "b", 108, "a", 104, 34]},
  {:punctuation, %{group_id: "3249811804-1", language: :html}, ">"},
  {:string, %{language: :html},
   [84, 104, "is", " ", "is", " ", "a", " ", "head", "i", 110, 103]},
  {:punctuation, %{group_id: "3249811804-2", language: :html}, "</"},
  {:keyword, %{language: :html}, "h1"},
  {:punctuation, %{group_id: "3249811804-2", language: :html}, ">"}
]

If I want to use the HTMLLexer without match_groups/2 (reusing its code in a custom lexer), this becomes a problem. I am assuming it is a bug. Can you confirm?
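As a stopgap for a custom lexer, one could splice the nested lists back into a flat token stream. This is a hypothetical workaround sketch, not part of makeup_html; note that List.flatten/1 would be too aggressive here, since it would also flatten the iolists inside :string token values, so a one-level flat_map is safer:

```elixir
# Hypothetical helper: splice one level of nested token lists back into
# a flat list, leaving the token tuples themselves untouched.
flatten_tokens = fn tokens ->
  Enum.flat_map(tokens, fn
    list when is_list(list) -> list
    token -> [token]
  end)
end

tokens = [
  {:punctuation, %{language: :html}, "<"},
  [],
  [{:string, %{language: :html}, "This is a heading"}],
  {:punctuation, %{language: :html}, ">"}
]

flatten_tokens.(tokens)
# => [{:punctuation, ..., "<"}, {:string, ..., "This is a heading"},
#     {:punctuation, ..., ">"}]
```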

javiergarea commented 2 years ago

Thank you for reporting the issue! #4 should fix it.