RobertDober / earmark_parser

The Markdown to AST part of Earmark.
Apache License 2.0
68 stars 26 forks source link

Can't parse header when nested HTML tags found in previous section #77

Closed kianmeng closed 2 years ago

kianmeng commented 2 years ago

Sample test file: not_ok.md

## S2 was parsed as p tag instead of h2 when we have nested tag in separate line:

<nav>
   <ul></ul>
</nav>
iex> {:ok, content} = File.read("not_ok.md")
{:ok,
 "## S1\n\nfoo\n\n```elixir\niex> MyModule.foobar()\n\"<nav>\n  <ul></ul>\n</nav>\"\n```\n\n## S2\n\nbar\n"}
iex> EarmarkParser.as_ast(content)          
{:ok,
 [
   {"h2", [], ["S1"], %{}},
   {"p", [], ["foo"], %{}},
   {"pre", [],
    [
      {"code", [{"class", "elixir"}],
       ["iex> MyModule.foobar()\n\"<nav>\n  <ul></ul>\n</nav>\""], %{}}
    ], %{}},
   {"p", [], ["## S2"], %{}},
   {"p", [], ["bar"], %{}}
 ], []}

Sample test file: ok.md

## S2 was parsed correctly as h2 when we have nested tag in single line:

<nav><ul></ul></nav>
iex> {:ok, content} = File.read("ok.md")    
{:ok,
 "## S1\n\nfoo\n\n```elixir\niex> MyModule.foobar()\n\"<nav><ul></ul></nav>\"\n```\n\n## S2\n\nbar\n"}
iex> EarmarkParser.as_ast(content)      
{:ok,
 [
   {"h2", [], ["S1"], %{}},
   {"p", [], ["foo"], %{}},
   {"pre", [],
    [
      {"code", [{"class", "elixir"}],
       ["iex> MyModule.foobar()\n\"<nav><ul></ul></nav>\""], %{}}
    ], %{}},
   {"h2", [], ["S2"], %{}},
   {"p", [], ["bar"], %{}}
 ], []}
RobertDober commented 2 years ago

Ty for reporting this I'll have a look ASAP

RobertDober commented 2 years ago

This is a regression from lookahead in scanning code blocks.

RobertDober commented 2 years ago

@josevalim I was wondering if this is a problem for you if I fix this

This seems to stop the lookahead at the closing html tag, but a closing html tag can clearly be part of a code block as @kianmeng reported above.

I suspect you had a good reason to do this, maybe you can help me out here? Thanx in advance.

I guess this spec describes correct behavior

and so does this one

josevalim commented 2 years ago

Thanks for pinging me! I will send a PR!

josevalim commented 2 years ago

Hi @RobertDober, so I added some code last time to not break some test and this code is now back causing failures. In particular, I did not want to break this test:

https://github.com/RobertDober/earmark_parser/blob/master/test/acceptance/ast/html/block/annotated_block_test.exs#L75-L81

But I just tried this corner case in GitHub and it does not give the same result as markdown. Instead, it treats the closing </div> as part of the code.

I am requesting permission to treat this case as an error and raise altogether.

RobertDober commented 2 years ago

@josevalim sorry for the delay I was looking at the PR :blush: Permission granted :wink:

Before merging I would love to have your opinion on the remark I made in the discussion regarding this test IMHO this test should still pass or is it too much trouble?

josevalim commented 2 years ago

@RobertDober oh, it is easy to make it pass. I didn't know the indentation had to match. I will push a fix.

josevalim commented 2 years ago

I can make it pass but note you have a contradictory test here: https://github.com/RobertDober/earmark_parser/blob/master/test/acceptance/ast/list_and_inline_code_test.exs#L66

josevalim commented 2 years ago

@RobertDober I have updated the PR. I reduced the amount of scenarios the lookahead applies to. This means we don't have to change any existing test and the new regression tests pass. :)

RobertDober commented 2 years ago

@josevalim muito obrigado for all your help I'll merge and take a look at the test you pointed out.

josevalim commented 2 years ago

@RobertDober fwiw, I don't think you need to address those tests. They all happen under very dubious markdown and I would say the best solution is to correct the markdown. I just wanted to point out that some tests in the suite do not match GitHub, but they are probably all "against the spec". :) So if in the future you are making changes and you are worried about tests, it may be that some of them can change!

RobertDober commented 2 years ago

@josevalim yes we are talking peanuts here, the real problem is lists and I am making slow progress on that...

Always a pleasure to work with you.

Have a Great Festive Season and stay save.