demydd / pandoc

Automatically exported from code.google.com/p/pandoc

HTML in <head> is processed, may result in invalid <p> tags #108

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
For instance:

<html>
<head>
<style>em { color: red }</style>
</head>

<body>

Foo *Red* Bar

</body>
</html>

Running that through pandoc -f markdown results in:

<html><p
><head> <style>em { color: red }</style> </head></p
><body>
<p
>Foo <em
  >Red</em
  > Bar</p
></body>
</html>

Note the incorrect <p> tag that wraps <head>; it appears to be caused by
the blank line after </head> in the input. Moving that blank line to just
after </style> is even worse, giving:

<html><p
><head> <style>em { color: red }</style></p
></head><body>
<p
>Foo <em
  >Red</em
  > Bar</p
></body>
</html>

Now the <p> starts outside the <head> and ends inside it.
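To see why the position of the blank line matters, here is a simplified model. It is an assumption for illustration, not pandoc's actual parser. Markdown groups runs of non-blank lines into blocks, so the hypothetical helper `blank_line_chunks` splits the input on blank lines, and `splits_head` reports whether any resulting chunk contains only one of `<head>` / `</head>`. With the blank line after `</head>` the whole head lands in one chunk. With the blank line after `</style>` the chunk boundary falls inside `<head>`, which matches the straddling `<p>` above.

```python
def blank_line_chunks(text: str) -> list[str]:
    """Split text into blocks separated by blank lines (simplified Markdown model)."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the block collected so far.
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks


def splits_head(text: str) -> bool:
    """True if some chunk opens <head> without closing it, or vice versa."""
    return any(("<head>" in c) != ("</head>" in c) for c in blank_line_chunks(text))


# Input from the report: blank line after </head>.
ORIGINAL = """<html>
<head>
<style>em { color: red }</style>
</head>

<body>

Foo *Red* Bar

</body>
</html>"""

# Same input with that blank line moved to after </style>.
MOVED = """<html>
<head>
<style>em { color: red }</style>

</head>
<body>

Foo *Red* Bar

</body>
</html>"""

for name, text in (("original", ORIGINAL), ("moved", MOVED)):
    print(name, splits_head(text))
```

This prints `original False` and `moved True`. The model does not explain which tags pandoc 1.1 treats as block-level HTML. It shows only where the blank-line boundaries fall.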

I noticed this while defining reference-style links whose definitions, after
some processing, eventually ended up inside <head>.

My humble suggestion is that the contents of <head> be scraped only for
definitions of reference-style links and be otherwise left untouched.
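A rough sketch of the suggested behaviour follows, assuming a pre-processing step that runs before Markdown conversion. The function name `split_head` and both regular expressions are hypothetical, and this is not how pandoc works. The step pulls `[label]: url "title"` definitions out of `<head>`, keeps the rest of `<head>` verbatim (minus those definition lines), and returns the remainder of the document for normal Markdown processing.

```python
import re

# <head ...>...</head>, non-greedy so only the first head element is taken.
HEAD_RE = re.compile(r"(<head\b[^>]*>)(.*?)(</head>)", re.DOTALL | re.IGNORECASE)
# Simplified reference-link definition: [label]: url "optional title"
REF_DEF_RE = re.compile(
    r'^ {0,3}\[([^\]]+)\]:[ \t]*(\S+)(?:[ \t]+"([^"]*)")?[ \t]*$', re.MULTILINE
)


def split_head(doc: str):
    """Return (refs, head_block, rest).

    refs maps lower-cased labels to (url, title or None); head_block is the
    original <head> element with only the definition lines blanked out; rest
    is the document without <head>, to be handed to the Markdown converter.
    """
    m = HEAD_RE.search(doc)
    if m is None:
        return {}, "", doc
    inner = m.group(2)
    refs = {
        label.lower(): (url, title or None)
        for label, url, title in REF_DEF_RE.findall(inner)
    }
    # Everything except the definitions is passed through untouched.
    head_block = m.group(1) + REF_DEF_RE.sub("", inner) + m.group(3)
    rest = doc[: m.start()] + doc[m.end():]
    return refs, head_block, rest


doc = (
    "<html>\n<head>\n<style>em { color: red }</style>\n"
    '[home]: http://example.com/ "Home"\n</head>\n\n'
    "<body>\n\nSee [Home][home].\n\n</body>\n</html>"
)
refs, head_block, rest = split_head(doc)
print(refs)
```

The converter would then resolve `[Home][home]` using `refs` and re-insert `head_block` unchanged, so no `<p>` can ever be generated inside `<head>`.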

pandoc 1.1 +citeproc +highlighting on Windows XP.

Original issue reported on code.google.com by Deewi...@gmail.com on 15 Nov 2008 at 2:25