anammari / pandoc

Automatically exported from code.google.com/p/pandoc
GNU General Public License v2.0

Normalize internal document representation #250

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The current document model (Text.Pandoc.Definition) allows different instances 
that represent the same text. For instance, the following HTML inputs should all 
produce the same representation:

<i><i>foo</i>bar</i>

<i>foo</i><i>bar</i>

<i>foobar</i>

Normalization could take place at the level of the internal document 
representation, so you get syntax normalization for free. You could start with 
'data Inline' in module 'Text.Pandoc.Definition' and sanitize nested elements, 
etc. For instance, 'Inline' can be 'Link [Inline] Target', so a link text can 
contain another link, which is nonsense and which most markup languages cannot 
express anyway.
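
A minimal sketch of what such a pass might look like, written against a cut-down stand-in for 'Inline' rather than the real type in 'Text.Pandoc.Definition'. The names 'normalize', 'simplify', 'unEmph', 'stripLinks' and 'merge' are made up for illustration, and treating nested emphasis as redundant is an assumption taken from the HTML examples above:

```haskell
module Main where

-- Simplified stand-in for pandoc's Inline type (hypothetical; the real
-- type has many more constructors and a richer Target).
data Inline
  = Str String
  | Emph [Inline]
  | Link [Inline] String
  deriving (Eq, Show)

-- Normalize a list of inlines: simplify each element, then merge neighbours.
normalize :: [Inline] -> [Inline]
normalize = merge . concatMap simplify

-- Simplify one inline; returns a list so empty elements can vanish.
simplify :: Inline -> [Inline]
simplify (Str "")    = []
simplify (Str s)     = [Str s]
simplify (Emph xs)   = case normalize (concatMap unEmph xs) of
                         [] -> []          -- drop empty emphasis
                         ys -> [Emph ys]
simplify (Link xs t) = [Link (normalize (stripLinks xs)) t]

-- Flatten emphasis that sits directly inside emphasis.
unEmph :: Inline -> [Inline]
unEmph (Emph ys) = concatMap unEmph ys
unEmph x         = [x]

-- Replace any link nested inside a link text by its own text.
stripLinks :: [Inline] -> [Inline]
stripLinks = concatMap go
  where
    go (Link ys _) = stripLinks ys
    go (Emph ys)   = [Emph (stripLinks ys)]
    go x           = [x]

-- Merge adjacent strings and adjacent emphasis spans.
merge :: [Inline] -> [Inline]
merge (Str a : Str b : rest)   = merge (Str (a ++ b) : rest)
merge (Emph a : Emph b : rest) = merge (Emph (merge (a ++ b)) : rest)
merge (x : rest)               = x : merge rest
merge []                       = []

main :: IO ()
main = do
  -- The three HTML inputs from this report, as hand-built inline lists.
  let a = [Emph [Emph [Str "foo"], Str "bar"]]
      b = [Emph [Str "foo"], Emph [Str "bar"]]
      c = [Emph [Str "foobar"]]
  print (normalize a == normalize b && normalize b == normalize c)
  -- A link inside a link text collapses to a single link.
  print (normalize [Link [Link [Str "a"] "u"] "v"])
```

With this sketch all three inputs reduce to a single emphasized "foobar", and the nested link keeps only the outer target. A real pass would have to cover every 'Inline' and 'Block' constructor, and walking the whole tree has a cost.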

Original issue reported on code.google.com by siehea...@googlemail.com on 23 Jul 2010 at 10:19

GoogleCodeExporter commented 8 years ago
It would be easy to add normalization between the reader and writer.  I 
experimented a bit with this, though, and I'm worried about the performance 
implications.  I guess it's a tradeoff between performance and the advantages, 
whatever they may be, of normalization. I will experiment some more....

Original comment by fiddloso...@gmail.com on 8 Dec 2010 at 8:00

GoogleCodeExporter commented 8 years ago
A --normalize option has been added. Because of the performance penalty, I'm 
not going to make it the default.

% pandoc --normalize
*hi**there*
<p
><em
  >hithere</em
  ></p
>

Original comment by fiddloso...@gmail.com on 27 Jan 2011 at 6:24