demydd / pandoc

Automatically exported from code.google.com/p/pandoc

Support for strikeout (patch included) #18

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I've added support for strikeout text.  The included patch applies against
your current trunk (r714 at time of submission).  The only input/output
format that didn't support strikeout was reStructuredText, so I added a
dummy output for it.  For the rest, I used the standard strikeout markup
of the given format.

Original issue reported on code.google.com by Bradley.Sif@gmail.com on 15 Jul 2007 at 4:28

Attachments:

GoogleCodeExporter commented 8 years ago
I forgot to mention: I produced the patch attached above as work-for-hire
for the Software Freedom Law Center.  By my authority as CTO of the SFLC,
and with further approval of our Director, I disclaim any copyright interest
in the patch that I or the Software Freedom Law Center hold.

Original comment by Bradley.Sif@gmail.com on 15 Jul 2007 at 4:30

GoogleCodeExporter commented 8 years ago
Many thanks for the patch!  Before adding anything to pandoc, though, I want
to think a bit more about what a Strikeout inline element would be used for.
I'm guessing it's used mainly for tracking deletions to a document. If that's
right, the following concerns come to mind:

- Additions are as important to track as deletions. So if there's an element to
  represent text as deleted, shouldn't there also be an element to represent text as
  inserted? (HTML has the pair <del> and <ins> for this purpose.)

- The Strikeout inline element would have limitations in tracking deletions.
  It could track deleted inline elements. But (a) it would not be able to track
  deletions in *parts* of inline elements (such as the title field of a link,
  or the text of a Code element, which is represented as a string). And (b) it
  would not be able to track deletions of block elements. (Sure, you could
  surround all the inline elements in the block with ~~, but that would be
  extremely tiresome in, say, a nested list.)  Perhaps (b) could be fixed by
  adding a Strikeout block element, but this would not help with (a).  You
  wouldn't be able to strike out one line in a CodeBlock, for example. The best
  you could do would be to strike out the whole block.

- All of this makes me wonder whether there's a better solution to change
  tracking, one that does not require any changes to pandoc's document structure.
  One idea would be to use a diff-like program to compare the HTML versions of
  two pandoc documents and insert <ins> and <del> tags accordingly. (Contents
  of <del> tags are represented by default as strikeout.)  I wrote a little
  Wiki using pandoc that does this very nicely (using Data.List.LCS.HuntSzymanski)
  on a character-by-character basis (not line-by-line as with diff). It would be 
  a bit more difficult to do this with LaTeX, because of the way verbatim data is
  insulated from the rest, but this would mostly just be a problem with contents of
  code blocks.
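
The diff-based idea in the last point can be sketched roughly as follows. This is only an illustration, not the wiki code mentioned above: it is written in Python, it uses the standard library's `difflib.SequenceMatcher` in place of `Data.List.LCS.HuntSzymanski`, and it assumes the two inputs are plain text fragments rather than full HTML (diffing raw HTML character by character could split tags). The function name `mark_changes` is hypothetical.

```python
import difflib
from html import escape


def mark_changes(old: str, new: str) -> str:
    """Annotate the differences between two text fragments.

    Characters only in `old` are wrapped in <del>, characters only in
    `new` are wrapped in <ins>, and shared characters pass through.
    """
    # autojunk=False keeps the comparison exact on longer inputs.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append(escape(old[i1:i2]))
            continue
        # "replace" yields both parts, "delete" and "insert" one each.
        if i2 > i1:
            out.append("<del>" + escape(old[i1:i2]) + "</del>")
        if j2 > j1:
            out.append("<ins>" + escape(new[j1:j2]) + "</ins>")
    return "".join(out)


print(mark_changes("version 2", "version 3"))
# version <del>2</del><ins>3</ins>
```

Because the comparison runs on characters rather than lines, a one-character change inside a long paragraph is marked as exactly that, which is the behaviour described above.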

It would be useful to hear your thoughts about these concerns, and also to hear
a bit more about how you've been using Strikeout.

Original comment by fiddloso...@gmail.com on 15 Jul 2007 at 3:33

GoogleCodeExporter commented 8 years ago
[ BTW, although it's off-topic for this discussion, I wanted to mention to
you at some point how excellent pandoc is!  I would have mentioned it in my
first post had I not wanted to stay on-topic when creating the ticket. :) ]

You are quite right that strikeout is often used for change-tracking.  I
agree with you that a larger system to help pandoc create change-tracking of
documents would be extremely useful, and I'd love to see it implemented (and
may even be willing to help with it, as I have a general need for that too,
but we should probably have that discussion on a different forum).

However, I feel that such a feature is a separate issue entirely.  Many
document formats (DocBook, RTF, LaTeX, many wiki engines, and HTML with its
<s> and <strike> tags, albeit deprecated) let the user apply strikeout as
text markup, just like italics, underlining, and bold.  If pandoc encounters
such markup in someone's existing document, it should do the right thing with
it, (basically) regardless of other features pandoc may have to help with
change-tracking.

To answer your question about what inspired me to add the feature: I was
originally drawn to pandoc as a way to easily build S5 slides from Markdown
and other easily-editable formats.  I work with lawyers
(www.softwarefreedom.org, if you are interested), and they give lots of
presentations, and I'm trying to keep them from using Impress (yuck!).  I'm
giving them pandoc with its S5 generation ability and Markdown as a source
format as a way to make easy slides.

One of the items they often need in a presentation slide is to show
differences in legal text from earlier revisions of the same document.  The
example that inspired us to add strikeout was a set of slides that compare
the text of GPLv2 and GPLv3.  Now, I grant you that we're showing markup for
change tracking.  But, in the case of an S5-formatted slide generated from
Markdown, that's not really a fundamental issue when producing that document.
The fundamental issue is that someone took the *output* of some
change-tracking system and now they want to display that output in a
reasonable way.

In summary, my reasons for adding it the way I did are two-fold.  First, many
formats have a native way of representing strikeout anyway, and therefore
pandoc will encounter that markup in its usual conversion work, and should do
the right thing when it does.  Second, there are times when users will just
want to represent in a reasonable way some change markup from another source,
and they may not even have the original two documents around to produce
proper change markup via the new pandoc feature you mention.

Original comment by Bradley.Sif@gmail.com on 16 Jul 2007 at 5:17

GoogleCodeExporter commented 8 years ago
Thanks for the clarification.  That makes sense, and I plan to incorporate
your Strikeout changes into pandoc soon.  They will probably make it into the
0.4 release due later this summer.  I will have to change the syntax, though,
because I was already planning to use tildes for subscripts (as in H~2~O).  I
think that a double tilde would make sense here:

    This text ~~has been deleted~~.
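
To illustrate how single and double tildes can coexist, here is a minimal sketch in Python. It is not pandoc's parser: the regular expressions, the HTML output, and the name `render_tildes` are all assumptions made for this example. Double tildes are handled first so that they are never mistaken for two subscript delimiters.

```python
import re

# ~~text~~ : strikeout; the text must not start or end with whitespace.
STRIKEOUT = re.compile(r"~~(?=\S)(.+?)(?<=\S)~~")
# ~text~ : subscript; single tildes only, no spaces or tildes inside.
SUBSCRIPT = re.compile(r"(?<!~)~(?!~)([^~\s]+)~(?!~)")


def render_tildes(text: str) -> str:
    """Render tilde markup as HTML <del> and <sub> elements."""
    # Strikeout first, so the remaining tildes can only be subscripts.
    text = STRIKEOUT.sub(r"<del>\1</del>", text)
    return SUBSCRIPT.sub(r"<sub>\1</sub>", text)


print(render_tildes("This text ~~has been deleted~~."))
# This text <del>has been deleted</del>.
print(render_tildes("H~2~O"))
# H<sub>2</sub>O
```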

Thanks again for the patch and the comments!

Original comment by fiddloso...@gmail.com on 19 Jul 2007 at 5:45

GoogleCodeExporter commented 8 years ago
Strikeout has been added to pandoc, along with superscripts and subscripts,
as of r778.  Thanks again!

Original comment by fiddloso...@gmail.com on 22 Jul 2007 at 10:05