demydd / pandoc

Automatically exported from code.google.com/p/pandoc

Auto-escaping of \ and ~ causes excess spacing in latex (and therefore pdf) #75

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. put an abbreviation in your document: Mr. ex. etc.
2. run markdown2pdf or pandoc latex against document

What is the expected output? What do you see instead?
Normal inter-word spacing: "Mr. Smith", "ex. 1." Instead you get LaTeX's
inter-sentence spacing: "Mr.  Smith", "ex.  1."

What version of the product are you using? On what operating system?
0.47 SVN on Ubuntu 8.04

Please provide any additional information below.

Normally LaTeX lets you suppress the inter-sentence spacing in these
circumstances, either by escaping the space with a backslash ("Mr.\ Smith")
or by inserting a non-breaking space with ~ ("Mr.~Smith"). The ~ is
frequently the preferred method because it also prevents line breaks between
a title and a name, or in the middle of an abbreviation such as a citation.

Following markdown, pandoc escapes these characters in processing: it strips
the backslash in "Mr.\ Smith", and it goes into math mode to render the ~ in
"Mr.~Smith".

None of the escape character sequences I have tried have managed to avoid
this issue. Those include: .\ foo, .\\ foo, \. foo, .\~ foo, .\\~foo

Original issue reported on code.google.com by ianjsull...@gmail.com on 30 Jun 2008 at 6:53

GoogleCodeExporter commented 8 years ago
Thanks for the bug report.  The solution isn't obvious, since pandoc has to
support multiple input and output formats.  One thought is to parse '\ ' in
markdown as a unicode nonbreaking space character, and change the LaTeX writer
so it prints this character as '\ '.  This way "Mr.\ Smith" would come out in
LaTeX as "Mr.\ Smith" and in HTML as "Mr. Smith", which would make sense.  But
I will think about it some more.  Suggestions welcome.
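
A minimal Python sketch of the round trip proposed above, not pandoc's actual
code (pandoc is written in Haskell). The function names are hypothetical, and
emitting `&nbsp;` in HTML is an assumption; the comment above only says the
HTML output should read "Mr. Smith".

```python
NBSP = "\u00a0"  # Unicode no-break space


def parse_markdown_escapes(text: str) -> str:
    """Reader side: turn a backslash-escaped space into a no-break space."""
    return text.replace("\\ ", NBSP)


def write_latex(text: str) -> str:
    """LaTeX writer side: print the no-break space as a control space."""
    return text.replace(NBSP, "\\ ")


def write_html(text: str) -> str:
    """HTML writer side (assumed): print the no-break space as an entity."""
    return text.replace(NBSP, "&nbsp;")


doc = parse_markdown_escapes("Mr.\\ Smith")
print(write_latex(doc))
print(write_html(doc))
```

Keeping the format-neutral character in the parsed document is what lets each
writer choose its own spelling. A LaTeX writer could just as well print `~`,
which the original report notes also prevents a line break.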

Original comment by fiddloso...@gmail.com on 9 Jul 2008 at 4:25

GoogleCodeExporter commented 8 years ago
My only objection to that method would be that it makes the markdown source less
humane.  You shouldn't have to use an escape to be able to write "Mr. Smith"
normally.  The LaTeX writer, ideally, should have a way of determining the
correct spacing (from a dictionary of standard abbreviations, with a local
dictionary available per installation and per user, perhaps) and inserting
whatever LaTeX will need to behave correctly--the Markdown probably shouldn't
be tainted by the LaTeX idiosyncrasy.

It would still be useful to be able to specify inter-sentence or non-breaking
spaces from the Markdown, though, for corner cases.

Original comment by deeay...@gmail.com on 9 Jul 2008 at 6:10

GoogleCodeExporter commented 8 years ago
I agree, it would be in the spirit of markdown to recognize these cases
automatically as far as possible.  One difficulty, though, is that abbreviations
are language-specific.  "Mr." works for English, but not for Spanish or German.
Though I suppose pandoc's smart-typography feature is already English-centric.

Original comment by fiddloso...@gmail.com on 9 Jul 2008 at 6:23

GoogleCodeExporter commented 8 years ago
Further note: It probably makes sense to add this to the smart typography parser
(which is enabled automatically when output is LaTeX).  The parser would look
for "Mr. " "Mrs. " "e.g. " and so on, and convert the final space into a unicode
nonbreaking space.  The LaTeX writer could then escape this as "\ ".
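
The automatic pass described above could look roughly like this in Python. It
is only a sketch: the abbreviation tuple is illustrative rather than pandoc's
real list, and `protect_abbreviations` is a hypothetical name.

```python
import re

NBSP = "\u00a0"  # Unicode no-break space

# Illustrative list only; a real list would be longer.
ABBREVIATIONS = ("Mr.", "Mrs.", "e.g.", "vol.")

# Match a listed abbreviation that does not continue an earlier word
# and is followed by a single ordinary space.
_ABBREV_SPACE = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r") "
)


def protect_abbreviations(text: str) -> str:
    """Replace the space after each known abbreviation with a no-break space."""
    return _ABBREV_SPACE.sub(lambda m: m.group(1) + NBSP, text)


print(repr(protect_abbreviations("Mr. Smith cites vol. 1.")))
```

A LaTeX writer can then print each no-break space as `\ ` or `~`, while the
other writers keep whatever representation suits their format.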

But this still wouldn't provide a way to manually handle cases that aren't
handled automatically.  Not sure how to do that.

Original comment by fiddloso...@gmail.com on 11 Jul 2008 at 12:25

GoogleCodeExporter commented 8 years ago
Suggestion:  '\ ' in markdown produces a nonbreaking space, always.
That solves the second problem in comment 4, above.

Original comment by fiddloso...@gmail.com on 11 Jul 2008 at 12:33

GoogleCodeExporter commented 8 years ago
Fixed, I think, in r1298.  It may still need fine-tuning, but try it.
Many abbreviations are recognized automatically.  "Mr. Brown", "vol. 1", etc.
But you can also force a nonbreaking space using '\ ' in markdown.

Original comment by fiddloso...@gmail.com on 11 Jul 2008 at 2:16

GoogleCodeExporter commented 8 years ago
Thanks, that's great. Would it be difficult for users to add to that list of
abbreviations locally?  We use it at my office for a lot of legal writing and
there are a whole slew of citations/abbreviations that we need to escape.

By the way, currently we're exporting to odt to work around this. That exporter
is great.

Original comment by ianjsull...@gmail.com on 11 Jul 2008 at 9:41

GoogleCodeExporter commented 8 years ago
That's an interesting idea.  I may implement that later.  For now
I'll leave this issue open so I don't forget about it.

Original comment by fiddloso...@gmail.com on 12 Jul 2008 at 12:38

GoogleCodeExporter commented 8 years ago
Another idea might be to change the default behavior of the markdown2pdf script
to use the LaTeX frenchspacing option, which simply gets rid of the special
inter-sentence spacing. Anyone who wants the special LaTeX spacing, or other
advanced features, can always convert directly to LaTeX output and use the
standard tools there, but people who just want a simple PDF won't have to pay
any attention to the abbreviation issue. It is also a fix that would work for
all languages, most of which never use special spacing between sentences.

Original comment by ianjsull...@gmail.com on 23 Jul 2008 at 4:00

GoogleCodeExporter commented 8 years ago
Yes, that's a thought.  Note that you can just put '\frenchspacing' in the
pandoc file -- markdown will parse it as raw latex, and it won't appear in
non-tex output formats, like HTML.  Or, even better, put it in a custom header.

I'm reluctant to make it the default, though.  And even with frenchspacing,
there's a reason to treat abbreviations specially:  you don't want a line break
in the space between "Mr." and "Brown".

Original comment by fiddloso...@gmail.com on 24 Jul 2008 at 3:34

GoogleCodeExporter commented 8 years ago
ianjsullivan - I'm still thinking about the possibility of reading an 'abbrev'
file.  But if there are abbreviations you use a lot, why don't you just send me
a list (or post it here)?  I'd like to make pandoc's default abbreviation list
more complete.  However, I'm reluctant to add abbreviations that might naturally
occur at the end of a sentence:  e.g., "I met with my Prof."
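
For illustration, here is one way an 'abbrev' file could be read and merged
with a built-in list. Everything in this sketch is an assumption: the
one-entry-per-line format with '#' comments, the function names, and the
stand-in default set are hypothetical, not an existing pandoc feature.

```python
import re
from pathlib import Path

NBSP = "\u00a0"  # Unicode no-break space

# Stand-in for a built-in list; the real list is not reproduced here.
DEFAULT_ABBREVIATIONS = {"Mr.", "Mrs.", "vol."}


def parse_abbrev_lines(text):
    """Return the entries listed one per line, skipping blanks and '#' comments."""
    entries = (line.strip() for line in text.splitlines())
    return {e for e in entries if e and not e.startswith("#")}


def load_abbreviations(path):
    """Merge the defaults with a user file, if that file exists."""
    p = Path(path)
    extra = parse_abbrev_lines(p.read_text(encoding="utf-8")) if p.is_file() else set()
    return DEFAULT_ABBREVIATIONS | extra


def protect(text, abbreviations):
    """Replace the space after each listed abbreviation with a no-break space."""
    if not abbreviations:
        return text
    # Longest first, so a longer entry is tried before any shorter prefix of it.
    alternatives = "|".join(
        re.escape(a) for a in sorted(abbreviations, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<!\w)(" + alternatives + r") ")
    return pattern.sub(lambda m: m.group(1) + NBSP, text)


legal = DEFAULT_ABBREVIATIONS | parse_abbrev_lines("# legal reporters\nF.\nSupp.\n")
print(repr(protect("See 123 F. Supp. 2d 456", legal)))
```

A per-user file keeps specialist entries out of the defaults. It does not
address the sentence-final case: if "Prof." were listed, "my Prof. The next
day" would still get a no-break space after the period.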

Original comment by fiddloso...@gmail.com on 12 Sep 2008 at 10:39

GoogleCodeExporter commented 8 years ago
fiddlosopher, thanks for the query. Unfortunately, we use a lot of legal
abbreviations here, which are both plentiful and probably not useful to most
people.

Original comment by ianjsull...@gmail.com on 18 Sep 2008 at 8:16