Smileyt / python-markdown2

Automatically exported from code.google.com/p/python-markdown2

SmartyPants patch #42

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hi Trent,

Any interest in auditing or accepting this SmartyPants patch?  

This addition does not support a couple features that were present in the 
original:
* No support for the "old-school" dash syntax mentioned in SmartyPants.pl, that 
is, -- for en dashes and --- for em dashes.
* No support for backtick-delimited quotes.  First, they're ugly and rarely 
used.  Second, it's too easy to invoke the regex by accident if it is 
supposed to match with a closing tag, which can interfere with the more 
important <code> segments.  It's somewhat mitigated by only supporting ``'' 
instead of `', as SmartyPants.pl does, but still...too much ambiguity.

The contractions are generally pretty simple (but sometimes a little smarter 
than SmartyPants.pl's).  The code assumes "text" means anything that's not 
whitespace or the beginning or end of the string.
* If a prime falls between text and whitespace, the quotation mark curls in the 
direction of the text.
* A single prime falling in between text and text becomes an apostrophe, and 
curls to the left.
* A single prime falling in front of tis, twas, Tis, or Twas becomes an 
apostrophe, too.
* Closing quotation marks don't have to fall in front of whitespace; they can 
also fall in front of common punctuation, as they would in British English.
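
The rules in the list above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the function name `curl_quotes` and the exact regexes are assumptions, and the apostrophe rule is narrowed to word characters so that a closing quote in front of punctuation still works.

```python
import re

# Hypothetical helper for illustration; not the code from the patch.
# Lookahead for a closing quote: whitespace, common punctuation, or end.
_CLOSE_AHEAD = r"(?=[\s.,;:!?]|$)"

def curl_quotes(text):
    # 'tis, 'twas, 'Tis, 'Twas: the leading prime is an apostrophe.
    text = re.sub(r"'(?=[Tt](?:is|was)\b)", "&#8217;", text)
    # A prime between two word characters is an apostrophe (don't, it's).
    text = re.sub(r"(?<=\w)'(?=\w)", "&#8217;", text)
    # Whitespace (or start of string) before, text after: opening quote.
    text = re.sub(r"(?<!\S)'(?=\S)", "&#8216;", text)
    text = re.sub(r'(?<!\S)"(?=\S)', "&#8220;", text)
    # Text before, whitespace, punctuation, or end after: closing quote.
    text = re.sub(r"(?<=\S)'" + _CLOSE_AHEAD, "&#8217;", text)
    text = re.sub(r'(?<=\S)"' + _CLOSE_AHEAD, "&#8221;", text)
    return text

print(curl_quotes("'Tis 'quoted', she said, and don't \"stop\"."))
# → &#8217;Tis &#8216;quoted&#8217;, she said, and don&#8217;t &#8220;stop&#8221;.
```

The order matters in this sketch: contractions and apostrophes are handled first so that the opening and closing passes only see primes that really are quotation marks.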

I also tested some code that substituted guillemets for double angle brackets, 
but I removed it because of (a) the fact that the engine escapes angle brackets 
ASAP and (b) extending Markdown/SmartyPants "conventions" without another 
configuration option may not be the best policy.

I did write an input file for test.py, but I haven't been able to manually 
generate an HTML file that is correctly formatted - but it's always close.

Thanks, Nikhil

Original issue reported on code.google.com by nikhil.chelliah on 19 Jun 2010 at 2:43

Attachments:

GoogleCodeExporter commented 8 years ago
Forgot to add a description of the smarty-pants extra in the docs at the top of 
the file.

Original comment by nikhil.chelliah on 21 Jun 2010 at 12:42

Attachments:

GoogleCodeExporter commented 8 years ago
Nikhil,

Just a note that I've been looking at your patch and I'll definitely be 
accepting it. Great stuff. I'd had it on my wish list to have Smarty Pants 
support. It might take me a day or two to get the patch in tho.

Original comment by tre...@gmail.com on 21 Jun 2010 at 6:18

GoogleCodeExporter commented 8 years ago
First part added in r256.

Original comment by tre...@gmail.com on 21 Jun 2010 at 5:54

GoogleCodeExporter commented 8 years ago
Finished in r257. Thanks a lot Nikhil!

This will be in a (coming) 1.0.1.18 release.

Original comment by tre...@gmail.com on 21 Jun 2010 at 6:09

GoogleCodeExporter commented 8 years ago
No problem, and thanks for including the patch.

I've made some changes on top of r257 to let SmartyPants detect the 
contractions: 'tis 'twas 'twer 'oer 'neath 'o 'n 'round 'bout 'twixt 'nuff 
'fraid 'sup

It's true that people might type 'round, 'bout, etc. with ' supposed to be an 
opening scare quote, but that's probably pretty rare.

There are others, such as words without their initial h (as in the Cockney 
accent), but those are extremely archaic and too numerous to keep track of.
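
For anyone skimming, the idea is roughly the following. This is a minimal sketch, not the attached patch: the names `CONTRACTIONS` and `curl_contractions` are made up here, and matching whole words case-insensitively is an assumption.

```python
import re

# The list from this comment, without the leading primes.
CONTRACTIONS = ["tis", "twas", "twer", "oer", "neath", "o", "n", "round",
                "bout", "twixt", "nuff", "fraid", "sup"]

# A prime at the start of the string or after whitespace, followed by
# one of the listed words (assumption: whole-word, case-insensitive).
_contraction_re = re.compile(
    r"(?<!\S)'(?=(?:%s)\b)" % "|".join(CONTRACTIONS), re.IGNORECASE)

def curl_contractions(text):
    # Replace only the prime; the contraction itself is left untouched.
    return _contraction_re.sub("&#8217;", text)

print(curl_contractions("'Twas late, so we walked 'round the block."))
# → &#8217;Twas late, so we walked &#8217;round the block.
```

A prime in front of any other word, such as 'really', is left alone here so the ordinary opening-quote rule can handle it.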

Original comment by nikhil.chelliah on 22 Jun 2010 at 8:11

Attachments:

GoogleCodeExporter commented 8 years ago
Here's a better patch - this one will also convert the apostrophe in '65, '09, 
etc.

Hopefully that's the last patch for a while.
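
A minimal sketch of the year case, assuming exactly two digits after the prime (the regex and the name `curl_year_apostrophes` are illustrative, not copied from the patch):

```python
import re

# A prime followed by exactly two digits and then a word boundary.
_year_re = re.compile(r"'(\d\d)\b")

def curl_year_apostrophes(text):
    # Keep the digits, swap the prime for a right single quote entity.
    return _year_re.sub(r"&#8217;\1", text)

print(curl_year_apostrophes("the summer of '65 and the crash of '09."))
# → the summer of &#8217;65 and the crash of &#8217;09.
```

With this pattern a four-digit number such as '1999 is not touched, since there is no word boundary after the first two digits.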

Original comment by nikhil.chelliah on 22 Jun 2010 at 9:45

Attachments:

GoogleCodeExporter commented 8 years ago
re-opening this bug so Nikhil's additions don't get lost.

Original comment by tre...@gmail.com on 22 Jun 2010 at 10:19

GoogleCodeExporter commented 8 years ago
Couple more changes - ellipses can now have a space on either side, and only 
numeric entities are used instead of named ones (e.g. &#8216; instead of 
&lsquo;) because the latter aren't really supported in XML/XHTML.

Again, the patches include all my changes since r257.
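
To illustrate the named versus numeric point: named entities like &lsquo; can be mapped to their numeric form with the standard library. This is just a sketch of the idea, not what the patch does internally (the patch emits numeric entities directly); `to_numeric_entities` and `XML_PREDEFINED` are hypothetical names.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself predefines; these can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_named_re = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_numeric_entities(text):
    def _replace(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return _named_re.sub(_replace, text)

print(to_numeric_entities("&lsquo;hi&rsquo; &amp; &hellip;"))
# → &#8216;hi&#8217; &amp; &#8230;
```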

Original comment by nikhil.chelliah on 1 Jul 2010 at 11:52

Attachments:

GoogleCodeExporter commented 8 years ago
I've applied your latest patches (with only minor changes, see below) in r259. 
Thanks!

Minor changes:
- Dropped "oer" from `_contractions`. The contraction is "o'er", is it not? 
Hence it isn't one of the leading-quote contractions.
- Added a couple `if "'" in text` guards. Similar guards have proved useful for 
performance in other areas. Though really whether this helps or hinders depends 
on the data set.
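
The guard pattern looks roughly like this. It is a simplified sketch rather than the code in r259; `curl_apostrophes` and its regex are stand-ins for illustration.

```python
import re

_apostrophe_re = re.compile(r"(?<=\w)'(?=\w)")

def curl_apostrophes(text):
    # Cheap substring check: skip the regex pass entirely when the
    # text contains no prime at all.
    if "'" not in text:
        return text
    return _apostrophe_re.sub("&#8217;", text)
```

On text with few primes the substring check avoids the regex scan; on text where nearly every chunk contains one, it is a small extra cost on top of the substitution.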

Original comment by tre...@gmail.com on 17 Jul 2010 at 7:00

GoogleCodeExporter commented 8 years ago
Sounds good.  I'm not exactly sure why I thought o'er had a leading apostrophe, 
but thanks for catching it.

Original comment by nikhil.chelliah on 20 Jul 2010 at 4:00