bobtfish / text-markdown

The perl Text::Markdown CPAN module

Fwd: Entity conversion glitch? #3

Open dandv opened 15 years ago

dandv commented 15 years ago

Migrated from https://rt.cpan.org/Ticket/Display.html?id=32951

Tue Feb 05 14:42:30 2008 BOBTFISH - Ticket created

Subject:    Fwd: Entity conversion glitch?
Date:   Tue, 5 Feb 2008 19:41:18 +0000
To: bug-Text-Markdown@rt.cpan.org
From:   Tomas Doran <bobtfish@bobtfish.net>

Begin forwarded message:

> From: Petite Abeille <petite.abeille@gmail.com>
> Date: 3 February 2008 17:22:31 GMT
> To: markdown-discuss@six.pairlist.net
> Subject: Entity conversion glitch?
> Reply-To: "Discussion related to Markdown." <markdown-discuss@six.pairlist.net>
>
> Hello,
>
> Given the following text:
>
> under a license from AT&T; however, others were based on BSD instead.
>
> Daring Fireball's Markdown Dingus produces:
>
> <p>under a license from AT&T; however, others were based on BSD 
> instead.</p>
>
> Note how the '&' is not escaped to '&amp;'.
>
> Bug? Feature?
>
> Thanks in advance.
>
> Kind regards,
>
> PA.

Tue Feb 05 14:57:31 2008 BOBTFISH - Correspondence added

waylan@gmail.com said:

I would say this is a minor bug, and an interesting edge case as well.
The cause is the semi-colon following the `T`. `&T;` looks like an
html entity. If you remove the semicolon, the `&` is properly
converted to `&amp;`.

In python-markdown, you can escape the `&` and you will get the expected
output:

under a license from AT\&T; however, others were based on BSD instead.

becomes:

<p>under a license from AT&amp;T; however, others were based on BSD
instead.
</p>

Unfortunately, this does not seem to work in the other implementations.
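
The edge case described above can be reproduced with a short sketch. The first function is a Python transliteration of the ampersand rule that Markdown.pl appears to apply (leave `&` alone whenever what follows is shaped like an entity). The second is a hypothetical stricter variant, not taken from any implementation in this thread, that only preserves named references found in Python's `html.entities.name2codepoint` table. Both function names are made up for illustration.

```python
import re
from html.entities import name2codepoint

# Loose rule: escape '&' unless it is followed by something that merely
# looks like an entity (optional '#', optional 'x', word characters, ';').
ENTITY_SHAPED = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')

def escape_amps_loose(text):
    return ENTITY_SHAPED.sub('&amp;', text)

# Stricter rule (assumption, for illustration only): keep numeric references
# and *known* named entities; escape every other ampersand.
AMP = re.compile(r'&(?:(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)?')

def escape_amps_strict(text):
    def fix(match):
        ref = match.group(1)
        if ref is None:
            return '&amp;'                  # bare ampersand
        if ref.startswith('#') or ref in name2codepoint:
            return match.group(0)           # genuine reference, keep as-is
        return '&amp;' + ref + ';'          # entity-shaped but unknown, e.g. "&T;"
    return AMP.sub(fix, text)

sample = "under a license from AT&T; however, others were based on BSD instead."
print(escape_amps_loose(sample))   # '&T;' is left untouched
print(escape_amps_strict(sample))  # '&T;' becomes '&amp;T;'
```

The stricter variant trades a table lookup for correctness on text like `AT&T;`, at the cost of escaping any named entity that the lookup table does not know about.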

Tue Feb 05 14:57:45 2008 BOBTFISH - Status changed from 'new' to 'open'

Tue Feb 05 15:03:13 2008 BOBTFISH - Requestor petite.abeille@gmail.com added

Tue Feb 05 15:04:18 2008 BOBTFISH - Requestor BOBTFISH deleted

Tue Feb 05 15:05:37 2008 BOBTFISH - Taken

Tue Feb 05 15:10:25 2008 BOBTFISH - Correspondence added

Hiya

I've added this as a bug to Text::Markdown on rt.cpan.org from your
Markdown mailing list post.

Are you using original Markdown.pl, one of the CPAN versions, or a
non-perl version? Are you using the perl version (any of them), and so
is this relevant to you - or would you like me to remove you from the
ticket requestor field?

Tue Feb 05 15:16:16 2008 petite.abeille@gmail.com - Correspondence added

Subject:    Re: [rt.cpan.org #32951] Fwd: Entity conversion glitch? 
Date:   Tue, 5 Feb 2008 21:15:39 +0100
To: bug-Text-Markdown@rt.cpan.org
From:   Petite Abeille <petite.abeille@gmail.com>

Hi Tomas,

On Feb 5, 2008, at 9:10 PM, Tomas Doran via RT wrote:

> <URL: http://rt.cpan.org/Ticket/Display.html?id=32951 >
>
> I've added this as a bug to Text::Markdown on rt.cpan.org from your
> Markdown mailing list post.
>
> Are you using original Markdown.pl, one of the CPAN versions, or a
> non-perl version? Are you using the perl version (any of them),

Nope. Using Niklas Frykholm's Lua implementation:

http://www.frykholm.se/files/markdown.lua

> and so is this relevant to you - or would you like me to remove you 
> from the
> ticket requestor field?

Considering that the perl implementation is the 'canonical' one, 
people tend to mimic its, hmmm, idiosyncrasies :)

You can keep me posted.

Thanks for filing a bug :)

Cheers,

PA.

Tue Feb 05 15:29:03 2008 BOBTFISH - Correspondence added

Subject:    Re: [rt.cpan.org #32951] Fwd: Entity conversion glitch? 
Date:   Tue, 5 Feb 2008 20:26:49 +0000
To: bug-Text-Markdown@rt.cpan.org
From:   Tomas Doran <bobtfish@bobtfish.net>

On 5 Feb 2008, at 20:16, petite.abeille@gmail.com via RT wrote:
>> Are you using original Markdown.pl, one of the CPAN versions, or a
>> non-perl version? Are you using the perl version (any of them),
>
> Nope. Using Niklas Frykholm's Lua implementation:
>
> http://www.frykholm.se/files/markdown.lua

Cool, I've heard a lot of good things about Lua; I must play with it 
one day.

> Considering that the perl implementation is the 'canonical' one,
> people tend to mimic its, hmmm, idiosyncrasies :)

Yep :)

I maintain Text::Markdown and Text::MultiMarkdown on CPAN.

I've got Text::MultiMarkdown to the point where it passes John's 
entire test suite + more of MultiMarkdown's test suite than the 
'real' MultiMarkdown passes, and I'm planning to merge the two 
modules shortly.

I'm also currently trying different ways of implementing the markdown 
parser, as I think that it can be:

a) A helluva lot quicker
b) Smarter, to deal with edge cases like this and the list item 
example posted recently to the list.
c) Pluggable - everyone and their dog wants a 'markdown like' 
language - but no two groups want exactly the same feature set. 
Rather than go the way of having a zillion 'options' (which is gonna 
make your code suck at best + still be inflexible), I'm trying to 
build a modular parser, so that people who care to do so can plug 
together various markdown (and/or other elements) and produce a 
markdown-like language that works for them.
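
As a rough illustration of the "pluggable" idea in (c), here is a minimal sketch in which each span-level feature is an independent rule and a dialect is just the list of rules someone chooses to plug together. This is not the design Tom was working on; the rule functions, `make_renderer`, and the deliberately simplistic regexes are all assumptions made for the example.

```python
import re

def code_span(text):
    # `code` -> <code>code</code>
    return re.sub(r'`(.+?)`', r'<code>\1</code>', text)

def emphasis(text):
    # *word* -> <em>word</em>
    return re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)

def make_renderer(rules):
    """Build a renderer from an ordered list of rule functions."""
    def render(text):
        for rule in rules:
            text = rule(text)
        return text
    return render

# Two hypothetical dialects built from the same parts, with no option flags.
full = make_renderer([code_span, emphasis])
code_only = make_renderer([code_span])

print(full("a *b* `c`"))
print(code_only("a *b* `c`"))
```

Here a group that does not want emphasis simply leaves that rule out, instead of the parser growing another option.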

> You can keep me posted.

Will do :)

Cheers
Tom

Tue Mar 04 20:40:37 2008 JOEY - Correspondence added

Speaking of speed, I benchmarked 3 versions. The 1.0.1 and
1.0.2b8 versions are the non-CPAN perl versions of markdown. I
benchmarked simply markdowning "foo".

markdown 1.0.1
markdown: 3 wallclock secs ( 3.37 usr + 0.02 sys = 3.39 CPU) @ 2949.85/s (n=10000)

markdown 1.0.2~b8
markdown: 6 wallclock secs ( 6.08 usr + 0.00 sys = 6.08 CPU) @ 1644.74/s (n=10000)

CPAN Text::Markdown 1.0.16
markdown: 11 wallclock secs (10.48 usr + 0.01 sys = 10.49 CPU) @ 953.29/s (n=10000)
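
For reference, the rates in the benchmark output are simply iterations divided by CPU seconds. The sketch below recomputes them from the figures above and adds the relative slowdown against 1.0.1; the helper names are made up for illustration, and nothing here re-runs the benchmark itself.

```python
# CPU seconds reported above for n = 10000 iterations of markdowning "foo".
ITERATIONS = 10000
cpu_seconds = {
    "markdown 1.0.1": 3.39,
    "markdown 1.0.2~b8": 6.08,
    "CPAN Text::Markdown 1.0.16": 10.49,
}

def rate(seconds, iterations=ITERATIONS):
    """Iterations per CPU second."""
    return iterations / seconds

def slowdown(name, baseline="markdown 1.0.1"):
    """CPU time of `name` relative to the baseline version."""
    return cpu_seconds[name] / cpu_seconds[baseline]

for name, seconds in cpu_seconds.items():
    print(f"{name}: {rate(seconds):.2f}/s, {slowdown(name):.2f}x the 1.0.1 CPU time")
```

On these numbers, Text::Markdown 1.0.16 takes roughly three times the CPU time of Markdown 1.0.1 for this trivial input.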

Wed Mar 05 04:03:54 2008 BOBTFISH - Correspondence added

Subject:    Re: [rt.cpan.org #32951] Fwd: Entity conversion glitch? 
Date:   Wed, 5 Mar 2008 09:03:09 +0000
To: bug-Text-Markdown@rt.cpan.org
From:   Tomas Doran <bobtfish@bobtfish.net>

On 5 Mar 2008, at 01:40, Joey Hess via RT wrote:
>
> CPAN Text::Markdown 1.0.16
> markdown: 11 wallclock secs (10.48 usr + 0.01 sys = 10.49 CPU) @
> 953.29/s (n=10000)
>

From profiling Text::Markdown, *most* of the time is spent in 
Text::Balanced, which does the HTML Entity conversion.

This was introduced in the latest version of Markdown that John 
wrote, to fix a number of the bugs with the HTML encoding.

Cheers
Tom