Awkee / pandoc

Automatically exported from code.google.com/p/pandoc
GNU General Public License v2.0

hsmarkdown / pandoc convert " " to "+" in [img] syntax #220

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

echo "\![test image](/a b/c.jpg)" | hsmarkdown

What is the expected output? What do you see instead?

expected: markdown.pl gives:

<p>!<a href="/a b/c.jpg">test image</a></p>

I see: hsmarkdown gives:

<p
><img src="/a+b/c.jpg" alt="test image"
   /></p
>

What version of the product are you using? On what operating system?

pandoc-1.4, mac / linux both reproducible

Please provide any additional information below.

none

Original issue reported on code.google.com by nfjinj...@gmail.com on 25 Feb 2010 at 2:31

GoogleCodeExporter commented 9 years ago
btw, both mac and linux on GHC 6.12.1

Original comment by nfjinj...@gmail.com on 25 Feb 2010 at 2:33

GoogleCodeExporter commented 9 years ago
Oops, I just realized that the result from markdown is not correct either. I found this issue when migrating my blog, which uses pandoc as a rendering engine. Having a space inside an image URL used to give a "correct" link, but not anymore. I'm not sure whether it's the latest version of pandoc that changed this behavior.

Original comment by nfjinj...@gmail.com on 25 Feb 2010 at 2:40

GoogleCodeExporter commented 9 years ago
more info... this is what worked for me, when using pandoc-1.2.1

echo "\![test image](/a b/c.jpg)" | pandoc

<p
><img src="/a%20b/c.jpg" alt="test image"
   /></p
>

Original comment by nfjinj...@gmail.com on 25 Feb 2010 at 2:49

GoogleCodeExporter commented 9 years ago
I made this change because I was under the impression that browsers
treated a '+' character as equivalent to '%20'.  That is certainly the case
with the browsers I use, and it is fairly standard to use + for spaces in
URLs.  Does it not work in your browser?

Original comment by fiddloso...@gmail.com on 25 Feb 2010 at 2:01

GoogleCodeExporter commented 9 years ago
http://stackoverflow.com/questions/1211229/in-a-url-should-spaces-be-encoded-using-20-or
suggests that *officially*, you should use %20 in the path part of the URL, and + in the query part. I didn't realize this, and I haven't yet seen any bad effects of encoding spaces as + in both parts.  So it would be useful to know what is choking on this...

Original comment by fiddloso...@gmail.com on 25 Feb 2010 at 2:08
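The path/query distinction described above can be illustrated with Python's standard `urllib.parse` module. This is only a sketch of the general rule, not of pandoc's own code (pandoc is written in Haskell); the sample strings are taken from the original report.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Path-style escaping: a space becomes %20 and "/" is left alone.
path_escaped = quote("/a b/c.jpg")

# Form/query-style escaping: a space becomes "+".
query_escaped = quote_plus("a b")

# Decoding a path leaves "+" as a literal plus sign, so "/a+b/c.jpg"
# names a different file than "/a b/c.jpg".
decoded_as_path = unquote("/a+b/c.jpg")

# Form-style decoding is what turns "+" back into a space.
decoded_as_form = unquote_plus("/a+b/c.jpg")

print(path_escaped, query_escaped, decoded_as_path, decoded_as_form)
```

A web server that decodes the path the first way will look for a directory literally named `a+b`, which is consistent with the failure reported in this thread.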

GoogleCodeExporter commented 9 years ago
Thanks for the info. Here's what I've been experiencing.

Copied straight from one of my posts:

![11.png](/images/album/10-02-08 Rika game engine preview 3/11.png)

gives:

http://jinjing.funkymic.com/images/album/10-02-08+Rika+game+engine+preview+3/11.png

which is not accessible, yet

http://jinjing.funkymic.com/images/album/10-02-08%20Rika%20game%20engine%20preview%203/11.png

is.

Original comment by nfjinj...@gmail.com on 25 Feb 2010 at 2:49

GoogleCodeExporter commented 9 years ago
Yes, I can reproduce the bug. (And, just to make sure it's not a peculiarity of nginx, I reproduced it on apache too.)

OK, I'll modify pandoc to revert to the old behavior with %20 instead of +.
In the meantime, you should be able to use %20 directly in your markdown URLs; pandoc knows not to escape the %.

Original comment by fiddloso...@gmail.com on 25 Feb 2010 at 4:24

GoogleCodeExporter commented 9 years ago
Thanks for accepting.

Original comment by nfjinj...@gmail.com on 25 Feb 2010 at 7:00

GoogleCodeExporter commented 9 years ago
Fixed in r1847.  I think I've got it right -- markdown should allow you to use exact URLs, with things like %20, but also to use special characters and spaces without escaping them.  This patch makes this happen; the link source is first unescaped (in case they've used proper escapes) and then escaped (in case they haven't). Let me know if it doesn't work properly for you.

Original comment by fiddloso...@gmail.com on 27 Feb 2010 at 3:09
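The "unescape, then escape" idea described above can be sketched in a few lines of Python. This is not the r1847 patch itself, which is not shown in this thread; `normalize_path` is a hypothetical helper name, and the sketch only handles the path part of a URL.

```python
from urllib.parse import quote, unquote

def normalize_path(src: str) -> str:
    """Hypothetical helper: undo any existing escapes, then escape once."""
    # Unescaping first means an already-escaped "%20" is not turned into "%2520".
    return quote(unquote(src))

raw = normalize_path("/a b/c.jpg")              # author typed a literal space
already_escaped = normalize_path("/a%20b/c.jpg")  # author typed %20
print(raw, already_escaped)
```

Both inputs end up as the same escaped path, so the author can write either form in the markdown source.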

GoogleCodeExporter commented 9 years ago
I confirm that it fixes the problem for non-Unicode URLs.

But for unicode, it seems to break:

e.g. After applying this patch:

This works:

![11.png](/images/album/10-02-08 Rika game engine preview 3/11.png)
->
http://jinjing.funkymic.com/images/album/10-02-08%20Rika%20game%20engine%20preview%203/11.png

This does not: (not quite work safe, too lazy to make a test case)

![1.jpg](/images/album/10-02-16 舞 Hime/1.jpg)
->
http://jinjing.funkymic.com/images/album/10-02-16%20%821E%20Hime/1.jpg

This fixes it:

http://jinjing.funkymic.com/images/album/10-02-16%20舞%20Hime/1.jpg

I confirm it's not nginx-specific, since I get the same behavior locally without a reverse proxy.

The problem is that this "舞" char is URL-escaped. I don't quite understand what's going on here; should it not be?

A trivial patch (attached) that pre-encodes the string as UTF-8 seems to fix the problem in my smoke test.

Regards

Original comment by nfjinj...@gmail.com on 27 Feb 2010 at 4:23
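The `%821E` in the broken URL above matches the Unicode code point of "舞" (U+821E), which suggests the escape was applied to the code point rather than to the character's UTF-8 bytes. That reading is an inference from the output, not something confirmed in this thread. A small Python sketch of the difference:

```python
from urllib.parse import quote

ch = "舞"

# Escaping the code point directly gives a single "%" followed by four hex
# digits, which is not valid percent-encoding (each "%" covers one byte).
code_point_escape = "%" + format(ord(ch), "X")

# Encoding to UTF-8 first, then escaping each byte, is what quote() does.
utf8_escape = quote(ch)

full_path = quote("/images/album/10-02-16 舞 Hime/1.jpg")
print(code_point_escape, utf8_escape, full_path)
```

This is in line with the reporter's observation that converting the string to UTF-8 before escaping makes the link resolve.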


GoogleCodeExporter commented 9 years ago
I tried a simpler approach in r1851 -- just changing spaces into %20, and not messing with anything else.  Does that work better?

Original comment by fiddloso...@gmail.com on 27 Feb 2010 at 5:02
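The simpler approach described above amounts to a plain string replacement. A minimal Python sketch, assuming nothing beyond what the comment states (`escape_spaces` is a hypothetical name, and the actual r1851 change is not shown in this thread):

```python
def escape_spaces(src: str) -> str:
    """Hypothetical helper: replace spaces with %20 and touch nothing else."""
    return src.replace(" ", "%20")

plain = escape_spaces("/a b/c.jpg")
unicode_path = escape_spaces("/images/album/10-02-16 舞 Hime/1.jpg")
pre_escaped = escape_spaces("/a%20b/c.jpg")
print(plain, unicode_path, pre_escaped)
```

Because non-ASCII characters and existing `%` escapes pass through untouched, this avoids both the `+` problem and the Unicode problem reported earlier, at the cost of leaving any other special characters for the author to escape by hand.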

GoogleCodeExporter commented 9 years ago
Yep, works pretty well for me :)

Original comment by nfjinj...@gmail.com on 27 Feb 2010 at 5:17