diaspora / diaspora

A privacy-aware, distributed, open source social network.
https://diasporafoundation.org/
GNU Affero General Public License v3.0
13.38k stars 2.92k forks source link

oEmbed/Opengraph and markdown => 404 #4416

Closed Flaburgan closed 9 years ago

Flaburgan commented 11 years ago

it looks like when a link is made with the markdown syntax, opengraph keeps the ) at the end of the url. The url is then broken.

torsti commented 11 years ago

Can confirm, see e.g.: https://joindiaspora.com/posts/2996798

If I post the link in plain Open Graph works.

Faldrian commented 11 years ago

Also had one of those... https://pod.geraspora.de/posts/1503401

jhass commented 11 years ago

This affects oEmbed too.

ciubotaru commented 11 years ago

I managed to cause 404 in the following two [easily reproducible] cases:

kapcom01 commented 10 years ago

I just want to confirm the bug: https://diaspora.net.gr/posts/145440 https://diaspora.net.gr/posts/145433

jhass commented 10 years ago

More explanation in https://github.com/diaspora/diaspora/issues/4987#issuecomment-45766138

Flaburgan commented 10 years ago

Hey @khall I see you're on fire (:D), it would be really nice to fix this issue. If you don't know on which issue you could work, it's usually a good idea to check the confirmed tag.

Flaburgan commented 10 years ago

@augierLe42e not sure your fix is the correct way to deal with the problem. Look at that post: https://diaspora-fr.org/posts/626628 it also contains a ) at the end of the url, but is not markdown. The url is correctly set, so you should check how the URL is extract and use this method to get it to opengraph.

ghost commented 10 years ago

@Flaburgan : if the URL ends with a ), so the entire expression should end with two brackets. The regular expression should only remove the last one. The only problematic case is if the URL ends with ) but the markdown expression is malformed which, one way or another will result as a problem. I think it is a extrmely rare case, but the only way I see preventing this is that I specificly check that the trailing bracket belong to the markdown and not the URL by checking the URL with bracket exists and the one without doesn't.

ghost commented 10 years ago

Ok, I didn't do a good analysis of what you said. Yes, I have to admit I didn't think about this precise situation... Correct me if I'm wrong, but bracket aren't allowed characters in URL, right ? So if there's a trailing bracket, it must be removed, in all cases ?

jhass commented 10 years ago

They are allowed if escaped and people post them unescaped. I think they're even allowed unescaped in fact, most browsers just escape them. So we need to handle it. The closing bracket case is easily resolved though by using redcarpets stripdown renderer, once a new version of redcarpet is released.

I wondering if we should pull in the twitter-text-rb gem which has a much more robust link extraction utility that together with the stripdown renderer would probably solve all our link extraction issues.

ghost commented 10 years ago

It's up to someone more experienced than me. When a decision is made, just tell me what to do and I'll make another PR ;)

Flaburgan commented 10 years ago

My point is, in the post linked above as an example, the url is correctly turned into a link (without the closing ) in it), so we already have "something" which transform plain text in correct URL and we should use that instead of another regex to extract the url of the post.

ghost commented 10 years ago

@Flaburgan : It might be possible that the URL is displayed into a link by the browser itself, isn't it ?

millette commented 9 years ago

As a workaround, I found adding a question mark at the end of the URL fixed the problem (if the URL didn't already have a query). So instead of test I use test