Closed Flaburgan closed 9 years ago
Can confirm, see e.g.: https://joindiaspora.com/posts/2996798
If I post the link in plain Open Graph works.
Also had one of those... https://pod.geraspora.de/posts/1503401
This affects oEmbed too.
I managed to cause 404 in the following two [easily reproducible] cases:
I just want to confirm the bug: https://diaspora.net.gr/posts/145440 https://diaspora.net.gr/posts/145433
More explanation in https://github.com/diaspora/diaspora/issues/4987#issuecomment-45766138
Hey @khall I see you're on fire (:D), it would be really nice to fix this issue. If you don't know on which issue you could work, it's usually a good idea to check the confirmed tag.
@augierLe42e not sure your fix is the correct way to deal with the problem. Look at that post: https://diaspora-fr.org/posts/626628 it also contains a ) at the end of the url, but is not markdown. The url is correctly set, so you should check how the URL is extract and use this method to get it to opengraph.
@Flaburgan : if the URL ends with a ), so the entire expression should end with two brackets. The regular expression should only remove the last one. The only problematic case is if the URL ends with ) but the markdown expression is malformed which, one way or another will result as a problem. I think it is a extrmely rare case, but the only way I see preventing this is that I specificly check that the trailing bracket belong to the markdown and not the URL by checking the URL with bracket exists and the one without doesn't.
Ok, I didn't do a good analysis of what you said. Yes, I have to admit I didn't think about this precise situation... Correct me if I'm wrong, but bracket aren't allowed characters in URL, right ? So if there's a trailing bracket, it must be removed, in all cases ?
They are allowed if escaped and people post them unescaped. I think they're even allowed unescaped in fact, most browsers just escape them. So we need to handle it. The closing bracket case is easily resolved though by using redcarpets stripdown renderer, once a new version of redcarpet is released.
I wondering if we should pull in the twitter-text-rb gem which has a much more robust link extraction utility that together with the stripdown renderer would probably solve all our link extraction issues.
It's up to someone more experienced than me. When a decision is made, just tell me what to do and I'll make another PR ;)
My point is, in the post linked above as an example, the url is correctly turned into a link (without the closing ) in it), so we already have "something" which transform plain text in correct URL and we should use that instead of another regex to extract the url of the post.
@Flaburgan : It might be possible that the URL is displayed into a link by the browser itself, isn't it ?
it looks like when a link is made with the markdown syntax, opengraph keeps the ) at the end of the url. The url is then broken.