firecat53 / urlscan

Mutt and terminal url selector (similar to urlview)
GNU General Public License v2.0

Incorrect inclusion of a surrounding close-parenthesis #45

Closed: Boruch-Baum closed this issue 6 years ago

Boruch-Baum commented 6 years ago

The attached text file is an excerpt from one of the several weekly emails that sites on the stackexchange.com network distribute. Notice that each URL in the text is enclosed in parentheses, in the form (url). Currently, urlscan parses the trailing close parenthesis as part of the URI (ref: issue #9), which causes browsers to return a 404.
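To illustrate how the problem arises, here is a minimal sketch that assumes a deliberately naive pattern, one that treats every non-whitespace character after "http(s)://" as part of the URL. The pattern, the `naive_extract` helper, and the sample sentence are hypothetical; this is not urlscan's actual regex.

```python
import re

# Hypothetical, deliberately naive URL pattern (NOT urlscan's actual regex):
# "http://" or "https://" followed by any run of non-whitespace characters.
NAIVE_URL = re.compile(r"https?://\S+")


def naive_extract(text):
    """Return every URL-like token the naive pattern finds in text."""
    return NAIVE_URL.findall(text)


sample = "Read the answer (https://stackexchange.com/q/12345) before posting."
print(naive_extract(sample))
```

Because ")" is not whitespace, the match runs on through the closing parenthesis, so the extracted URL is `https://stackexchange.com/q/12345)`, which is exactly the kind of link that ends in a 404.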

I see a few possible solutions, none great:

  1. Always exclude a closing parenthesis if it is the final character of a URI. This risks breaking URIs that legitimately end in ")".

  2. Perform solution 1 only if an open parenthesis immediately precedes the URI. This would catch stackexchange-style formatting, but not hypothetical cases of a URL at the end of a parenthetical phrase (something like this http://example.com).

  3. Keep a running count of unclosed parentheses on a paragraph-by-paragraph basis, and if that count is exactly one, exclude the closing parenthesis at the end of a URL. This would misparse the rare cases of nested parentheses and of parenthetical text that spans multiple paragraphs.
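The three options above can be compared side by side with a short sketch. It assumes a hypothetical URL pattern (not urlscan's) that stops before trailing sentence punctuation, splits paragraphs on blank lines, and, for option 3, interprets the "running count" as the number of "(" minus ")" in the paragraph text before the URL. The function names and sample text are illustrative only.

```python
import re

# Hypothetical URL pattern (NOT urlscan's): "http(s)://" plus non-whitespace,
# stopping before trailing sentence punctuation such as "." or ",".
URL = re.compile(r"https?://\S+?(?=[.,;:!?]*(?:\s|$))")


def strip_always(text):
    """Option 1: drop a ")" that ends a URL, unconditionally."""
    return [u[:-1] if u.endswith(")") else u for u in URL.findall(text)]


def strip_if_opened(text):
    """Option 2: drop the trailing ")" only when "(" immediately precedes the URL."""
    urls = []
    for m in URL.finditer(text):
        url = m.group()
        opened = m.start() > 0 and text[m.start() - 1] == "("
        urls.append(url[:-1] if opened and url.endswith(")") else url)
    return urls


def strip_if_one_unclosed(text):
    """Option 3: per paragraph, count "(" minus ")" in the text before the URL;
    drop the trailing ")" only when exactly one parenthesis is still open."""
    urls = []
    for para in text.split("\n\n"):
        for m in URL.finditer(para):
            url = m.group()
            before = para[: m.start()]
            depth = before.count("(") - before.count(")")
            urls.append(url[:-1] if depth == 1 and url.endswith(")") else url)
    return urls


sample = (
    "Question one (https://stackexchange.com/q/1)\n\n"
    "A parenthetical (something like this http://example.com).\n\n"
    "See https://en.wikipedia.org/wiki/Foo_(bar) for details."
)
for fn in (strip_always, strip_if_opened, strip_if_one_unclosed):
    print(fn.__name__, fn(sample))
```

On this sample, option 1 fixes the stackexchange-style and parenthetical-phrase links but truncates the Wikipedia-style URL to `Foo_(bar`; option 2 leaves the `)` on `http://example.com)`; option 3 gets all three right, though, as noted above, it can still misfire on nested parentheses or parentheticals that cross paragraph breaks.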

urlscan_test.txt

firecat53 commented 6 years ago

I think you might be running an older version of urlscan. I just tried your sample file on my machine and it's parsing the URLs just fine. Try installing the most recent stable (0.8.6) or git master. I'm pretty sure I corrected this (mostly) with 263787ec4 (discussion in #27). Let me know if that works for you!

Boruch-Baum commented 6 years ago

That could be it. The latest version in the Debian repositories is 0.8.2. The package maintainer is listed as "David Carlos de Araujo Silva" - are you in contact with him? If not, I'm willing to e-mail him and ask him to update the package. Debian also lists "Daniel Burrows" as the packager for an even older version.

If all else fails, I'll do an unprivileged local install, and hope to remember to delete it once it is superseded by a newer package.

firecat53 commented 6 years ago

I emailed 'ddavidcarlos1392@gmail.com' when I released 0.8.5 but never got a response. You are welcome to try again! Daniel Burrows hasn't been active for many years.

In my opinion, the easiest way to do a single-user install is pip install --user urlscan. You only have to make sure that ~/.local/bin is in your $PATH.
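A quick way to check that last requirement from Python is sketched below. It assumes a Linux-style layout, where `site.getuserbase()` normally returns `~/.local` and `pip install --user` places console scripts in its `bin` subdirectory; other platforms (for example Windows) use a different scripts directory. The helper names are hypothetical.

```python
import os
import site


def user_bin_dir():
    """Assumed location of `pip install --user` scripts on Linux (normally ~/.local/bin)."""
    return os.path.join(site.getuserbase(), "bin")


def user_bin_on_path(path_env=None):
    """Return True if the user script directory appears in PATH (or in path_env)."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    target = os.path.realpath(user_bin_dir())
    # Compare resolved paths so "~/.local/bin" and symlinked entries still match.
    return any(
        os.path.realpath(os.path.expanduser(entry)) == target
        for entry in path_env.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    if user_bin_on_path():
        print("OK:", user_bin_dir(), "is on PATH")
    else:
        print("Add", user_bin_dir(), "to PATH to run urlscan")
```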

Thanks for your interest!

Boruch-Baum commented 6 years ago

I just sent David a message, and last night I performed a local install, so I can confirm that the issue is fixed in the current release and this issue can be closed. Thanks for the package, and the support.