altmetric / embiggen

A Ruby library to expand shortened URLs
https://rubygems.org/gems/embiggen
MIT License
124 stars 6 forks source link

Remove "to" and "tk" from shorteners? #11

Closed dentarg closed 8 years ago

dentarg commented 9 years ago

I see that http://longurl.org/services and the code have to and tk as domains for URL shortening services. And Dot TK do in fact let you create your own .tk domain to shorten any URL you want.

But not all .tk (and .to?) are URL shortening services, I'm afraid. Here is one example.

$ pry
[1] pry(main)> RUBY_VERSION
=> "2.2.2"
[2] pry(main)> require "embiggen"
=> true
[3] pry(main)> url = "http://crafttalk.tk/index.php/260155-working-with-a-carpeting-cleaner-recommendations-and-suggestion/0"
=> "http://crafttalk.tk/index.php/260155-working-with-a-carpeting-cleaner-recommendations-and-suggestion/0"
[4] pry(main)> Embiggen::URI(url).expand
=> #<URI::Generic /index.php/260155-locate-a-excellent-web-host-with-these-ideas/0>
[5] pry(main)> Embiggen::URI(url).expand!
=> #<URI::Generic /index.php/260155-locate-a-excellent-web-host-with-these-ideas/0>
[6] pry(main)> Embiggen::URI(url).shortened?
=> true

A bit unexpected to only get the path back (a new path). A closer look at that:

[19] pry(main)> uri = URI(Addressable::URI.parse(url).normalize.to_s)
=> #<URI::HTTP http://crafttalk.tk/index.php/260155-working-with-a-carpeting-cleaner-recommendations-and-suggestion/0>
[35] pry(main)> http = ::Net::HTTP.new(uri.host, uri.port)
=> #<Net::HTTP crafttalk.tk:80 open=false>
[36] pry(main)> response = http.head(uri.request_uri)
=> #<Net::HTTPMovedPermanently 301 Moved Permanently readbody=true>
[37] pry(main)> response.is_a?(::Net::HTTPRedirection)
=> true
[38] pry(main)> response.fetch('Location')
=> "/index.php/260155-locate-a-excellent-web-host-with-these-ideas/0"

Not sure what the right thing is to do here, maybe there are people expecting this gem to be able to expand arbitrary .to and .tk domains.

Maybe it is possible to detect that we only got a path from the Location header, and append scheme, and host (and port if non standard). But then #shortened? isn't really true, in some cases. :)

So the easiest seem to just remove to and tk.

(This issue is slightly related to #2)

mudge commented 9 years ago

Hi Patrik,

Sorry for the delay (GitHub notification fail): the example you show—http://crafttalk.tk/index.php/260155-working-with-a-carpeting-cleaner-recommendations-and-suggestion/0—is very strange (but sadly seems to have since been suspended).

Did performing a HEAD request to that URL redirect to somewhere else unexpectedly? If the URL wasn't shortened, I would have expected it to fail gracefully with expand and raise a BadShortenedURI with expand!.

dentarg commented 9 years ago

@mudge there are two things here

number one, I'm asking if you really mean to treat every .to and .tk domain as a redirect service?

the second thing I'm trying to illustrate is that embiggen just tries to make an URI of whatever it get backs from the Location header, even if it's not a full URL (in my case above it was an relative URL) – you can't do anything useful with that URI object, there is no hostname to connect to

mudge commented 9 years ago

Thinking about this, if we remove .to and .tk from the list then we won't attempt to expand any shortened links to URLs on those domains (and the latter certainly offers arbitrary domains).

Perhaps it would be better to tackle the strange behaviour you saw above where a HEAD request resulted in a relative path in the Location.

mudge commented 8 years ago

Fixed in 1.0.0 where an invalid redirect URI (viz. a non-absolute URI) will raise a BadShortenedURI error.