altmetric / embiggen

A Ruby library to expand shortened URLs
https://rubygems.org/gems/embiggen
MIT License

expand every URI #28

Closed: alepore closed this issue 7 years ago

alepore commented 7 years ago

Hello!

I saw this example in the README:

# Custom logic to attempt to expand every URI
class ExpandEverything
  def self.include?(_uri)
    true
  end
end

Embiggen.configure do |config|
  config.shorteners = ExpandEverything
end

But I think it doesn't work, because Embiggen::URI doesn't stop on 200 responses; it just keeps following redirects until the domain is no longer in Configuration.shorteners.

Am I right? It would be super helpful to have something like Configuration.shorteners = false to just follow all redirects until a 200 is returned.

mudge commented 7 years ago

Hi @alepore,

You're right that the example is misleading: it will attempt to expand every URL and will inevitably raise a BadShortenedURI as it never stops attempting to redirect.

e.g. the following code raises Embiggen::BadShortenedURI with the message "following https://www.altmetric.com/products/free-tools/bookmarklet/ did not redirect":

require 'embiggen'

class ExpandEverything
  def self.include?(_uri)
    true
  end
end

Embiggen.configure do |config|
  config.shorteners = ExpandEverything
end

Embiggen::URI('http://altmetric.it').expand

Let me see if there's a way to achieve what you want (attempt to expand all URLs but gracefully handle 200s) with the current API.

mudge commented 7 years ago

Hi @alepore,

Unfortunately, what you want to do is tricky with Embiggen's current design as we need to know without first visiting a link whether to follow it or not. This is due to our use case at @altmetric where we process thousands of links very rapidly and it would be far too expensive to make HTTP requests to them all.
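
To illustrate the constraint described above (deciding from the URL alone whether a link is worth following), here is a minimal sketch of a host-based shortener list. It assumes the object passed to include? responds to host, as a URI from Ruby's standard library does; the class name KnownShortenerHosts and the host list are hypothetical and not part of Embiggen.

```ruby
require 'set'
require 'uri'

# Hypothetical shortener list: decides whether a link is worth expanding
# by looking only at its host, without making any HTTP request.
class KnownShortenerHosts
  def initialize(hosts)
    @hosts = Set.new(hosts)
  end

  # Assumes `uri` responds to #host, e.g. a URI from Ruby's standard library.
  def include?(uri)
    @hosts.include?(uri.host)
  end
end

shorteners = KnownShortenerHosts.new(%w[altmetric.it bit.ly])
shorteners.include?(URI('http://altmetric.it'))       # true
shorteners.include?(URI('https://www.altmetric.com')) # false
```

An object like this could presumably be assigned to config.shorteners in the same way as ExpandEverything above. Because the check is a set lookup on the host, it costs nothing per link, which is the property an "expand everything" list gives up.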

Perhaps it'd be simpler to use an HTTP client library that automatically follows redirects for you instead?

e.g.

With Typhoeus:

[6] pry(main)> Typhoeus.get('http://www.altmetric.it', followlocation: true).effective_url
=> "https://www.altmetric.com/products/free-tools/bookmarklet/"
[7] pry(main)> Typhoeus.get('https://www.altmetric.com', followlocation: true).effective_url
=> "https://www.altmetric.com/"
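
With only the standard library (a rough sketch, not part of Embiggen): the same "follow every redirect until a non-redirect response" behaviour can be approximated with Net::HTTP. The helper name follow_redirects, its limit of 5, and the injectable fetch parameter are all assumptions made for illustration; fetch exists only so the loop can be tried without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follows redirects until a non-redirect response is
# returned, or raises once `limit` requests have all been redirects.
# `fetch` defaults to a real GET request but can be swapped for a stub.
def follow_redirects(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI(url)

  limit.times do
    response = fetch.call(uri)
    return uri unless response.is_a?(Net::HTTPRedirection)

    location = response['location']
    return uri if location.nil?

    # Location may be relative, so resolve it against the current URI.
    uri = uri.merge(location)
  end

  raise "#{url} redirected more than #{limit} times"
end
```

Calling follow_redirects('http://www.altmetric.it') would then return a URI for wherever the chain ends. Unlike Embiggen, this makes at least one HTTP request for every link, which is the cost mentioned above.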

With Rest Client:

[16] pry(main)> RestClient.get('http://www.altmetric.it').request.url
=> "https://www.altmetric.com/products/free-tools/bookmarklet/"
[17] pry(main)> RestClient.get('https://www.altmetric.com').request.url
=> "https://www.altmetric.com"

alepore commented 7 years ago

@mudge thank you for the reply and suggestions!