altmetric / embiggen

A Ruby library to expand shortened URLs
https://rubygems.org/gems/embiggen
MIT License
124 stars 6 forks source link

Use a sum type instead of exceptions #1

Closed mudge closed 9 years ago

mudge commented 9 years ago

The reason we have two separate expansion methods (Embiggen::URI#expand and Embiggen::URI#expand!) is to communicate the following things to the user:

At the same time, we want to provide a graceful API which tolerates failures for users that don't mind if URI expansion fails (and so doesn't unnecessarily raise exceptions).

Rather than having two separate methods (one graceful, one exception-happy), we could instead have a single Embiggen::URI#expand method which returns a sum type to communicate some of the above without disrupting control flow:

Embiggen::URI('http://www.altmetric.com').expand
#=> #<ExpandedURI http://www.altmetric.com>

Embiggen::URI('http://toomanyredirec.ts').expand
#=> #<ShortenedURI http://toomanyredirec.ts>

Embiggen::URI('http://badshortenedu.rl').expand
#=> #<BadShortenedURI http://badshortenedu.rl>

These types could wrap a standard Ruby URI object so they can be used seamlessly with other libraries.

This would then be graceful (viz. failure to expand would not raise an exception or return nil) but users could check the return value if they wanted to ensure expansion was successful (we could provide a Embiggen::URI#expanded? helper rather than switching on type).

scottmatthewman commented 9 years ago

:+1: Really like this idea. Feels like a much more elegant way of expressing resultant behaviour. Not to mention that relying on exceptions for control flow is slow (see slides 47-8 of Erik Michaels-Ober's talk on Writing Fast Ruby), and when resolving URLs we tend to want to be as quick as possible.

tomstuart commented 9 years ago

What does the code that cares about these cases look like?

mudge commented 9 years ago

With the current code, users that want to handle too many redirects themselves (e.g. because they don't mind doing a few more HEAD requests for important URIs) would have to do something like so:

an_important_uri = Embiggen::URI('http://examp.le')
begin
  an_important_uri.expand!
rescue Embiggen::TooManyRedirects
  an_important_uri.expand!(redirects: 10)
end

(Though you could argue they should just use a higher redirects threshold:)

if important_uri?
  Embiggen::URI(uri).expand(redirects: 10)
else
  Embiggen::URI(uri).expand
end

More generally, you can't easily tell using expand whether the expansion was successful or not as it swallows any exceptions and always tries to return a useful URI.

It also bothers me that expand is implemented in terms of expand! and does so by catching the various exceptions it throws: https://github.com/altmetric/embiggen/blob/master/lib/embiggen/uri.rb#L14-L18

Ruby's own Net::HTTP is a prominent example of using sum types (e.g. Net::HTTPSuccess, Net::HTTPRedirection) but this leads to client usage that is otherwise unfamiliar in Ruby:

uri = Embiggen::URI('http://examp.le').expand
case uri
when ShortenedURI
  # Consider expanding it further...
when ExpandedURI
  # All is well
end

@scottmatthewman made the point that switching on the types themselves is quite nasty: after all, we don't care what they are, just how they respond to messages. With this in mind, whether a URI is fully expanded or not is still useful as a URI (you could still decide to GET it when still shortened, etc.) and maybe the only real exception is if expansion fails due to a network problem.

Perhaps expand shouldn't be so aggressive in swallowing network exceptions (so that legitimate failures are more explicit) but could return either a ShortenedURI or ExpandedURI (that both respond to shortened? and expanded?) for the multiple redirect case:

begin
  uri = Embiggen::URI('http://examp.le').expand

  if uri.shortened?
    logger.warn "#{uri} was still shortened even after expansion!"
  end
rescue Embiggen::ExpandError => e
  logger.error "Could not expand! #{e.message}"
end

Without fortune telling too much, having our own return types would also allow us to add the individual hops to the return type (so you could see the various redirects).

ests commented 9 years ago

Don't you think that your last example reads slightly weird? You tell URI to expand, and in the next line you ask if it's shortened? Why not to ask for success (expanded?)?

I also agree that bang methods are overused, in the sense, you have to think first - does it mean that I mutate object state (Ruby style) or that I can get an exception (AR style)?

When I would ask for URI to have been expanded, I would just expect it to work, and in case any problems (redirects, timeout, network) I just need to display "a friendly" error to user (simple "Your URI cannot be expanded at the moment"). So another idea (to avoid case jugling) is to return two types only: a normal succes URI, and failed URI like NullObject. You could dig further into detailed errors, by asking this null object (f.i. NullURI#erorrs ect.).

mudge commented 9 years ago

@ests success? is a good idea. I was looking at how Typhoeus deals with errors and it also avoids exceptions in favour of a consistent return value that can be interrogated for more information via methods like success? and timed_out?.

Re bang methods, I generally try to follow Matz' advice:

The bang (!) does not mean "destructive" nor lack of it mean non destructive either. The bang sign means "the bang version is more dangerous than its non bang counterpart; handle with care".

I'll experiment with Embiggen::URI actually wrapping (viz. delegating to) a standard library URI and exposing a single expand method that is guaranteed to return another Embiggen::URI like so:

uri = Embiggen::URI('http://examp.le').expand
#=> #<Embiggen::URI http://examp.le>
uri.success? #=> true if expanded, false if something went wrong during expansion
uri.shortened? #=> true if shortened
uri.expanded? #=> opposite of shortened?

You could then call expand on the return (as it is still an Embiggen::URI) which will return self if it is already expanded or try expanding further. We could potentially add other methods like timed_out? or error, etc. to explain why an expansion wasn't successful.

mudge commented 9 years ago

After discussion, the title of this issue was a misstep to begin with: the problem isn't the absence of sum types, it's the strange API of having both expand and expand!.

I've since changed my mind about how aggressively Embiggen should swallow exceptions raised outside of its control (e.g. network errors while following redirects). This resulted in the more conservative changes in #10.

There's probably a lesson in here about API design: for now we're trying to contain things under our control and return standard library types (e.g. URIs) where possible.

mudge commented 9 years ago

Merged into 1.x.