julep: "chain of custody" error handling

JuliaLang / julia

The Julia Programming Language

https://julialang.org/

MIT License

45.54k stars 5.47k forks source link

julep: "chain of custody" error handling #7026

Open StefanKarpinski opened 10 years ago

StefanKarpinski commented 10 years ago

A of the major issues with try/catch is that it's possible to catch the wrong error and think that you are handling a mundane, expected problem, when in fact something far worse is wrong and the problem is an unexpected one that should terminate your program. Partly for this reason, Julia discourages the use of try/catch and tries to provide APIs where you can check for situations that may cause problems without actually raising an exception. That way try/catch is really left primarily as a way to trap all errors and make sure your program continues as best it can.

This is also why we have not introduced any sort of typed catch clause into the language. This proposal introduces typed catch clauses, but they only catch errors of that type under certain circumstances in a way that's designed to avoid the problem of accidentally catching errors from an unexpected location that really should terminate the program. Here's what a Java-inspired typed catch clause might look like in Julia:

try
    # nonsense
    foo(1,2,3) # throws BazError
    # shenanigans
catch err::BazError
    # deal with BazError only
end

Of course, we don't have this in Julia, but it can be simulated like this:

try
    # nonsense
    foo(1,2,3) # throws BazError
    # shenanigans
catch err
    isa(err, BazError) || rethrow(err)
    # deal with BazError only
end

Although this does limit the kinds of errors one might catch, it fundamentally doesn't solve the problem, which is that while there are some places where you may be expecting a BazError to be thrown, other code that you're calling inside the try block might also throw a BazError where you didn't expect it, in which case you shouldn't try to handle because you'll just be covering up a real, unexpected problem. This problem is especially bad in the generic programming contexts where you don't really know what kind of code a given function call might end up running.

To address this, I'm proposing adding a hypothetical throws keyword that allows you to annotate function calls with a type that they're expected to throw and which you can catch with a type catch clause (it's a little more subtle than this, but bear with me). The above example would be written like this:

try
    # nonsense
    foo(1,2,3) throws BazError
    # shenanigans
catch err::BazError
    # deal with *expected* BazError only
end

The only difference is that the throws BazError comment after the call to foo became syntax. So the question is what does this annotation do? The answer is that without that annotation indicating that you expect the expression to throw an error of type BazError, you can't catch it with a typed catch. In the original version where the throws BazError was just a comment, the typed catch block would not catch a BazError thrown by the call to foo – because the lack of annotation implies that such an error is unxepected.

There is a bit more, however: the foo function also has to annotate the point where the BazError might come from. So, let's say you had these two definitions:

Consider something like this:

function foo1(x,y,z)
    bar(x,2y) throws BazError
end

function foo2(x,y,z)
    bar(x,2y)
end

function bar(a,b)
    throw BazError()
end

This also introduce a hypothetical keyword form of throw – more on that below. These definitions result in the following behavior:

try
    # nonsense
    foo1(1,2,3) throws BazError
    # shenanigans
catch err::BazError
    # error from bar is caught
end

try
    # nonsense
    foo2(1,2,3) throws BazError
    # shenanigans
catch err::BazError
    # error from bar is NOT caught
end

The rule is that a typed catch only catches the an error if every single call site from the current function down to the one that actually throws the error is annotated with throws ErrorType where the actual thrown error object is an instance of ErrorType. An untyped catch still catches all errors, leaving the existing behavior unchanged:

try foo1(1,2,3)
catch
    # error is caught
end

try foo2(1,2,3)
catch
    # error is caught
end

So what is the rationale behind this proposal and why is it any better than just having typed catch clauses? The key point is that in order to do a typed catch, there has to be a "chain of custody" from the throw site all the way up to the catch site, and each step has to expect getting the kind of type that's thrown. There are two ways to catch the wrong error:

catch an error that was not the type of error you were expecting
catch an error that came from a place you weren't expecting it to come from

The first kind of mistake is prevented by giving an error type, while the second kind of mistake is prevented by the "chain of custody" between the throw site and the catch site.

Another way of thinking about this is that the throws ErrorType annotations are a way of making what exceptions a function throws part of its official type behavior. This is akin to how Java puts throws ErrorType after the signature of the function. But in Java it's a usability disaster because you are required to have the throws annotation as soon as you do something that could throw some kind of non-runtime exception. In practice, this is so annoying that it's pretty common to just raise RuntimeErrors instead of changing all the type signatures in your whole system. Instead of making runtime excpetion vs. non-runtime exception a property of the error types as Java does, this proposal makes it a property of how the error occurs. If an error is occurs in an expected way, then it's a non-runtime exception and you can catch it with a typed catch clause. If an error occurs in an unexpected way, then it's a runtime exception and you can only catch it with an untyped catch clause. You can only catch errors by type if they are part of the "official behavior" of the function you're calling, where official behavior is indicated with throws ErrorType annotations.

One detail of this proposal that I haven't mentioned yet is that throw would become a keyword instead of just a function. The reason for this is that writing

throw(BazError()) throws BazError

seems awfully verbose and redundant. For compatibility, we could allow throw() to still invoke the function throw while throw ErrorType(args...) would be translated to

throw(ErrorType(args...)) throws ErrorType

This brings up another issue – what scope should the right-hand-side of throws be evaluated in? In one sense, it really only makes sense to evaluate it in the same scope that the function signature is evalutated in. However, this is likely to be confusing since all other expressions inside of function bodies are evaluated in the local scope of the function. It would be feasible to do this too, and just rely on the fact that most of the time it will be possible to statically evaluate what type expression produces. Of course, when that isn't possible, still be necessary to emit code that will do the right thing depending on the runtime value of the expression. If the r-h-s is evaluated in local scope, then we can say that throws expr is equivalent to this:

throw(expr) throws typeof(expr)

Usually it will be possible to staticly determine what typeof(expr) is, and it is a simpler approach to explain than restricting the syntax of the expression after the throw keyword.

Note that under this proposal, these are not the same:

try
    # stuff
catch
    # catches any error
end

try
    # stuff
catch ::Exception
    # only catches "official" errors
end

Also note that this proposal is, other than syntax additions that are not likely to cause big problems, completely backwards compatible: existing untyped catch clauses continue to work the way they used to – they catch everything. It would only be new typed catch clauses that only catch exceptions that have an unbroken "chain of custody".

tknopp commented 10 years ago

+1 for the syntax addition.

I have to admit that I am not sold on the idea of expected and unexpected exceptions. I would find it rather "unexpected" if I indicate to catch something that is then not catched ;-)

For me the real issue in such situations i that the exception hierarchy is not well designed and/or that the wrong exception is used from the exception hierarchy.

StefanKarpinski commented 10 years ago

The problem with typed exceptions is that they give you a sense of precision, but they're not really precise at all. Making the exception part of the type of a function is a good idea – that part Java got right. What's not so good is forcing the caller to handle every kind of exception a function it uses can throw.

Here's a motivating scenario: let's say you implement a new kind of AbstractVector – say a kind of sparse vector or something – let's call it SpArray. Internally, it stores values in normal Arrays. Someone is using this and finds that sometimes they end up making indexing errors – they use a typed catch block to handle this. But you've made a programming error in implementing the SpArray type and it's actually SpArray that's encountering the indexing error internally because of a mistake. But this is caught and treated as if was a user-level indexing error. With this proposal, that situation won't occur because at the point where the SpArray implementation is incorrect and causes an indexing error, there is no throws BoundsError annotations so the resulting exception can't be caught with a typed catch clause.

tknopp commented 10 years ago

I absolutely see your point that it can lead to hard to find bugs if one try/catches too generic exceptions and I have been running into this myself in C++ a lot. ("Oh this throws exceptions all the time. Put try { ... } catch(...) {}around it and problem solved").

To play around with your example maybe this should be handled by throwing SpArrayBoundsError in SpArray and ArrayBoundsError in Array. Of course both deriving from BoundsError. In this way one can catch the SpArray specific bounds errors and let all other BoundsError pass through.

StefanKarpinski commented 10 years ago

To play around with your example maybe this should be handled by throwing SpArrayBoundsError in SpArray and ArrayBoundsError in Array. Of course both deriving from BoundsError. In this way one can catch the SpArray specific bounds errors and let all other BoundsError pass through.

That's a common solution offered in languages with exception hierarchies and typed catch, but I don't buy it. The logical extreme of that approach is to have an error type for almost every single location where an exception can be thrown. At that point, what you're really doing is labeling individual exception locations and using labels to catch things – albeit labels that are carefully arranged into a hierarchy. That need for a carefully arranged hierarchy to make the approach tenable is also fishy – and easily abused. I came up with this approach by thinking about how to compromise between a broad classification scheme with a set of standard exception types and individual labels for exception locations.

tknopp commented 10 years ago

Yes you are absolutely right. Exception hierarchies have the potential to get very specific up to the point that specific lines get specific error types. But as one subtypes from broader exception types I don't see the issue with that. But maybe I just have bad taste to not think this is fishy :-) So please lets look what others think about this.

JeffBezanson commented 10 years ago

Talking about this as "labeling locations" gets me thinking: would it be possible just to label the locations? Every exception could consist of an exception object and a label (symbol). The SpArray library could throw BoundsErrors, labeling them as :SpBoundsError. You could catch by type and/or label.

aviks commented 10 years ago

You're right that using typed exceptions correctly in Java is annoying. Everyone throws RuntimeException subclases, and I've trained myself to see the Exception type as primarily informational, useful only for logging.

But my fear is, annotating exceptional (?) call sites all the way down is going to turn out to be equally tedious. Is this something that is easy enough to reason about and implement, so that it will be widely practiced?

StefanKarpinski commented 10 years ago

@JeffBezanson – I thought about just labeling locations, but it seems too granular. Sometimes different locations throw the same error, no? Also, how is that different from having an exception type for every location? I guess you wouldn't have to declare the types, but you also wouldn't get any type hierarchy. How would labels be scoped? By module?

JeffBezanson commented 10 years ago

Yes, you'd probably want some kind of scoping, which remains to be worked out. But different locations could in fact throw the same error; in my overly-simple formulation they could just use the same symbol.

The idea is that exception and origin are orthogonal: what and where-from. "What" is the exception type hierarchy, which describes problems, and "where-from" is the locations, which might follow the module hierarchy, or have no hierarchy.

StefanKarpinski commented 10 years ago

@aviks – I think this should actually be pretty rare. Having a "throws" annotation is essentially saying "throwing this error is part of the official API of this function". Currently we basically consider all exceptions to be runtime errors, so anything is real problem and not part of the functions official behavior. That would continue to be the default. Only when you decide that something like BoundsErrors or KeyError is part of the official interface of AbstractArray or Associative would you put throws BoundsErrors and throws KeyError annotations on function calls. And hopefully you wouldn't need very many of them. I should probably try out adding these annotations for something like that and see how it goes.

tknopp commented 10 years ago

The labeling idea is interesting. Still it increases the dimensionality of the dispatch mechanism from a one-dimensional to a two-dimensional thing.

I am all for putting more context into exceptions. In C# for instance one usually also gets the line numbers where the exception was thrown. Further when debugging exceptions I found it very useful to have exception chains, i.e. when an exception is rethrown (as a different exception type) the original exception is put as an "inner exception" to the outer exception.

My own experience is that in small projects and research code exceptions are mainly an error mechanism for debugging "not yet correct" code. In these situations one will hardly use try/catch and reason about what exception to catch. The intensive use of the error function in Julia base is an indicator that this is what most people use exceptions for. The importance for exceptions increased a lot for me when working on large projects and projects where others are the users (i.e. production code). In these cases one has to prevent by all means that exceptions remain uncatched and thus has to maintain a sensible exception hierarchy and use try/catch at various locations. But this is also usually nothing where the system exception hierarchy is used. Instead I want to throw TheUSBCableIsUnplugged exceptions that are carefully catches and translated to ThePrinterIsNotAvailable and so on. And for this purpose the Julia exception system is already working quite well.

samoconnor commented 10 years ago

I am in the process of porting my Tcl AWS library to Julia as a way to learn Julia, (and so that I can then experiment with implementing projects reliant on AWS in Julia).

In a first-pass of the Julia docs, I saw try/catch/finally and made a mental check "that is supported". When it comes to writing actual code I am surprised that try/catch doesn't do what I expect. Most Julia features have either made me think "nice, that's how it should be done", or "that's weird" followed by some reading followed by "that's nice". The design of Julia's try/catch seems to be that it is deliberately somewhat crippled to try to wean people off using it. This smells wrong to me.

In distributed, networked systems individual nodes and connections can fail there are propagation delays, data models are eventually-consistent. Clear, robust, exception handling is a must. I want to write code that assumes everything is reliable, immediately globally updated etc and deal with the exceptional glitches in one place in high-level retry or conflict resolution loops.

I am not a fan of type-based exception handling, having had the misfortune of working with Java etc. Tcl has no types, so exceptions are trapped based by matching an error-code prefix. An error-code in Tcl is just a list.

e.g.

proc get {url} {
    ...
    throw {HTTP 404}  "Error: $url Not Found"
    ...
    throw [list HTTP 300 $location] "Error: $url has moved to $location"
    ...
}

try {

   get $url

} trap {HTTP 404} {} {
    puts "get can't find $url"
} trap {HTTP 300} {message info} {
    set location [lindex $info 2]
    get $location
}

# Ignore missing dict key...
proc lazy_get {dict key} {
    try {

        dict get $mydict "key"

    } trap {TCL LOOKUP DICT} {} {}
}

# Catch exit status of shell command...
retry count 2 {

     exec ./create_queue.tcl "foobar"

 } trap EX_TEMPFAIL {} {
    after 60000
}

Every error (exception) in a Tcl program is supposed to have a human-readable error message and a machine-readable errorcode.

The source of an error can include as much or as little machine-readable info as it likes.

Safe exception handling is achieved by trapping only sufficiently specific errorcodes. Of course you can shoot yourself in the foot by writing "trap {}" if you want to...

http://www.tcl.tk/man/tcl8.6/TclCmd/try.htm http://www.tcl.tk/man/tcl8.6/TclCmd/throw.htm

I think using types to match exceptions isn't great. But since Julia matches methods by type as a core feature, perhaps it makes sense for Julia. I would prefer something like passing a Dict() to throw and having a convenient syntax for catching only exceptions that match a dictionary pattern.

The key thing is readability of the try/catch code.

Below are some fragments of Tcl code that I am face with porting to Julia. I am open to the idea that there is just a better way to do these things in Julia. But on the other hand, I think these are reasonable uses of an exception mechanism...

try {

    set web_user [create_aws_iam_user $aws $region-web-user]

} trap EntityAlreadyExists {} {
    delete_aws_iam_user_credentials $aws $region-web-user
    set web_user [create_aws_iam_user_credentials $aws $region-web-user]
}

    retry count 4 {

        set res [aws_sqs $aws CreateQueue QueueName $name {*}$attributes]

    } trap QueueAlreadyExists {} {

        delete_aws_sqs_queue [aws_sqs_queue $aws $name]

    } trap AWS.SimpleQueueService.QueueDeletedRecently {} {

        puts "Waiting 1 minute to re-create SQS Queue \"$name\"..."
        after 60000
    }

proc fetch {bucket key {byte_count {}}} {

    if {$byte_count ne {}} {
        set byte_count [list Range bytes=0-$byte_count]
    }
    try {

        s3 get $::aws $bucket $key {*}$byte_count

    } trap NoSuchKey {} {
    } trap AccessDenied {} {}
}

proc aws_ec2_associate_address {ec2 elastic_ip} {

    puts "Assigning Elastic IP: $elastic_ip"
    retry count 3 {

        aws_ec2 $ec2 AssociateAddress InstanceId [get $ec2 id] \
                                      PublicIp $elastic_ip

    } trap InvalidInstanceID {} {
        after [expr {$count * $count * 1000}]
    }
}

proc create_aws_s3_bucket {aws bucket} {

    try {

        aws_rest $aws PUT $bucket Content-Type text/plain Content [subst {
                <CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
                  <LocationConstraint>[aws_bucket_region $bucket]</LocationConstraint>
                </CreateBucketConfiguration>
        }]

    } trap BucketAlreadyOwnedByYou {} {}
}

proc delete_aws_sqs_queue {queue} {

    try {

        aws_sqs $queue DeleteQueue

    } trap AWS.SimpleQueueService.NonExistentQueue {} {}
}

proc aws_s3_exists {aws bucket path} {

    try {

        aws_s3_get $aws $bucket $path Range bytes=0-0
        return 1

    } trap NoSuchKey {} {
    } trap AccessDenied {} {}
    return 0
}

    retry count 3 {

        return [$command $aws {*}$args]

    } trap {TCL LOOKUP DICT AWSAccessKeyId} {} {

        set aws [get_aws_ec2_instance_credentials $aws]

    } trap ExpiredToken {message info} {

        if {![exists ::oc_aws_ec2_instance_credentials]} {
            return -options $info $message
        }

        puts "Refreshing EC2 Instance Credentials..."
        set aws [get_aws_ec2_instance_credentials $aws -force-refresh]
    }

    try {

        aws_iam $aws DeleteInstanceProfile InstanceProfileName $name

    } trap NoSuchEntity {} {}

    set response [aws_iam $aws CreateInstanceProfile \
                               InstanceProfileName $name \
                               Path $path]

hayd commented 10 years ago

other code that you're calling inside the try block might also throw a BazError where you didn't expect it

This seems like an edge case, if you keep the try block small (e.g. just the line you're expecting a failure) it shouldn't be the case... if it's critically different why not subclass BazError?

+1, a syntax for typed exceptions would be great. IMO isa(err, BazError) ... is quite messy. I like:

catch err::BazError
    # deal with *expected* BazError only
catch err::FooError
    # deal with *expected* FooError only
end

elextr commented 10 years ago

if you keep the try block small (e.g. just the line you're expecting a failure) it shouldn't be the case...

The syntax proposed by @StefanKarpinski is really just a more compact syntactic sugar for a try block around the important line (or is it intended to apply to an expression?) which passes the exception to the outer catch.

if it's critically different why not subclass BazError?

You can't when its thrown by a library you called, not your code.

The concept of identifying several small regions where "it is understood that this may throw xxxerror, and I'm prepared to handle it in a common handler" seems a good idea, but as currently proposed this is likely to surprise those coming from other languages (who don't RTFM because they expect all languages to be the same :)

This also doesn't provide for one of the traditional use-cases where you want to catch some types of errors no matter where they occur, but let the others go:

 try
    a big fast but flakey algorithm
 catch err::matherror # oh no underflowed, overflowed or divided by zero
    alternative slow algorithm
 end

StefanKarpinski commented 10 years ago

@hayd: This seems like an edge case, if you keep the try block small (e.g. just the line you're expecting a failure) it shouldn't be the case... if it's critically different why not subclass BazError?

Even if the try block only calls a single function, that function can do absolutely anything and throw any kind of exception for many different reasons. In generic code this is particularly bad since you don't really know which implementation of the function you're calling is going to be invoked or how it's implemented. Of course, you shouldn't have to know this to use it correctly – and that's precisely the problem with type-based exception handling: there's massive abstraction leakage between the implementation of the function being trapped and how to trap it. Let's say you're doing try f() catch err::Foo ... end and you happen to know that f only throws the Foo error under certain circumstances and you want to handle that situation. Great. But now, let's say someone changes the implementation of f – or of any function that f calls. The change seems innocuous to them, but they happen to call a function – probably unwittingly – that can throw a Foo error under some obscure circumstances. You will now be trapping this situation even though that's not what you meant to do at all. Their seemingly harmless change has now made your code incorrect.

You could introduce new error types, but that's just completely annoying and impractical. Let's say I add a new subtype of AbstractArray. Since it's array-like, you want to throw IndexError when someone indexes into it wrong. Under the hood, however, it uses normal arrays and indexes into them, which might itself cause an IndexError. In order to distinguish the former condition from the latter, your type needs to introduce a new subtype of AbstractIndexError. This means that every standard error in Base needs to be split into an abstract type and a concrete type per implementation of that type: AbstractIndexError, ArrayIndexError, SparseIndexError, etc. And if you implement a new subtype of AbstractArray, you need your own subtype of AbstractIndexError in order to really correctly implement it. This is almost absurdly unwieldy.

StefanKarpinski commented 10 years ago

@samoconnor – I really don't think try/catch is a good mechanism for handling that kind of networking timeout situation. I'm not sure what a better mechanism is, but this doesn't really feel right. In particular, you probably want to be able to retry things, which try/catch does not handle.

jakebolewski commented 10 years ago

A networking timeout situation is usually handled by a state machine. We have gotos with labels now which makes writing state machines easier.

samoconnor commented 10 years ago

@StefanKarpinski - Re: "retry", note that several of my examples above use a "retry" variant of "try". Viral Shah said in a separate conversation "The short answer to your question is yes, it is easy to do retry with Julia’s macros". So I'm assuming that "retry" can be made to work as a seamless "try" variant in Julia.

Re: "network timeouts", that is not really what my examples above address. I'm dealing with timeouts like this:

    for wait_time in {0.05, 0.5, 5, 0}

        try

            response = process_response(open_stream(uri, request))

            if (response.finished
            &&  response.status >= 200 && response.status < 500)
                return response
            end

        catch e
            if wait_time == 0 || !isa(e, UVError)
                rethrow(e)
            end
        end

        sleep(wait_time)
    end

I have no choice here but to use try/catch because the underlying sockets library throws UVError. The retry loop is safe against catching some unexpected variant of UVError, because after a few attempts it throws the error up anyway.

It might be nice to simplify the catch code a little

catch e::UVError
    wait_time > 0 || rethrow(e)

or maybe

catch e if isa(UVError, e)
    wait_time > 0 || rethrow(e)

... the "if" syntax would the allow:

catch e if isa(UVError, e) && uverrorname(e) == "EAI_NONAME"

Re: typing of exceptions, note that in my examples above there is no mention of trapping timeout exemptions. The things I'm trapping are all well-defined and precise. e.g. EntityAlreadyExists, AWS.SimpleQueueService.QueueDeletedRecently, AccessDenied, ExpiredToken, {TCL LOOKUP DICT AWSAccessKeyId}, BucketAlreadyOwnedByYou.

While it is true that there is code out there that makes a mess of exception meta-data (I've written some android code) there are plenty of APIs, e.g. AWS, that have robust exception identification.

It is essential to have an out-of-band, stack-frame-jumping, exception mechanism in complex code that sits on top of imperfect web services. The alternative is that all of the code is dominated by deciding what error information needs to be passed back up the stack and how far.

As an example of converting HTTP exceptions, In my Julia AWS library I currently do this:

        # Handle other responses as error...
        err = TaggedException({
            "verb"      => r.verb,
            "url"       => r.url,
            "code"      => string("HTTP ", response.status),
            "message"   => response.data
        })

        # Look for JSON error message...
        if response.headers["Content-Type"] == "application/x-amz-json-1.0"
            json = JSON.parse(response.data)
            if haskey(json, "__type")
                err.tags["code"] = split(json["__type"], "#")[2]
            end
        end

        # Look for XML error message...
        if response.headers["Content-Type"] in {"application/xml", "text/xml"}
            xml = parse_xml(response.data)
            if (v = get(xml, "Code")) != nothing
                err.tags["code"] = v
            end
            if (v = get(xml, "Message")) != nothing
                err.tags["message"] = v
            end
        end

and

catch e
    if isa(TaggedExeption,e) && e["code"] == "BucketAlreadyOwnedByYou"
        ...
    else
        rethrow(e)
    end
end

I think I'm too new to Julia to contribute great solutions, but I hope that my real-world examples help in some way...

samoconnor commented 10 years ago

Following up on: "In particular, you probably want to be able to retry things, which try/catch does not handle."

I've managed to figure out enough macro-fu to implement an exception handling retry loop.

On the first n-1 attempts, the catch block has the opportunity to "@retry".

If the catch block does not call @retry, the error is rethrown automatically.

On the nth attempt the try block is executed naked (without a catch block) so errors are passed up.

Example follows...

(please let me know if my posts on this issue are too off-topic, and/or where a better place to post would be.)

# Try up to 4 times to get a token.
# Retry when we get a "Busy" or "Throttle" exception.
# No retry or catch if we get a "Crash" exception (or no exception).

@with_retry_limit 4 try

    println("Trying to get token...")
    println("Token: " * get_token())

catch e

    if e.msg == "Busy"

        println("Got " * string(e) * ", try again...\n")
        @retry

    elseif e.msg == "Throttle"

        println("Backing off...\n")
        sleep(1 + rand())
        @retry
    end
end

# Unreliable operation simulator.
# Often busy, sometimes asks us to back off, sometimes crashes.

function get_token()

    if rand() > 0.2
        error("Busy")
    end

    if rand() > 0.8
        error("Throttle")
    end

    if rand() > 0.9
        error("Crash")
    end

    return "12345"
end

# Implementation of retry

macro with_retry_limit(max::Integer, try_expr::Expr)

    @assert string(try_expr.head) == "try"

    # Split try_expr into component parts...
    (try_block, exception, catch_block) = try_expr.args

    # Insert a rethrow() at the end of the catch_block...
    push!(catch_block.args, :(rethrow($exception)))

    # Build retry expression...
    retry_expr = quote

        # Loop one less than "max" times...
        for i in [1 : $max - 1]

            # Execute the "try_expr".
            # It can do "continue" if it wants to retry...
            $(esc(try_expr))

            # Only get to here if "try_expr" executed cleanly...
            return
        end

        # On the last of "max" attempts, execute the "try_block" naked 
        # so that exceptions get thrown up the stack...
        $(esc(try_block))
    end
end

# Conveniance "@retry" keyword...

macro retry() :(continue) end

samoconnor commented 10 years ago

Here is an attempt to avoid having to put rethrow(e) at the end of a catch block. This is intended both as a way to remove a bit of noise from the code, and as a safety net to avoid accidentally catching unintended exceptions.

I think the thing that made me most scared about Julia's "catch" at first was that everything is caught by default, then you have to be really careful to rethrow() the right exceptions.

#!/Applications/Julia-0.3.0-rc1-a327b47bbf.app/Contents/Resources/julia/bin/julia

# Re-write "try_expr" to provide an automatic rethrow() safety net.
# The catch block can suppress the rethrow() by doing "e = nothing".

macro safetynet(try_expr::Expr)

    @assert string(try_expr.head) == "try"

    (try_block, exception, catch_block) = try_expr.args

    push!(catch_block.args, :(isa($exception, Exception) && rethrow($exception)))

    return try_expr
end

@safetynet try

    d = {"Foo" => "Bar"}

    println("Foo" * d["Foo"])
    println("Foo" * d["Bar"])

    @assert false # Not reached.

catch e

    if isa(e, KeyError) && e.key == "Bar"
        println("ignoring: " * string(e))
        e = nothing
    end
end

println("Done!")

Question about macros, is there any way I can omit the "try" keyword and have the macro insert it for me? e.g.

@safetry
    ...
catch
    ...
end

I can't figure out if this is possible with Julia's macro system.

MikeInnes commented 10 years ago

@StefanKarpinski Just for clarification, I take it that the "chain of custody" works on a per-method basis? So to modify your original example:

function test(x)
  try
    # tomfoolery
    foo(x) throws BazError
    # highjinks
  catch e::BazError
    # deal with BazError
  end
end

function foo(x::Number)
  bar(x) throws BazError
end

function foo(x::String)
  bar(x)
end

function bar(a)
  throw BazError()
end

test(1) catches the error, but test("foo") doesn't?

MikeInnes commented 10 years ago

Also, there's at least one place this would be useful in Base, in the display code. Checking the function that threw the exception improves the situation but is still pretty brittle, and it sounds like this proposal would solve that problem.

StefanKarpinski commented 10 years ago

@one-more-minute, yes, that's how I think it should work. In other words even though foo has BazError as part of its overall interface, a BazError raised by foo(::String) via bar would be considered accidental since it's not annotated. I agree with the sentiment that this is "an edge case", and indeed many languages have gotten by without anything like this. However, the difference between handling edges cases correctly and not quite correctly is a fairly major one.

Also, +1 for "tomfoolery" and "hijinks" :-)

StefanKarpinski commented 10 years ago

@samoconnor – I think that handling this kind of complex I/O error is a really important problem but I also think it's beyond the scope of this issue, which is pretty focused: it solves just the problem of trapping the right errors and not accidentally trapping other ones.

StefanKarpinski commented 10 years ago

@JeffBezanson, so here's what I think is wrong with the type and/or label idea.

Let's say you have code that expects an AbstractArray. Part of the AbstractArray interface is that indexing can throw a bounds error like so:

julia> a = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> a[4]
ERROR: BoundsError()
 in anonymous at ./inference.jl:365 (repeats 2 times)

Your write some code that expects this and handles it somehow:

function frizz(a::AbstractArrary)
  try a[4]
  catch err::BoundsError
    # handle a not being long enough
  end
end

If someone calls frizz(::Array) assuming no other frizz methods, and a BoundsError is thrown bygetindex(::Array, i::Int)` it should clearly be caught here.

Now, let's say someone else comes along and implements a new kind of AbstractArray, XArray and as is quite common, they use and existing kind of AbtsractArray to do it – let's say they use an Array instance. Now further suppose that there's some kind of situation where the inner Array type can cause a BoundsError which doesn't actually correspond to BoundsError for XArray – it's a programming error, or just indicates some other kind of problem with the usage of XArray. This is an entirely plausible situation that actually happens quite a bit in the sparse array implementation.

The key observation is that you can have two situations with the exact same throw and catch sites: one where error should be caught and the other where the error shouldn't be caught. The type and/or label idea can't solve this problem since it still only depends on the throw and catch sites – one error is caught if and only if the other one is. To actually solve the problem, whether an error is caught must depend, not only on the throw and catch sites, but also on the stack between them. This is precisely what the chain of custody idea does: it makes catching an error depend on the stack between the thrower and the catcher. The exact details of how to express that are up for debate, but I think this argument makes it clear that we either decide we don't care about this problem, or we need a solution where catching an error depends on the stack that comes between thrower and the catcher.

JeffBezanson commented 10 years ago

Ok, well I'm not running right out to implement the label idea :)

The chain of custody is a really clever idea, but I feel it has two fatal issues: (1) It's not clear that it can be implemented efficiently. A naive implementation would probably slow down the whole language. (2) I believe it will strike most people as un-julian. Even though it is better than java's mechanism and not quite as verbose, I suspect most people will still find it fussy.

StefanKarpinski commented 10 years ago

That's why I left "decide we don't care about this problem" as one of our possible courses of action.

StefanKarpinski commented 10 years ago

Keep in mind that this is completely backwards compatible: no existing code would behave any differently with this proposal. So it's hard to argue that it's somehow more annoying that what we have currently. It is possible to argue that it's more annoying than having typed catch clauses would be, but we've been getting by without those, so that's also hard to see as really a huge problem. It's possible that more of the chain could be implicit, but I rather suspect that these chains will typically be very short – usually just one or two calls deep. Deeper errors are certainly possible, but they are usually of the "catch anything and try to recover" variety, not the "catch a very specific, expected error in this specific method call" variety – which by its very nature would tend not to be a very deeply nested error.

JeffBezanson commented 10 years ago

Yes, the annoyance involved is subtle, which is one of the clever parts of this. I'm more worried about the non-locality that results when somebody wants to catch an exception by type. For it to work, potentially many call sites in the system need to be annotated. When writing a function call in a library, it is hard to guess whether somebody will want to typed-catch an exception from it.

It's fine with me to annotate the call inside a try block that I expect might fail, because that's still local. Maybe one issue is that try blocks just tend to be too big --- there might be only one call in there you want to catch from, but writing a try block around a single call is awkward. Maybe extra syntax like throws just to narrow the effective range of a try block would be good. That would be kind of a lexically-scoped version of your proposal.

StefanKarpinski commented 10 years ago

It's fine with me to annotate the call inside a try block that I expect might fail, because that's still local. Maybe one issue is that try blocks just tend to be too big --- there might be only one call in there you want to catch from, but writing a try block around a single call is awkward. Maybe extra syntax like throws just to narrow the effective range of a try block would be good. That would be kind of a lexically-scoped version of your proposal.

I think this only addresses the most superficial part of the problem, so isn't very satisfying.

tknopp commented 10 years ago

Sorry but I stay at my position. When writing

catch err::BoundsError

I want to catch every occurrence of a BoundsError. I don't want to reason about the internal code and that nobody has forgotten the throws BoundsError declaration. In my opinion exceptions are complicated enough so that it is a big plus if the implementation is simple. And until now Julia has gotten this very right. I just miss to catch specific exceptions as its available in most other languages that have this.

P.S.: Greatest example of a "problematic" exception system is C++ where it is possible to throw integers around and one has no common base class

ssagaert commented 10 years ago

I don't like this proposal for a throws at the call site. I find it complex and I don't really understand it after one read & what problem you are really trying to solve. If I don't understand it then other people also will not understand it and hence it will surely lead to bugs in people's code. I believe that programming language rules should be as simple as possible (without being simplistic in the sense that you end up with a very non-powerful language) so that everybody understands them and hence leads to few programming errors. Developing large software systems is complex enough without adding too much complexity ourselves.

Also for me exceptions are what the name says: exceptional events and hence I used them sparingly, e.g. when a network connections drops. For mundane things like checking preconditions of function arguments I do not use exceptions but I do these checks in "regular" code via if/else /assertions/preconditions.

Also as Stefan mentions himself throws annotation (& the checked exceptions that go with it) in method definitions in Java have not been a success and I am on of those people who actually believe they are a bad idea.

Like mentioned in this https://groups.google.com/forum/?fromgroups=#!topic/julia-dev/wi9YN-pcDR4%5B1-25-false%5D I do however like the idea of type annotation in the catch clauses to narrow a catch to a specific type. This has the advantage of being a minimalistic extension of the current system & it will be immediately familiar to anyone who has programmed in any mainstream (& also some non-mainstream) OO language like C++, Java, C# since it's the same.

it would be as follows:

try catch e1::



catch e2::

end

other exception types are thrown upwards

If you want to catch all exceptions:

try

catch e1::

catch e2::

catch e   (or equivalently: catch e::Exception)


            
            
                StefanKarpinski
                commented
                 10 years ago            
            
                Not understanding a proposal or the problem that it solves after a cursory read-through is not a particularly compelling counterargument. Exceptions themselves were a crazy new idea at some point.
            
        
            
            
                ssagaert
                commented
                 10 years ago            
            
                On 16 Aug 2014, at 19:57, Stefan Karpinski notifications@github.com wrote:

Not understanding a proposal or the problem that it solves after a cursory read-through is not a particularly compelling counterargument. Exceptions themselves were a crazy new idea at some point.
—
Reply to this email directly or view it on GitHub.
Well true of course but when I first learned about exceptions in C++  a very long time ago (1996) I grasped the concept immediately and thought this was way better that error codes like in C so they weren’t that crazy ;) . From what I understood is that your are trying to distinguish between “exceptional” exceptions and “unexceptional” exceptions and I don’t know if that is a good idea.
            
        
            
            
                JeffBezanson
                commented
                 10 years ago            
            
                I guess my issue with it comes down to it being, in effect, dynamically scoped. This makes it inefficient, except perhaps with truly heroic efforts, and harder to reason about. Imagining a fully-realized implementation of this, pick a random f(x) throws y, and I would claim you cannot figure out why the throws is there based on local reasoning. One layer's exception-to-be-caught is another layer's unexpected-error. In fact that is sort of the whole point of exceptions.
            
        
            
            
                ssagaert
                commented
                 10 years ago            
            
                I would support the idea of adding "throws ExceptionType1" to a function definition for documentation purposes. This could also be used by a future API doc generating tool (like Javadoc does). However without the implications of checked exceptions like in Java. So when that exception is not caught at the enclosing calling scope, there is no need to put the same "throws ExceptionType1" in every function up the call stack.
            
        
            
            
                StefanKarpinski
                commented
                 10 years ago            
            
                
I guess my issue with it comes down to it being, in effect, dynamically scoped. This makes it inefficient, except perhaps with truly heroic efforts, and harder to reason about. Imagining a fully-realized implementation of this, pick a random f(x) throws y, and I would claim you cannot figure out why the throws is there based on local reasoning. One layer's exception-to-be-caught is another layer's unexpected-error. In fact that is sort of the whole point of exceptions.

Since I don't yet have an efficient implementation strategy, that's a very valid concern, but I think it's premature to nix an otherwise good idea because we haven't figured out how to do it fast. Of course, I wouldn't add it to the language without already knowing how to do it efficiently.
As to when f(x) throws y should be annotated, it's local reasoning in the exact same way that the type signature of the function is local: a function body should have a throws y annotation at points where the function throws an exception that is part of its expected behavior – i.e. part of its type signature. Throwing a BoundsError is a reasonable part of the signature of a getindex methods, whereas through a DivideError is not, even though a division error may well occur in the course of working with indices.
            
        
            
            
                elextr
                commented
                 10 years ago            
            
                I would also add a note to the efficiency concerns of @JeffBezanson that C++11 changed exception specifications partly due to the implementation cost, the reasoning behind that could be worth closer inspection if someone has a copy of the standards committee rationale docs.
If I understand @StefanKarpinski correctly you would still be able to do the universal catch of all exceptions and then test via ifs for those you can handle and re-raise the others.  This will make the cost visible, and only to those who need it.
            
        
            
            
                tknopp
                commented
                 10 years ago            
            
                I don't think that implementation cost is the issue here. Its:

The non-locality. If you see catch e::BazError you cannot be sure that BazError is really catched.
The expectation. The majority of users with a C++/Java/C# background will we confused.
People will use the isa and rethrow way if they want the standard use of exception. So code that could look much more clear is written in the "workaround" version.

For me this is kind of the switch clause in C++ where my initial expectation is not a default fallthrough. One finds workaround patterns for it (put a break everywhere) but I still think that it was not clever to design it that way.
            
        
            
            
                StefanKarpinski
                commented
                 10 years ago            
            
                These are valid concerns, @tknopp. My only counterargument is that in the cases where catch e::BazError wouldn't catch a BazError, it shouldn't have been caught anyway – and even if that's a bit confusing to a C++/Java programmer, it makes their program (unwittingly) more correct.
            
        
            
            
                tknopp
                commented
                 10 years ago            
            
                hehe. "Dear C++/Java programmer, we are more clever than you. You wrote ... but we think it would be better if you had written ..." :-) Just kidding.
            
        
            
            
                elextr
                commented
                 10 years ago            
            
                To me the non-locality is really an inversion of control, the lower level functions are claiming to know what the upper level functions should be allowed to handle.  But it is repeatedly shown that library (lower level code) developers often don't imagine even a part of the places where their code gets used.
IIUC the original concern of @StefanKarpinski was that you don't know the meaning of an exception thrown from some lower level library is the same as that exception thrown from a higher level function, so its not safe to handle it the same.  Is this not simply that overly general exceptions are being re-used, rather than more specific ones being created by the upper level functions, allowing handlers to distinguish the exception they want to handle?
The expectation answer is more like "Dear C++ programmer (me) this is Julia, not C++ ..." :-)
            
        
            
            
                ssagaert
                commented
                 10 years ago            
            
                +1
On 17 Aug 2014, at 08:17, Tobias Knopp notifications@github.com wrote:

I don't think that implementation cost is the issue here. Its:
The non-locality. If you see catch e::BazError you cannot be sure that BazError is really catched.
The expectation. The majority of users with a C++/Java/C# background will we confused.
People will use the isa and rethrow way if they want the standard use of exception. So code that could look much more clear is written in the "workaround" version.
For me this is kind of the switch clause in C++ where my initial expectation is not a default fallthrough. One finds workaround patterns for it (put a break everywhere) but I still think that it was not clever to design it that way.
—
Reply to this email directly or view it on GitHub.
            
        
            
            
                samoconnor
                commented
                 10 years ago            
            
                "... an exception thrown from some lower level library is the same as that exception thrown from a higher level function, so its not safe to handle it the same. Is this not simply that overly general exceptions are being re-used, rather than more specific ones being created by the upper level functions?"
Yes.
This is just bad API design. If you're calling a function from a library (or elsewhere) the documentation for that function should tell you exactly what results (including side-effects and exceptions) the function produces. If one of the possible results is a FooError, the documentation should tell you what that means in the context of the function. What happens inside the function, and what other functions it calls, is implementation detail and should be of no concern to the caller. Function == encapsulation.
If function a() and function b() can both throw FooError, and function a() calls function b(), then the author of function a() needs to do one of:

catch FooError thrown by function b() and deal with it
ensure that there is no semantic difference between directly throwing FooError and indirectly throwing b()'s FooError.

If the author of function a() calls function b() without bothering to handle all possible return states, then function a() may be ok as interesting exploratory code, but it has no place in a library. This is like not checking sys call return codes in a C program. At the very least function a() should assert that function b() has some expected return state.
try x = b(y) catch e @assert false string(e) end
            
        
            
            
                nalimilan
                commented
                 10 years ago            
            
                I find @StefanKarpinski's proposal quite interesting provided it could be implemented efficiently. It makes sense that by default, people trying to catch BazError won't catch it if it comes from a lower function in the stack, from where they probably don't expect the exception to come.
But maybe there should also be a simple syntax allowing to catch BazError disregarding where it comes from? It would not be the most natural syntax, so that people don't use it without enough reflection, but it wouldn't be completely ugly either, like catchall e::BazError. It could be used in precise cases, e.g. to work around bugs in the code you call when the author forgot to add the throws BazError annotation, or do perform some debugging. If it turns out it's not really useful, then it could easily be deprecated.
            
        
            
            
                wavexx
                commented
                 9 years ago            
            
                I was about to file a report about the 'catch' statements when I stumbled on
this (julep label?).
I find @StefanKarpinski proposal interesting, but not completely satisfying.
First, I think the main problem with exceptions is that exceptions are often
not only used as pure error conditions, but also as generic non-local returns.
This is when there is a debate whether all exceptions should be handled
(because they are supposed to be errors to handle correctly!), or should be
left free to the programmer to handle (because it's just a different control
flow mechanism).
From experience, there are many situations that benefit from generic
non-local returns. It turns out that most languages implement non-local returns
using exceptions (as in "error exceptions") instead of doing the opposite:
implement errors on top of a generic non-local return, which would remove
the ambiguity (errors would then be always errors).
I think that I don't need to discuss the advantages/disadvantages or non-local
returns here. Coming from a CL background, there are situations which benefit
from them, and others that do the opposite. This largely depends on the
situation at hand. The only thing that I want to point out is that a language
without non-local returns ends up be very verbose in several situations, and
error checking is the most common (I'm looking at you, Go). Repetition also
makes code just as error-prone.
Back to the proposal: having to declare each usage as throwing the exception
would often be very verbose:
try
    func throws Condition
    func throws Condition
    func throws Condition
catch ::Condition
    ... 
end
This is definitely worse than the current mechanism:
try
    func
    func
    func
catch e
    isa(e, Condition) || rethrow(e)
    ...
end
This compounds if you consider that forgetting a 'throws' declaration will
cause your code to silently work anyway, but fail upon receiving an exception.
The failure will be downward in the stack. Trickier to debug than a regular
catch mechanism.
You say this proposal it's all about being explicit (just to be clear: I'm
totally on-board with this assumption), so if you want this to actually help
the developer, some compilation warnings are required. If you declare a
function with 'throws' once, and use the same function again in the same block,
a warning would be warranted.
This leads to the second issue, which is about the left-hand side of the
'throws' statement. What about closures? What happens with macros?
Note that I also have concerns with the current situation of 'catch' in the
current way of doing things. It feels like that the default is to catch any
error in the block, but this is hardly the case. In fact, in my experience,
the exact opposite is true: you often want to catch exactly one specific
error, because you know beforehand that's the case you can handle. It's
currently too easy to either trap any error or to forget a rethrow. Both
situations are just as bad. The common catch x::Type idiom is a good
compromise, which actually just avoid this kind of mistakes.
            
        
            
            
                tknopp
                commented
                 9 years ago            
            
                @wavexx: I fully agree
            
        
            
            
                samoconnor
                commented
                 8 years ago            
            
                FWIW I've cleaned up my @retry if... and @ignore if... exception handling macros, written some test cases and published them here https://github.com/samoconnor/Retry.jl.
The @ignore if... macro attempts to address @StefanKarpinski's initial concern about the "possibility of catching the wrong error" by making it easier to write exception filtering statements.
            
        
            
            
                wavexx
                commented
                 8 years ago            
            
                On 24/12/15 16:29, samoconnor wrote:

FWIW I've cleaned up my |@retry if...| and |@ignore if...| exception
handling macros, written some test cases and published them here
https://github.com/samoconnor/Retry.jl.
The |@ignore if...| macro attempts to address @StefanKarpinski
https://github.com/StefanKarpinski's initial concern about the
"possibility of catching the wrong error" by making it easier to write
exception filtering statements.

[side note: I've seen an email notification from Mason Bially but cannot
find it here, and ended up forgetting about it].
The problem that I have with these macros is that they're not based on
types. I'd like the exception mechanism based on type dispatch foremost,
just like regular function calls, to make it more uniform.
If catch could have arguments with type declarations I would already be
partially satisfied. The default behavior should be rethrow.
I also think proper exception handling is a bit [too much] overdue.
The amount of incorrect catch blocks I've seen in julia modules is
downright frightening, and reminds me too much of the situation there's
already in R, where the error handling has a similar interface.
People throw values and catch anything, conflating a hard I/O error with
a non-local exits. The result is that hard I/O errors go unnoticed, and
the system returns faulty values instead of failing.
A quick fix for this might just have two base types for throw to emit:
the first for a non-local exit (let's call it "exit"), the second for a
runtime error ("error").
If the type throwed is based on the 'exit' type, and is not catched on
the current frame, emit a warning. The intention of 'exit' is to have
non-local returns at hand, that we'd like to handle them locally as well
without performing nested if/else blocks. Clear intention, with
compiler-assisted warnings. This would allow to write simple macros that
help with retry/delay behavior with meaningful warning messages for the
user. Similarly, if there's no throw statement of type 'exit' in the
local frame, but catch ::exit is used, another warning is warranted (we
probably do not want to catch exits from another frame).
In my vision, a lone catch without arguments (a catch everything block)
should emit a warning that should be silenced explicitly (like an
"uninitialized variable warning in C" so to speak). If there are already
catch declarations with types instead, we assume the programmer is
actually handling the default behavior by itself.
            
        
            
            
                mason-bially
                commented
                 8 years ago            
            
                
[side note: I've seen an email notification from Mason Bially but cannot find it here, and ended up forgetting about it].

You are probably thinking of this.
For which I would submit as my thoughts on the subject, in addition to repeating yours:

I also think proper exception handling is a bit [too much] overdue. 
            
        
    
    
                    Next
            

    
        
            ©  Githubissues.
            Githubissues is a development platform for aggregating issues.