Exception-catching in pyomo.core.base.misc.apply_indexed_rule() can mask TypeErrors in user code

The pyomo.core.base.misc.apply_indexed_rule() function includes some code which is wrapped in try/catch clauses to catch TypeError exceptions. If my own rule code accidentally generates a TypeError, this code can potentially mask it. This function includes fallback code that will retry the rule without the wrappers, to re-generate the exception. However, in some of my rules, the first call has side effects which short-circuit execution during the second call, so that the TypeError in my code is silently masked. This made it very hard to debug my code.

I think the best solution would be to streamline the apply_indexed_rule() so that it doesn't wrap my rule in a try/catch clause. However, I don't know which code is supposed to be wrapped by the TypeError catching code, so I can't propose a specific fix.

This is the case which causes problems for me: I often want to initialize indexed parameters by aggregating data from some other part of the model. It is possible to do this piecemeal from the initialization rule. However it is often more efficient to create the aggregated data during a single pass through the underlying data, then cache those results and extract aggregated data as needed to initialize the aggregate parameter. I can do this in a standard initialization rule as follows (1) check whether a cache has been created, (2) create and fill the cache if it hasn't been created, (3) pop the requested result from the cache. The problem is this: If step (2) generates a TypeError, the apply_indexed_rule() function will catch it, then call my function again. However on the second pass, the check in step (1) will succeed, step (2) will be skipped, and I'll never get an error message connected to my bad code.

Here's an example of code that has this problem.

from pyomo.environ import *

m = AbstractModel()

# "big" set and matching parameter
m.S12 = Set(initialize=[(10, 1), (10, 2), (20, 1), (20, 3), (20, 5)])
m.p12 = Param(m.S12, initialize=lambda m, s1, s2: s1 + s2)

# set and parameter aggregated by first component of m.S12
# (i.e., sum of all values in m.p12 with the same s1 index)
m.S1 = Set(rule=lambda m: set(s1 for (s1, s2) in m.S12))

# set m.p1 equal to the sum of all entries in p12 with matching s1 index
def p1_rule_inefficient(m, s):
    # this scans all values in p12 for each value of p1, 
    # which is slow for large models
    return sum(m.p12[s1, s2] for (s1, s2) in m.S12 if s1==s)

def p1_rule_efficient(m, s):
    # this computes the same values as p1_rule_inefficient but only requires one pass
    if not hasattr(m, "cache_dict"):
        # create a dictionary to cache aggregated values
        m.cache_dict = dict()
        # next line will raise TypeError which will incorrectly be ignored
        val = 'text' + 3
        # remaining lines will never run, due to side effects during first call
        for (s1, s2) in m.S12:
            m.cache_dict.setdefault(s1, 0)
            m.cache_dict[s1] += m.p12[s1, s2]

    if s in m.cache_dict:
        # return a cached value
        print "setting p[{}]={}".format(s, m.cache_dict[s])
        return m.cache_dict[s]
    else:
        # this should never happen
        print "Key {} is missing from m.cache_dict.".format(s)
        return None

m.p = Param(m.S1, rule=p1_rule_efficient)

i = m.create_instance()
# cache_dict is created but all keys are missing and no error is reported

This code should report a TypeError on the val = ... line, but instead it falls through to the end, where it reports that the cache is empty. This is fairly tough to debug, since the offending line is never mentioned in a traceback. (I have to do a bisection search through my code, inserting trace points until I find the first line that is never executed.)

As I mentioned, I think that tidying up the try/catch code in pyomo.core.base.misc.apply_indexed_rule() is the best way to avoid this problem. Barring that, users should be warned that their rules may be called twice if they contain an error, so the rules should not have any side effects if they fail due to an error (e.g., the cache_dict should be kept as a local variable until it is completely created, then tacked onto the model object).

After looking a little more deeply, I realized that (I think) the TypeError code is meant to support the case where a rule doesn't accept the index as an argument, and instead returns a dict-like object. The Param code uses this to support exactly the type of single-pass scanning that I described above (just return the dictionary instead of caching it and subsequently returning elements). However, the current approach has a few problems:

If a rule for a Param doesn't accept an index, but raises a TypeError before it can return the initialization dict, the user ends up getting a TypeError that their rule is being passed an index argument that it doesn't accept. No TypeError is raised at the location of the real error in their code.
A rule can accept an index but then return a dict-like object on the first call, which will prevent any further calls to the rule. This is strange and shouldn't really be allowed. i.e., if a rule will return a dict-like object, it shouldn't accept an index and vice-versa. On the other hand, at present, if the rule code contains a TypeError, the only way for the user to see that error is to accept an (unused) index argument.
Returning a dict-like object to initialize a Param is a nice option, which should be extended to other indexed objects (e.g., set arrays). Since it is only possible to use this behavior for Params, I end up forgetting where I can use it, and I just rely on code like I've written above in all cases. It would be nice to allow (and use) this behavior as widely as possible.

So my recommendation would be something like this:

apply_indexed_rule should use rule.__code__.co_argcount to determine in advance how many arguments the rule requires (similar to pyomo.core.base.sets). If the rule needs an index, apply_indexed_rule should provide it. If the rule doesn't need an index, apply_indexed_rule should not provide it. (Alternatively, apply_indexed_rule should closely analyze any TypeErrors it receives, to make sure they really were the result of passing an unwanted index in this particular line of code; if so, try again without the index; if not, raise the exception.)
Optional: apply_indexed_rule should make sure that the return value from the rule is dict-like if an index was available but not accepted by the rule. In other cases, apply_indexed_rule should make sure that the return value from the rule is not dict-like. Ideally, the 'dict-like' status should then be conveyed to the caller in some way, but that may be too much to ask.
In addition to Param, other components that can be initialized with a dictionary should also accept a dictionary returned by the initialization rule. (This should at least be available for set arrays (IndexedSet), which I often construct by scanning other sets or parameters.)

These changes would prevent masking errors in users' rules or running the rules twice, and would also streamline creation of components that need to be aggregated from other components at run time. (In general, returning a dictionary with all the data is much faster than scanning the underlying data once for each index term.)

I could put together a pull request that would implement these suggestions, if that would be helpful.

Pyomo / pyomo

Exception-catching in pyomo.core.base.misc.apply_indexed_rule() can mask TypeErrors in user code #184