IndEcol / pymrio

Multi-Regional Input-Output Analysis in Python.
http://pymrio.readthedocs.io/en/latest/
Other
154 stars 71 forks source link

Correct handling of divisions by zero in calculating matrix A #98

Closed aleeciu closed 1 year ago

aleeciu commented 1 year ago

The calc_A function in pymrio.tools.iomath currently handles divisions by zero as follows:

with warnings.catch_warnings():
    # catch the divide by zero warning
    # we deal wit that by setting to 0 afterwards
    warnings.simplefilter("ignore")
    recix = 1 / x
recix[recix == np.inf] = 0

however, this led to an error when dealing with, e.g., the 2014 WIOD16 table which has zeros inside. In fact, python does raise a ZeroDivisionError and not a warning when dividing by zero. The way I fixed the issue was to replace the code above with recix = np.divide(1., x, out=np.zeros_like(x), where=x!=0).

calc_A would then become:

def calc_A_(Z, x):
    """Calculate the A matrix (coefficients) from Z and x

    A is a normalized version of the industrial flows Z

    Parameters
    ----------
    Z : pandas.DataFrame or numpy.array
        Symmetric input output table (flows)
    x : pandas.DataFrame or numpy.array
        Industry output column vector

    Returns
    -------
    pandas.DataFrame or numpy.array
        Symmetric input output table (coefficients) A
        The type is determined by the type of Z.
        If DataFrame index/columns as Z

    """
    if (type(x) is pd.DataFrame) or (type(x) is pd.Series):
        x = x.values
    if (type(x) is not np.ndarray) and (x == 0):
        recix = 0
    else:
        recix = np.divide(1., x, out=np.zeros_like(x), where=x!=0)
        recix = recix.reshape((1, -1))
    # use numpy broadcasting - factor ten faster
    # Mathematical form - slow
    # return Z.dot(np.diagflat(recix))
    if type(Z) is pd.DataFrame:
        return pd.DataFrame(Z.values * recix, index=Z.index, columns=Z.columns)
    else:
        return Z * recix
aleeciu commented 1 year ago

I just realized that the above happened because I loaded the data with dtype = object. When these are loaded as floats, the zeros are handled correctly by the original code. I am closing the issue.