NOAA-OWP / hydrotools

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.
Other
54 stars 13 forks source link

Make map conversions more efficient. #188

Closed groutr closed 2 years ago

groutr commented 2 years ago

Changes

Notes

Since the numpy dtypes are C functions, pushing the loop into C (by using map) will be far more efficient.

There was a type mismatch between the type of mapping and the return of the function. Updated the type of mapping to be MutableMapping.

hellkite500 commented 2 years ago

@groutr just a quick glance through the test logs:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

mapping = true_positive     1
false_positive    2
false_negative    3
true_negative     4
dtype: object
converter = <class 'numpy.float64'>

    def convert_mapping_values(
        mapping: MutableMapping[str, npt.DTypeLike],
        converter: np.dtype = np.float64
        ) -> MutableMapping:
        """Convert mapping values to a consistent type. Primarily used to validate
        contingency tables.

        Parameters
        ----------
        mapping: dict-like, required
            Input mapping with string keys and values that can be coerced into a
            numpy data type.
        converter: numpy.dtype, optional, default numpy.float64
            Converter data type or function used to convert mapping values to a
            consistent type.

        Returns
        -------
        converted_mapping: dict-like, same type as mapping
            New mapping with converted values.

        """
        # Build object of same type of mapping
        d = type(mapping)()
>       d.update(zip(mapping.keys(), np.fromiter(mapping.values(), dtype=converter)))
E       TypeError: 'numpy.ndarray' object is not callable

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/hydrotools/metrics/metrics.py:262: TypeError

Seems like a similar failure causes a few of the tests to fail. Thoughts?

jarq6c commented 2 years ago

The issue could be related to assumptions made in the original method. The original (slow) method was meant to work with both Python dict and pandas.Series. For dict values is a Callable. However, for pandas.Series values is a non-callable property.

An additional nuance is that pandas.Series.update takes an object that must be coercible into a pandas.Series. I'm unsure of what Series will do when feeding it zip directly like this.

jarq6c commented 2 years ago

Actually come to think of it, it may be easier (and faster) to offload this task to pandas.

return pd.Series(mapping, dtype=converter)
groutr commented 2 years ago

Actually come to think of it, it may be easier (and faster) to offload this task to pandas.

return pd.Series(mapping, dtype=converter)

That was my initial thought, however it seemed like there was effort made to make this general to any mapping type. If a pandas series is assumed, this is going to be the best way.

I was adhering to the type information in the function signature. pd.Series is not a Mapping type (though it sort of mimics the python dictionary behavior)

isinstance(pd.Series(), Mapping)  # False
jarq6c commented 2 years ago

Thus far, this method is only used internally to validate contingency tables. @groutr were you calling this method directly? If not, this might be a good argument for replacing this method with something more strict, but hopefully faster (that only returns pandas.Series).

groutr commented 2 years ago

@jarq6c I am not calling the method directly. Replacing it with something more strict seems like a good idea to me.

jarq6c commented 2 years ago

Superseded by #191