Closed groutr closed 2 years ago
@groutr just a quick glance through the test logs:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mapping = true_positive 1
false_positive 2
false_negative 3
true_negative 4
dtype: object
converter = <class 'numpy.float64'>
def convert_mapping_values(
mapping: MutableMapping[str, npt.DTypeLike],
converter: np.dtype = np.float64
) -> MutableMapping:
"""Convert mapping values to a consistent type. Primarily used to validate
contingency tables.
Parameters
----------
mapping: dict-like, required
Input mapping with string keys and values that can be coerced into a
numpy data type.
converter: numpy.dtype, optional, default numpy.float64
Converter data type or function used to convert mapping values to a
consistent type.
Returns
-------
converted_mapping: dict-like, same type as mapping
New mapping with converted values.
"""
# Build object of same type of mapping
d = type(mapping)()
> d.update(zip(mapping.keys(), np.fromiter(mapping.values(), dtype=converter)))
E TypeError: 'numpy.ndarray' object is not callable
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/hydrotools/metrics/metrics.py:262: TypeError
Seems like a similar failure causes a few of the tests to fail. Thoughts?
The issue could be related to assumptions made in the original method. The original (slow) method was meant to work with both Python dict
and pandas.Series
. For dict
values
is a Callable. However, for pandas.Series
values
is a non-callable property.
An additional nuance is that pandas.Series.update
takes an object that must be coercible into a pandas.Series
. I'm unsure of what Series
will do when feeding it zip
directly like this.
Actually come to think of it, it may be easier (and faster) to offload this task to pandas
.
return pd.Series(mapping, dtype=converter)
Actually come to think of it, it may be easier (and faster) to offload this task to
pandas
.return pd.Series(mapping, dtype=converter)
That was my initial thought, however it seemed like there was effort made to make this general to any mapping type. If a pandas series is assumed, this is going to be the best way.
I was adhering to the type information in the function signature. pd.Series is not a Mapping type (though it sort of mimics the python dictionary behavior)
isinstance(pd.Series(), Mapping) # False
Thus far, this method is only used internally to validate contingency tables. @groutr were you calling this method directly? If not, this might be a good argument for replacing this method with something more strict, but hopefully faster (that only returns pandas.Series
).
@jarq6c I am not calling the method directly. Replacing it with something more strict seems like a good idea to me.
Superseded by #191
Changes
convert_mapping_values
work more efficiently (for a dict of length 1000, runtime drops from 471us to 294us on my machine)Notes
Since the numpy dtypes are C functions, pushing the loop into C (by using map) will be far more efficient.
There was a type mismatch between the type of mapping and the return of the function. Updated the type of mapping to be MutableMapping.