MIT-LCP / wfdb-python

Native Python WFDB package
MIT License

Handle all-NaN channels in calc_adc_params #485

Open bemoody opened 2 months ago

bemoody commented 2 months ago

If all samples in a channel are NaN, calc_adc_params will fail:

>>> wfdb.wrsamp("xxx", fs=500, units=["mV"], sig_name=["I"], p_signal=numpy.array([[numpy.nan]]), fmt=["16"])
/home/bmoody/work/wfdb-python/wfdb/io/_signal.py:740: RuntimeWarning: All-NaN slice encountered
  minvals = np.nanmin(self.p_signal, axis=0)
/home/bmoody/work/wfdb-python/wfdb/io/_signal.py:741: RuntimeWarning: All-NaN slice encountered
  maxvals = np.nanmax(self.p_signal, axis=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bmoody/work/wfdb-python/wfdb/io/record.py", line 2943, in wrsamp
    record.set_d_features(do_adc=1)
  File "/home/bmoody/work/wfdb-python/wfdb/io/_signal.py", line 470, in set_d_features
    self.adc_gain, self.baseline = self.calc_adc_params()
  File "/home/bmoody/work/wfdb-python/wfdb/io/_signal.py", line 787, in calc_adc_params
    baseline = int(np.floor(baseline))
ValueError: cannot convert float NaN to integer

A couple of things are wrong here:

  1. `if pmin == np.nan` doesn't do what you think: NaN compares unequal to everything, including itself, so the condition is always false.

  2. nanmin and nanmax will give a RuntimeWarning if all samples in a channel are NaN.
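To illustrate point (1), here is a minimal demonstration of the NaN-comparison pitfall (standalone, not the actual calc_adc_params code):

```python
import numpy as np

pmin = np.nan

# NaN compares unequal to everything, including itself, so this
# condition is always False and the branch is never taken:
if pmin == np.nan:
    raise AssertionError("never reached")

# The correct test for NaN:
assert np.isnan(pmin)
```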

(1) is easy to fix. (2) is a little weirder; have a look at the code of nanmin:

    if type(a) is np.ndarray and a.dtype != np.object_:
        # Fast, but not safe for subclasses of ndarray, or object arrays,
        # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
        res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
        if np.isnan(res).any():
            warnings.warn("All-NaN slice encountered", RuntimeWarning,
                          stacklevel=3)

In other words, for ordinary numeric numpy arrays, np.fmin.reduce gives exactly what we want (the minimum non-NaN value if there is one, otherwise NaN, and no warning). It might not work if the array is something more exotic (e.g. a numpy-compatible array class created by some other Python package).
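For example, the no-warning behavior can be checked directly (a small standalone demonstration, not code from wfdb):

```python
import warnings
import numpy as np

# One ordinary column and one all-NaN column:
a = np.array([[1.0, np.nan],
              [2.0, np.nan]])

with warnings.catch_warnings():
    warnings.simplefilter("error")  # escalate any warning to an exception
    res = np.fmin.reduce(a, axis=0)  # no RuntimeWarning, unlike np.nanmin

# Minimum non-NaN value for the first column, NaN for the all-NaN column:
assert res[0] == 1.0
assert np.isnan(res[1])
```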

I think I understand the comment about object arrays (https://github.com/numpy/numpy/issues/8975, https://github.com/numpy/numpy/issues/9009), but I don't understand the "subclasses of ndarray" comment. When I try creating a trivial subclass of ndarray, fmin still appears to work as expected. So I don't see why the strict `type(a) is np.ndarray` check is needed.
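The subclass experiment mentioned above might look like this (a sketch; the class name is made up for illustration):

```python
import numpy as np

class MyArray(np.ndarray):
    """Trivial ndarray subclass with no extra behaviour."""
    pass

a = np.array([[1.0, np.nan],
              [2.0, 3.0]]).view(MyArray)

# fmin.reduce works fine on the subclass, ignoring NaNs per element:
res = np.fmin.reduce(a, axis=0)
assert isinstance(res, MyArray)
assert res[0] == 1.0 and res[1] == 3.0
```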

bemoody commented 1 month ago

I don't understand the "subclasses of ndarray" comment

The commit message gives slightly more explanation

commit 9c7e6e30315ab4541775d9a78630e88423c752c1
Author: Charles Harris <charlesr.harris@gmail.com>
Date:   Fri Oct 4 13:49:33 2013 -0600

    BUG: Refactor nanfunctions to behave as agreed on for 1.9.

        Deal with subclasses of ndarray, like pandas.Series and matrix.

            Subclasses may not define the new keyword keepdims or deal
            gracefully with ufuncs in all their forms. This is solved by
            throwing the problem onto the np.sum, np.any, etc. functions
            that have ugly hacks to deal with the problem.

        Settle handling of all-nan slices.

            nanmax, nanmin -- Raise warning, return NaN for slice.
            nanargmax, nanargmin -- Raise ValueError.
            nansum -- Return 0 for slice.
            nanmean, nanvar, nanstd -- Raise warning, return NaN for slice.

        Make NaN functions work for scalar arguments.

            This may seem silly, but it removes a check for special cases.

        Update tests

            Deal with new all-nan handling.
            Test with matrix class as example of subclass without keepdims.
            Test with scalar arguments.

        Fix nanvar issue reported in #3860.

        Closes #3860 #3850

Indeed, I see that pandas.Series chokes with keepdims=True. But we're not doing that, so I think we should be fine with np.fmin.reduce(self.p_signal, axis=0).
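A rough sketch of what the fix could look like (the helper name and variables are my own, not the actual wfdb code; the real calc_adc_params would still need to decide what gain/baseline to assign to an all-NaN channel):

```python
import numpy as np

def _channel_extremes(p_signal):
    """Per-channel min/max ignoring NaNs, without the RuntimeWarning
    that np.nanmin/np.nanmax emit for all-NaN slices.

    For an all-NaN channel the result is NaN, which the caller must
    detect with np.isnan (never with `== np.nan`).
    """
    minvals = np.fmin.reduce(p_signal, axis=0)
    maxvals = np.fmax.reduce(p_signal, axis=0)
    return minvals, maxvals

sig = np.array([[0.5, np.nan],
                [1.5, np.nan]])
mins, maxs = _channel_extremes(sig)

# All-NaN channels are flagged explicitly rather than crashing later
# in int(np.floor(baseline)):
all_nan = np.isnan(mins)
assert not all_nan[0] and all_nan[1]
```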

If we care about object arrays, we could use np.nanargmin and np.nanargmax instead (but those are less efficient for numeric arrays).