Warnings during feature extraction

cbrnr commented 1 year ago

When running examples/classifiers/wrn_gru_mesa.py, I noticed several warnings during extract_features():

```
sleepecg/sleepecg/feature_extraction.py:363: RuntimeWarning: HR analysis window too short for estimating PSD for feature VLF. 3030.3s required, got 270s
  warnings.warn(msg, category=RuntimeWarning)
sleepecg/sleepecg/feature_extraction.py:225: RuntimeWarning: Mean of empty slice
  meanNN = np.nanmean(NN, axis=1)
sleepecg/sleepecg/feature_extraction.py:226: RuntimeWarning: All-NaN slice encountered
  maxNN = np.nanmax(NN, axis=1)
sleepecg/sleepecg/feature_extraction.py:227: RuntimeWarning: All-NaN slice encountered
  minNN = np.nanmin(NN, axis=1)
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1878: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
sleepecg/sleepecg/feature_extraction.py:232: RuntimeWarning: Mean of empty slice
  RMSSD = np.sqrt(np.nanmean(SD**2, axis=1))
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1217: RuntimeWarning: All-NaN slice encountered
  r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1583: RuntimeWarning: All-NaN slice encountered
  result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
sleepecg/sleepecg/feature_extraction.py:244: RuntimeWarning: Mean of empty slice
  cvSD = SDSD / np.nanmean(SD, axis=1)
sleepecg/sleepecg/feature_extraction.py:244: RuntimeWarning: divide by zero encountered in divide
  cvSD = SDSD / np.nanmean(SD, axis=1)
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1095: RuntimeWarning: All-NaN slice encountered    | 12/1970 [00:06<15:03, 2.17it/s]
  result = np.apply_along_axis(_nanmedian1d, axis, a, overwrite_input)
sleepecg/sleepecg/feature_extraction.py:252: RuntimeWarning: invalid value encountered in sqrt    | 23/1970 [00:12<17:47, 1.82it/s]
  SD2 = (2 * SDNN**2 - SD1**2) ** 0.5
sleepecg/sleepecg/feature_extraction.py:256: RuntimeWarning: divide by zero encountered in divide    | 32/1970 [00:16<15:35, 2.07it/s]
  CSI = SD2 / SD1
sleepecg/sleepecg/feature_extraction.py:257: RuntimeWarning: divide by zero encountered in log10
  CVI = np.log10(SD1 * SD2 * 16)
sleepecg/sleepecg/feature_extraction.py:244: RuntimeWarning: invalid value encountered in divide    | 1166/1970 [10:26<07:01, 1.91it/s]
  cvSD = SDSD / np.nanmean(SD, axis=1)
sleepecg/sleepecg/feature_extraction.py:254: RuntimeWarning: invalid value encountered in divide
  SD1_SD2_ratio = SD1 / SD2
sleepecg/sleepecg/feature_extraction.py:256: RuntimeWarning: invalid value encountered in divide
  CSI = SD2 / SD1
```

Are these warnings problematic? Should we take a closer look and try to fix them? Or, if they are OK, how do they influence the results (e.g., is a feature nan when a division by zero occurs)?

hofaflo commented 1 year ago

Those warnings occur for analysis windows that contain no heartbeats (or only equally spaced heartbeats, which makes some statistical measures undefined), in which case the resulting feature value is nan. We should be able to avoid them by removing those windows from NN before calculating the features (and directly setting the respective entries in the feature matrix to np.nan). Features with value nan are masked in prepare_data_keras, so this does not influence the classification.
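
To illustrate the "remove those windows first" idea, here is a minimal sketch. It assumes NN is a 2D array with one row per analysis window, padded with nan where there are no beats. The function name window_stats is hypothetical and not part of sleepecg, and only two of the features from the traceback (meanNN and RMSSD) are shown.

```python
import numpy as np


def window_stats(NN):
    """Compute meanNN and RMSSD per window without triggering warnings.

    Hypothetical helper, not sleepecg's implementation. Assumes NN has
    shape (n_windows, n_intervals) and uses nan as padding.
    """
    NN = np.asarray(NN, dtype=float)
    SD = np.diff(NN, axis=1)  # successive differences within each window

    # Windows with at least one valid NN interval / successive difference
    has_nn = ~np.isnan(NN).all(axis=1)
    has_sd = ~np.isnan(SD).all(axis=1)

    # Pre-fill with nan so skipped windows end up as nan features
    meanNN = np.full(NN.shape[0], np.nan)
    RMSSD = np.full(NN.shape[0], np.nan)

    # Only evaluate the nan-aware reductions on rows that contain data
    meanNN[has_nn] = np.nanmean(NN[has_nn], axis=1)
    RMSSD[has_sd] = np.sqrt(np.nanmean(SD[has_sd] ** 2, axis=1))
    return meanNN, RMSSD
```

Each feature needs its own validity mask (meanNN needs one interval, RMSSD needs two consecutive ones), which hints at why doing this for every feature can become tedious.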

cbrnr commented 1 year ago

Alternatively, we could also catch (and silence) those warnings if there is nothing to warn about. Which solution would you prefer?

hofaflo commented 1 year ago

Avoiding the warnings turned out to be very cumbersome, so I made the changes to ignore them in #142.
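
For readers who want to see what ignoring the warnings can look like, here is a minimal sketch using the standard library's warnings.catch_warnings together with the three NumPy calls from the traceback above. It is not necessarily what #142 does, and the function name basic_stats_quiet is hypothetical.

```python
import warnings

import numpy as np


def basic_stats_quiet(NN):
    """Compute meanNN, maxNN and minNN per window with RuntimeWarnings silenced.

    Hypothetical helper, not sleepecg's implementation. Windows that
    contain only nan simply yield nan for each feature.
    """
    NN = np.asarray(NN, dtype=float)
    with warnings.catch_warnings():
        # Suppress RuntimeWarning only inside this block; the previous
        # warning filters are restored when the block exits
        warnings.simplefilter("ignore", category=RuntimeWarning)
        meanNN = np.nanmean(NN, axis=1)
        maxNN = np.nanmax(NN, axis=1)
        minNN = np.nanmin(NN, axis=1)
    return meanNN, maxNN, minNN
```

Scoping the filter to the feature calculation keeps unrelated RuntimeWarnings elsewhere in the program visible.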

cbrnr/sleepecg, issue #138