cbrnr / sleepecg

Sleep stage detection using ECG
BSD 3-Clause "New" or "Revised" License
90 stars 23 forks source link

[JOSS Review] HRV Units #173

Closed peterakirk closed 1 year ago

peterakirk commented 1 year ago

Hi all, just a comment relating to the JOSS review (https://github.com/openjournals/joss-reviews/issues/5411#issuecomment-1536077814).

I've been testing out your package on on an open dataset (fs=100, https://doi.org/10.13026/sq6q-zg04), just to make sure HRV values look as they should, using the following code (ignoring sleep classification).

# sleepecg extraction
sleepecg_beats = sleepecg.detect_heartbeats(sub_ecg.values, fs)
record = sleepecg.SleepRecord(heartbeat_times=sleepecg_beats)
features, stages, feature_ids = sleepecg.extract_features([record], lookback=240, lookforward=60,
                                                          feature_selection=["hrv-time"])
sleepecg_analyzed = features[0][:, 5]*10
sleepecg_index = (sleepecg_analyzed < 100)
sleepecg_rmssd = np.nanmean(sleepecg_analyzed[sleepecg_index])

HRV features (i.e., RMSSD) appear to be correlate highly with our (RapidHRV) metrics. However, they appear 1 magnitude smaller in units. For instance, SleepECG suggests subject 80 has an RMSSD of 1.82 which would be highly unusual, where as 18.2 is much more typical and is more in line with what I would expect (RapidHRV puts it close to this too, at 17.83).

When I see RMSSD values, I typically expect them to be in ms units (in the 10-100 range). This doesn’t seem necessarily vital for classification etc., but would be important for interpretation. If I have understood this correctly—that the returned units are not in ms—could you clarify this in your documentation (or change output accordingly, if possible)?

cbrnr commented 1 year ago

Yes, we use seconds throughout all computations. You are correct that scaling is not important for classification, and I understand it would be nice if features were comparable to prior work. Would it be OK to mention that we use seconds in the docs for now? Switching units could affect many places, and this has to be tested thoroughly.

I don't quite understand your steps – why do you multiply by 10 and then average only those features less than 100?

In principle, it might be possible to rescale individual features after they have been calculated, right? Maybe this would be the easiest option.

peterakirk commented 1 year ago

Yes, absolutely ok to just include in docs. I just think the information needs to be there somewhere, as I was a little confused at first. Likewise, it might be useful to also clarify in your docs that sleepecg.ECGRecord fs argument is in ms (not hz).

I don't quite understand your steps – why do you multiply by 10 and then average only those features less than 100?

Constraining averaging to windows with an RMSSD of < 100 was just a quick, crude cleaning procedure I used on the data based on a conservative biological constraint, as that is what I applied to benchmark against RapidHRV data too. The expansion by 10 was simply because all the RMSSD values were 10x lower than RMSSD values in ms. However, presumably this is not what you would expect converting RMSSD in seconds to ms? Also, would RMSSD values in seconds be in the range 1-5 like was produced in my analyses? This seems high.

Is this related to the input data sample rate (i.e., fs=100)? I had a quick scan and did not notice resampling included as part of __hrv_timedomainfeatures (whereas there is for __hrv_frequencydomainfeatures).

cbrnr commented 1 year ago

Yes, absolutely ok to just include in docs. I just think the information needs to be there somewhere, as I was a little confused at first.

OK, I'll update the docs!

Likewise, it might be useful to also clarify in your docs that sleepecg.ECGRecord fs argument is in ms (not hz).

The sampling frequency should be specified in Hz, not ms. But you are right, this information is not included in the docstring (it just says "The sampling frequency."). I'll add "(in Hz)". Or did you have a specific reason why you thought it was really ms?

Constraining averaging to windows with an RMSSD of < 100 was just a quick, crude cleaning procedure I used on the data based on a conservative biological constraint, as that is what I applied to benchmark against RapidHRV data too. The expansion by 10 was simply because all the RMSSD values were 10x lower than RMSSD values in ms. However, presumably this is not what you would expect converting RMSSD in seconds to ms? Also, would RMSSD values in seconds be in the range 1-5 like was produced in my analyses? This seems high.

I'm not sure, but since 1s = 1000ms, I'd expect a difference of a factor of 1000, not 10.

Is this related to the input data sample rate (i.e., fs=100)? I had a quick scan and did not notice resampling included as part of _hrv_timedomain_features (whereas there is for _hrv_frequencydomain_features).

No, this should not be related to the sampling frequency.

cbrnr commented 1 year ago

@hofaflo do you have time to take a look?

hofaflo commented 1 year ago

detect_heartbeats returns an array of indices, so in the code snippet above, before passing sleepecg_beats as heartbeat_times to SleepRecord, a division by the sampling rate is required to get a factor of 1000.

cbrnr commented 1 year ago

Ah, of course, now it makes sense!

peterakirk commented 1 year ago

Great, thanks for clarifying, all looks good to me!