Automatic scaling while loading single channel data

EtienneCmb / visbrain

A multi-purpose GPU-accelerated open-source suite for brain data visualization

http://visbrain.org

Automatic scaling while loading single channel data #29

Closed skjerns closed 5 years ago

skjerns commented 5 years ago

I see that automatic scaling is applied in io.read_sleep.py:

        if np.abs(np.ptp(data, 0).mean()) < 0.1:
            warn("Wrong data amplitude for Sleep software.")
            data *= 1e6

This computes the peak-to-peak (ptp) value across channels at each time sample, rather than within each channel. This has two problems:

1. If there is only one channel, the ptp across channels is always 0, so the data is always rescaled, regardless of its units.
2. If the channels have different units (e.g. one EOG in mV and one EEG in uV), I assume that this function will not rescale even when it would be necessary (I didn't test this).
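
A minimal sketch of both cases on synthetic data, assuming the array is shaped (n_channels, n_samples). The signals, amplitudes and variable names here are made up for illustration and are not taken from visbrain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: a single EEG-like channel that is already in microvolts.
single = rng.normal(0.0, 30.0, size=(1, 1000))

# ptp along axis 0 compares channels at each time sample.
# With one channel there is nothing to compare, so every value is 0.
ptp_across = np.ptp(single, 0)
single_triggers = np.abs(ptp_across.mean()) < 0.1

# ptp along axis 1 measures each channel's own amplitude range instead.
ptp_within = np.ptp(single, 1)

# Case 2: EEG in uV stacked with an EOG channel left in mV.
eeg_uv = rng.normal(0.0, 30.0, size=1000)
eog_mv = rng.normal(0.0, 0.1, size=1000)
mixed = np.vstack([eeg_uv, eog_mv])

# The large EEG values dominate the across-channel ptp, so the
# check stays quiet even though the EOG channel is in the wrong unit.
mixed_triggers = np.abs(np.ptp(mixed, 0).mean()) < 0.1

print(single_triggers, mixed_triggers)
```

Under these assumptions the single-channel check fires although the data is already in uV, while the mixed-unit check does not fire at all.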

Question: Would it not make more sense to look at the ptp within a single channel, or do you expect drift? Or to find a different way of detecting wrong scaling (a simple min/max check should suffice)?

I can implement a check for single-channel data if you want.

EtienneCmb commented 5 years ago

GitMate.io thinks a possibly related issue is https://github.com/EtienneCmb/visbrain/issues/7 (Add .rec loading capability).

EtienneCmb commented 5 years ago

@raphaelvallat I think this is part of your code, can you take a look at it?

raphaelvallat commented 5 years ago

Hi @skjerns Yes, I agree that the current implementation is not adequate for the two situations that you describe. Plus, I also think that the ptp might sometimes be biased in cases of very strong artefacts. As a more robust check, we could use the interquartile range (scipy.stats.iqr) and check for each individual channel:

# Assume that the inter-quartile amplitude of EEG data is ~50 uV
from warnings import warn
import numpy as np
from scipy.stats import iqr

# data: float array of shape (n_channels, n_samples)
iqr_data = iqr(data, axis=1)

for idx_chan, iqr_chan in enumerate(iqr_data):
    # Skip flat channels (IQR of 0) to avoid dividing by zero
    if 0 < iqr_chan < 1:
        mult_fact = np.floor(np.log10(50 / iqr_chan))
        warn("Wrong channel data amplitude. Multiplying data amplitude by 10^%i" % mult_fact)
        data[idx_chan, :] *= 10 ** mult_fact

If you have any ideas for a better check, please feel free to create a PR! Thanks
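
For reference, the per-channel IQR check proposed above could be wrapped in a function along these lines. This is only a sketch: `rescale_channels`, `target_iqr` and `min_iqr` are hypothetical names, it returns a copy rather than modifying the input, and it computes the IQR with `np.percentile`, which should match `scipy.stats.iqr` with default arguments. It is not necessarily what ended up in visbrain.

```python
import warnings

import numpy as np


def rescale_channels(data, target_iqr=50.0, min_iqr=1.0):
    """Rescale each channel whose inter-quartile range looks too small.

    data is assumed to be shaped (n_channels, n_samples). A float copy
    is returned; the input array is left untouched.
    """
    data = np.array(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25], axis=1)
    chan_iqr = q75 - q25
    for idx, val in enumerate(chan_iqr):
        # Flat channels (IQR of 0) are skipped to avoid dividing by zero.
        if 0 < val < min_iqr:
            exponent = int(np.floor(np.log10(target_iqr / val)))
            warnings.warn(
                "Wrong channel data amplitude. "
                "Multiplying channel %i by 10^%i" % (idx, exponent)
            )
            data[idx] *= 10.0 ** exponent
    return data


rng = np.random.default_rng(0)
raw = np.vstack([
    rng.normal(0.0, 30e-6, size=5000),  # EEG stored in volts
    rng.normal(0.0, 30.0, size=5000),   # EEG already in microvolts
    np.zeros(5000),                     # flat channel
])
scaled = rescale_channels(raw)
```

Because each channel is checked on its own, the single-channel and mixed-unit cases from the original report are handled independently of one another.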

skjerns commented 5 years ago

I think this is a good solution for now, and should cover most cases.

However, I suspect that there might be some problems with non-EEG data (like heart rate or temperature), but I don't have a sample file right now to check.
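
To illustrate the concern without a real recording, here is a sketch on a synthetic temperature trace. The signal, its values and the variable names are invented for this example; the thresholds (IQR below 1, target of 50) are the ones from the check discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic temperature channel in degrees Celsius: the unit is
# correct, but the signal barely varies over the recording.
temperature = 36.5 + rng.normal(0.0, 0.05, size=5000)

q75, q25 = np.percentile(temperature, [75, 25])
iqr_temp = q75 - q25

# Same rule as the EEG check: an IQR below 1 counts as "wrong amplitude".
flagged = iqr_temp < 1
mult_fact = int(np.floor(np.log10(50 / iqr_temp)))

print(flagged, mult_fact)
```

In this sketch the channel is flagged and would be multiplied by a power of ten, even though nothing is wrong with its scaling. Restricting the check to EEG-type channels would be one way to avoid this, assuming channel types are known at load time.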

I've created a PR: #33