Stanford-STAGES / stanford-stages

Automated sleep staging scoring and narcolepsy identification

Algorithm rarely detects REM (in our data) #19

Closed (Mensen closed this issue 4 years ago)

Mensen commented 4 years ago

We have been exploring different options for automatic scoring since we have a high-density EEG dataset consisting of over 300 participants with all-night sleep data.

We manually scored a subset of 15 files, so we have a pretty good idea of the "true" scoring.

We have continually found that the algorithm almost never scores REM:

[Image not preserved]

Of course, we aren't sure what the issue is, but we suspect that our EMG channels are not giving us the best signals, and the REM determination in your algorithm depends heavily on this channel (which would make sense to some extent).

We were curious whether you had seen this issue in your own work and had any suggestions on how we might address it. Most of the other epochs are scored well enough that we'd be comfortable with the results (after a brief manual check).

Any thoughts on this issue would be greatly appreciated!

informaton commented 4 years ago

This is peculiar, and you may be right that the cause is a poor EMG channel. In the cases of scoring problems I have heard of (e.g. everything scored as wake, or no REM), the cause turned out to be that some other channel was being used for the central or occipital lead, or that the lead had not been referenced first (e.g. to a mastoid). We have also seen the first and last 2.5 minutes of a study sometimes scored as REM simply because lights off/on were not accounted for. Are you entering two channel possibilities for the centrals (C3 and C4) and the occipital leads (O1 and O2)?
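If lights off and lights on are known from your study notes, one way to keep the pre- and post-sleep periods out of any REM comparison is to trim the hypnogram before analysis. The sketch below assumes 15 s epochs and lights-off/on times given in seconds from the start of the recording; `trim_to_lights` is a hypothetical helper, not part of stanford-stages.

```python
import numpy as np

EPOCH_SECONDS = 15  # assumed epoch length of the scored hypnogram


def trim_to_lights(hypnogram, lights_off_s, lights_on_s, epoch_s=EPOCH_SECONDS):
    """Keep only epochs lying entirely between lights off and lights on.

    hypnogram: 1-D sequence of stage labels, one per epoch, starting at the
    beginning of the recording. lights_off_s / lights_on_s: seconds from the
    recording start (hypothetical inputs taken from your own study notes).
    """
    hyp = np.asarray(hypnogram)
    # Epoch i covers [i * epoch_s, (i + 1) * epoch_s).
    first = int(np.ceil(lights_off_s / epoch_s))  # first epoch starting at/after lights off
    stop = int(np.floor(lights_on_s / epoch_s))   # exclusive end: epochs ending at/before lights on
    return hyp[first:stop]
```

For a 10-minute recording (40 epochs) with lights off at 150 s and lights on at 450 s, this keeps epochs 10 through 29.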

Mensen commented 4 years ago

We've done a little experimenting with different central channels, but ultimately we just use the C3/C4 equivalents (from our 256-channel net), as well as the occipital leads.

They are referenced to the opposite mastoid as they should be.

The EMG comes from chin recordings; however, both leads were placed under the chin (on the mandible) rather than one under the chin and one on the chin bone itself. Since we have the larger nets, we've also tried several derivations using the cheek electrodes, in combination with the classic EMG electrodes, but none of them systematically improves the results.

We've looked at the results for 15 different recordings, and the complete lack of REM is present in every one of them.

As far as I understand, the algorithm ultimately combines the outputs of 16 different models. Is there any way, apart from mass testing, to know whether certain models are less dependent on the EMG channel as an input? Are the models' input weightings documented anywhere?

informaton commented 4 years ago

The results from the 16 different models are combined as you said, but they are equally weighted. If I am following you correctly, you are interested in looking at these separate results first and perhaps adjusting their weights so that the models that determine REM better with your montage (e.g. your chin configuration) are prioritized over the ones that do not.
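A minimal sketch of what reweighting could look like, assuming each model's output is an N x 5 array of stage probabilities with columns ordered wake, stage 1, stage 2, stage 3/4, REM. `combine_hypnodensities` and its `weights` argument are hypothetical; stanford-stages itself averages the models with equal weights.

```python
import numpy as np


def combine_hypnodensities(hypnodensities, weights=None):
    """Combine per-model hypnodensities (each N x 5) into one N x 5 array.

    weights=None gives equal weighting; otherwise pass one non-negative
    number per model (a hypothetical tuning knob, not a stanford-stages option).
    """
    stack = np.stack([np.asarray(h, dtype=float) for h in hypnodensities])  # M x N x 5
    if weights is None:
        weights = np.ones(stack.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so rows still sum to 1
    return np.tensordot(weights, stack, axes=1)  # weighted sum over models -> N x 5


def hypnogram(hypnodensity):
    """Most probable stage index per epoch (0 = wake ... 4 = REM)."""
    return np.argmax(hypnodensity, axis=1)
```

With two toy models that disagree on a single epoch, `[[0.1, 0.1, 0.1, 0.1, 0.6]]` and `[[0.1, 0.1, 0.7, 0.05, 0.05]]`, equal weights pick stage 2, while weighting the first model 3:1 flips the epoch to REM. Whether upweighting a model actually improves agreement would still need to be checked against manually scored recordings.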

One way to do this without massive testing is to examine the hypnodensity pickle file that is generated during processing. For example, CHP040.hypno_pkl is the pickled hypnodensity file produced for the CHP040.EDF file included with the repository when the batch or shell script is configured to save the hypnodensity. If you load this file with the pickle module (import pickle), you will get a list containing the 16 hypnodensities: Nx5 arrays giving the probability of each sleep stage by column, in the order wake, stage 1, stage 2, stage 3/4, and REM sleep. Each row represents one 15 s epoch being scored, so the last column is the probability of REM sleep for that epoch. The 16 hypnodensities, one per sleep scoring model, are averaged together, and the hypnogram is then calculated by selecting the column with the highest value (probability) in each row. You can look through the individual hypnodensities to see which ones have rows where the last column has the largest value. Those rows (epochs) would ultimately be labeled as REM sleep if not for the other models in the ensemble showing greater values in the other columns for the same rows. If it turns out that a few of the models are better, you can use only those in the model selection found in the inf_config.py file (i.e. self.models_used).
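A short sketch of that inspection, assuming the pickle holds a plain list of N x 5 arrays (one per model) as described above. The helper names are hypothetical; they count, per model, the epochs whose most probable stage is REM, and compare that with the count after averaging all models.

```python
import pickle

import numpy as np

REM_COLUMN = 4  # columns: wake, stage 1, stage 2, stage 3/4, REM


def load_hypnodensities(path):
    """Load a .hypno_pkl file, assumed to contain a list of N x 5 arrays."""
    with open(path, "rb") as f:
        return [np.asarray(h, dtype=float) for h in pickle.load(f)]


def rem_epochs_per_model(hypnodensities):
    """For each model, count epochs whose highest-probability column is REM."""
    return [int(np.sum(np.argmax(h, axis=1) == REM_COLUMN)) for h in hypnodensities]


def ensemble_rem_epochs(hypnodensities):
    """REM epoch count after averaging all models (the ensemble hypnogram)."""
    mean = np.mean(np.stack(hypnodensities), axis=0)
    return int(np.sum(np.argmax(mean, axis=1) == REM_COLUMN))
```

For example, `rem_epochs_per_model(load_hypnodensities("CHP040.hypno_pkl"))` would return one REM epoch count per model. Only unpickle files you generated yourself, since `pickle.load` can execute arbitrary code.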

All that said, it may still be simpler to run each model individually: update the inf_config.py file with the single model of interest, examine the results, look for the hypnograms with REM sleep labeled, and then create your ensemble from that subset of models.

Mensen commented 4 years ago

I hadn't thought of examining the pickle file. That will certainly be less time-consuming than running each model individually, so I'll look into that option and see whether any model consistently produces more REM on our files than the others (on the assumption that such a model, or set of models, is probably less dependent on a good EMG channel).
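With 15 manually scored recordings available, "more REM" could be paired with a check that the extra REM lands on manually scored REM epochs. The hypothetical sketch below tabulates, across recordings, each model's REM fraction and its REM recall against manual labels, then ranks models by their mean score. It assumes the manual hypnograms use the same 0 to 4 coding as the hypnodensity columns and the same epoch grid; if manual scoring uses 30 s epochs while the hypnodensity rows are 15 s, something like `np.repeat(manual, 2)` would be needed first (assuming both start at the same time).

```python
import numpy as np

REM = 4  # index of the REM column, and the REM code in the manual labels (assumed)


def rem_fraction_table(recordings):
    """recordings: one list of N x 5 hypnodensities (one per model) per recording.

    Returns an (n_recordings x n_models) array: fraction of epochs labeled REM.
    """
    return np.array([[np.mean(np.argmax(h, axis=1) == REM) for h in models]
                     for models in recordings], dtype=float)


def rem_recall_table(recordings, manual_hypnograms):
    """Fraction of manually scored REM epochs that each model also labels REM.

    Recordings with no manually scored REM get NaN for every model.
    """
    rows = []
    for models, manual in zip(recordings, manual_hypnograms):
        is_rem = np.asarray(manual) == REM
        if not is_rem.any():
            rows.append([np.nan] * len(models))
            continue
        rows.append([np.mean(np.argmax(h, axis=1)[is_rem] == REM) for h in models])
    return np.array(rows, dtype=float)


def rank_models(table):
    """Model indices sorted from highest to lowest mean score, ignoring NaNs."""
    return [int(i) for i in np.argsort(-np.nanmean(table, axis=0))]
```

Ranking on recall rather than raw REM fraction favors models that find the REM you scored, not just models that label more epochs as REM.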

Thanks for the suggestion!