csndl-iitd / realtime-sleep-staging

Using ML for identification of sleep stages in real time in humans
MIT License

Find good datasets to use for the project #5

Open gsaurabhr opened 5 months ago

gsaurabhr commented 5 months ago

Below is a list of datasets. Add more as you find them. Create a new comment for each dataset; any notes about the data can be added to that comment.

You can consider factors such as:

  1. How many subjects do they have?
  2. What is the duration of each recording?
  3. How many EEG channels?
  4. Any auxiliary channels (horizontal/vertical EOG, EMG, heart rate, respiration rate, etc.)?
  5. Is there expert-annotated ground truth?
  6. Ease of availability: publicly available / need to register / not available and need to write to the researcher directly

Maybe make a table with these parameters for prominent datasets below. Based on that we can decide which ones to go for.

Ayush-Tibrewal commented 4 months ago

Sleep-EDF
SC files: 7 channels
ST files: 5 channels

Ayush-Tibrewal commented 4 months ago

https://osf.io/chav7/wiki/home/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998176/

Tanvig commented 4 months ago

| S.No. | Dataset | No. of subjects | No. of EEG channels | Duration of recording | Auxiliary channels | Expert annotations (Y/N) | Ease of availability | Comments | Link |
|---|---|---|---|---|---|---|---|---|---|
| 1 | sleep-edf expanded database | 153 + 44 = 197 | 2 | 20 hours (2 nights); 9 hours (2 nights) | EOG, EMG, respiration, chin EMG, body temperature | Yes | Downloadable | | https://www.physionet.org/content/sleep-edfx/1.0.0/sleep-cassette/#files-panel |
| 2 | ISRUC-Sleep Dataset | 100, 8, 10 | 6 | ~8 hours | EOG, chin EMG, ECG, leg EMG, snore, airflow, abdominal effort, pulse oximetry, body position | Yes | Available, not able to download | The 100 subjects mostly have sleep apnea (one session). The 8 subjects have sleep disorders (2 sessions on different dates). The 10 subjects are a healthy group (one session). | https://sleeptight.isr.uc.pt/?page_id=48 |
| 3 | Dreem Sleep Stage Classification Challenge | | 7 | | Pulse oximetry, accelerometer | Yes | Not available | | https://www.kaggle.com/c/dreem-sleep-stages/data |
| 4 | Sleep Disorders Research Center (SDRC) | 60 | 14 | 8 hours | 6 EOG and 3 EMG channels | Yes | Downloadable | Power spectral features for each frequency band are provided | https://data.mendeley.com/datasets/3hx58k232n/4 |
| 5 | Massachusetts General Hospital (MGH) | 1983 | 6 | 7.7 hours | EOG, EMG, EKG, respiration signals, and oxygen saturation (SaO2) | Yes | Downloadable | Most subjects have sleep disorders, and most files are in .mat format | https://physionet.org/content/challenge-2018/1.0.0/ |
| 6 | Dreem-automated-sleep-staging | 9 | 5 | ~7 hours | 3 accelerometers | Yes | Available in .npy, .json format | Data is for 9 nights (6 in training and 3 in test set) | https://www.kaggle.com/competitions/dreem-automated-sleep-staging/data |
| 7 | OSF Nap EEG | 20 | 62 | Each session (task + nap) lasted 2 hours; the nap was 30 min or 60 min | 2 EOG channels | Yes (sleep stages as well as spindles at 30 s intervals) - single rater | Data available in .eeg, .vmrk and .vhdr format | Data obtained during naps taken by healthy adult participants after performing a visual working memory task. Each participant took part in two recording sessions, during each of which they completed a high- or low-load scene working memory task followed by a 30- or 60-minute nap on a bed inside a sound-attenuated recording chamber. | https://osf.io/sqg4m https://osf.io/chav7/ https://osf.io/ebvsr |


gsaurabhr commented 3 months ago

https://osf.io/chav7/wiki/home/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998176/

I think this will be good. This dataset contains 64-channel recordings. It has manual annotations by experts for sleep stages (and spindles, which we do not need). The recordings also seem to cover a good number of sleep stages within the recording window.

One possibility is that Tanvi works on this dataset with 3D CNNs (or other approaches that explicitly take spatio-temporal patterns into account). Ayush can use sleep-EDF and/or other datasets for the transformer-based approach.

For this data, here are the next steps:

  1. Complete the visualization as mentioned in the visualization issue.
  2. Additional plot: for each sleep stage, how many subjects go through that stage.
  3. Try to visualize all hypnograms (sleep stage vs time) at a time (think about good ways to visualize that information).

What I want is to get a sense of how many transitions we can see across different pairs of sleep stages, so that we have some idea about the coverage of the dataset.

I think this dataset is sufficient, and there are not many other HD EEG datasets available. But if you want, you can spend some time trying to find alternate datasets also.
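
The two coverage questions (how many subjects reach each stage, and how many transitions occur between each pair of stages) could be tallied along the following lines. This is only a minimal sketch: it assumes each hypnogram has already been loaded as a list of per-epoch stage labels, and the label names, the function names `count_transitions` and `dataset_coverage`, and the toy data are all hypothetical, not taken from the dataset.

```python
from collections import Counter
from itertools import groupby

# Assumed label set; the real labels depend on the dataset's annotation files.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def count_transitions(hypnogram):
    """Count transitions between consecutive, different stages in one hypnogram."""
    # Collapse runs of identical labels so that only stage changes remain.
    collapsed = [stage for stage, _ in groupby(hypnogram)]
    return Counter(zip(collapsed, collapsed[1:]))


def dataset_coverage(hypnograms):
    """Aggregate per-stage subject counts and pairwise transition counts.

    hypnograms: dict mapping subject id -> list of per-epoch stage labels.
    """
    subjects_per_stage = Counter()
    transitions = Counter()
    for labels in hypnograms.values():
        # set() ensures each subject is counted at most once per stage.
        subjects_per_stage.update(set(labels))
        transitions.update(count_transitions(labels))
    return subjects_per_stage, transitions


if __name__ == "__main__":
    # Made-up toy hypnograms for illustration only (not real data).
    toy = {
        "s01": ["W", "W", "N1", "N2", "N2", "N3", "N2", "REM"],
        "s02": ["W", "N1", "N2", "N1", "W"],
    }
    per_stage, trans = dataset_coverage(toy)
    for stage in STAGES:
        print(stage, per_stage[stage])
    for (src, dst), n in sorted(trans.items()):
        print(f"{src} -> {dst}: {n}")
```

The transition counter can be laid out as a stage-by-stage matrix and plotted as a heatmap, which would show at a glance which stage pairs are rare or missing in the dataset.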