Create Epoch_data_review

Tanvig commented 5 months ago

@gsaurabhr I’ve tried to create the epoch dataset from a single subject’s annotation file.

https://github.com/csndl-iitd/realtime-sleep-staging/tree/f11f1034802c5d1d593999e090aff084c3a06171/data/3D%20CNN%20Approach

This epoch dataset comprises 60 text files, each containing the sleep stage for one 30-second epoch. The epoch duration is based on the difference between the mark-on and mark-off markers in the annotation file (the difference was approximately 30 seconds).

I’ve also added a CSV file so that you can review the data from which the epoch files were created.

Please review this data and kindly let me know if this is what we discussed. In that case, I will create epoch files for all the subjects.

gsaurabhr commented 5 months ago

The files in the folder should be epoched EEG data. The CSVs look good. Split the EEG array (a 2D array of 64 x #total samples) into data for each epoch (64 x #samples per epoch) and store each epoch as a NumPy file (.npy). The naming convention is fine.

Also in some places the name includes "suj" instead of "sub". Fix that.
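For the array split described above, here is a minimal sketch. It assumes equal-length epochs and drops any trailing partial epoch; the function names and the `_epoch{i}` file-name pattern are hypothetical, not something already in the repository.

```python
from pathlib import Path

import numpy as np


def split_into_epochs(eeg, samples_per_epoch):
    """Split a (channels x samples) array into equal-length epochs.

    Returns an array of shape (n_epochs, channels, samples_per_epoch).
    Samples left over after the last full epoch are dropped.
    """
    n_channels, n_samples = eeg.shape
    n_epochs = n_samples // samples_per_epoch
    trimmed = eeg[:, : n_epochs * samples_per_epoch]
    # (channels, epochs, samples) -> (epochs, channels, samples)
    return trimmed.reshape(n_channels, n_epochs, samples_per_epoch).transpose(1, 0, 2)


def save_epochs(epochs, out_dir, prefix):
    """Save each epoch as its own .npy file and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, epoch in enumerate(epochs):
        path = out_dir / f"{prefix}_epoch{i}.npy"  # hypothetical naming pattern
        np.save(path, epoch)
        paths.append(path)
    return paths
```

With 64 channels and 3000 samples per epoch, each saved file would hold a 64 x 3000 array.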

Tanvig commented 5 months ago

Update:

I have tried to fix the issue raised above. Link: https://github.com/csndl-iitd/realtime-sleep-staging/tree/ea54542218ba8cf3aa979585813867b19457a3dc/data/3D%20CNN%20Approach/subject29_l5nap_day1_epochs

Process followed:

Performed filtering, resampling, and average re-referencing on the raw EEG data.
Extracted the manual markers (Markon and Markoff) from the annotations.txt file. The difference between each Markon and Markoff marker is approximately 29–30 seconds.
Based on these markers, segmented the preprocessed raw data into epochs. There are now 58 NumPy files in the folder, each holding the epoched raw data (~30 s) for that particular subject. The screenshot below shows information about one such file. The shape is 64 x 2943, where 64 is the number of electrodes and 2943 is the number of samples in the epoch (30 s x sampling frequency of 100 Hz = ~3000 sample points).
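The marker-based segmentation in the last step can be sketched roughly as follows. This assumes the markers have already been parsed into `(label, time_in_seconds)` tuples; that structure, the function names, and the example times are hypothetical, since the actual layout of the annotations file is not shown here.

```python
def pair_markers(markers):
    """Pair each "Markon" with the next "Markoff".

    `markers` is a time-ordered list of (label, time_in_seconds) tuples.
    Returns a list of (onset_s, offset_s) tuples; unmatched markers are skipped.
    """
    pairs = []
    onset = None
    for label, time_s in markers:
        if label == "Markon":
            onset = time_s
        elif label == "Markoff" and onset is not None:
            pairs.append((onset, time_s))
            onset = None
    return pairs


def to_sample_ranges(pairs, sfreq):
    """Convert (onset_s, offset_s) pairs to (start, stop) sample indices."""
    return [(int(round(on * sfreq)), int(round(off * sfreq))) for on, off in pairs]


# Made-up marker times, for illustration only.
markers = [("Markon", 10.0), ("Markoff", 39.43), ("Markon", 40.0), ("Markoff", 70.0)]
ranges = to_sample_ranges(pair_markers(markers), sfreq=100)
# Each epoch would then be eeg[:, start:stop] for a (64 x #total samples) array.
```

Because the gap between Markon and Markoff varies slightly, epochs cut this way will not all have the same number of samples, which is consistent with the 2943 (rather than 3000) seen in the file above.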

Queries

Kindly review these files and let me know if this is what you suggested and anything else I should take care of.
What should we do about the sleep stages (W, 1, 2, SWS)? Do I need to extract them somewhere?

Once these queries are resolved, I will replicate this process for all the subjects.

gsaurabhr commented 5 months ago

Looks good. Make the file names consistent. It will simplify your code later.

The txt and csv files are named "sujxx_day1..." while the corresponding folder is named "subjectxx_l5nap_day1".

The sleep stages are labeled in the csv file, so you can read them from there whenever required. If you want, you can include the stage in the file names as a last substring: subj29_l5nap_day1_epoch0_w.npy, for example. That way, when you later want to read all the wake-state files, you simply have to say glob("*_w.npy").
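A small sketch of that lookup, assuming the stage is the last underscore-separated part of the file name; the helper name `files_for_stage` and the folder name in the usage line are hypothetical.

```python
from pathlib import Path


def files_for_stage(epoch_dir, stage):
    """Return all epoch files in `epoch_dir` whose name ends in `_<stage>.npy`."""
    # Note: sorting is lexicographic, so "epoch10" sorts before "epoch2".
    return sorted(Path(epoch_dir).glob(f"*_{stage}.npy"))
```

For example, `files_for_stage("subject29_l5nap_day1_epochs", "w")` would list the wake epochs in that folder.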

Tanvig commented 5 months ago

Updated the epochs data accordingly.

Link to the data: https://github.com/csndl-iitd/realtime-sleep-staging/tree/b59be0f2486ca97e6055ea7ecafd4b863f9281eb/data/3D%20CNN%20Approach

Note:

Haven't changed the name of the annotations file "suj29_day1_annotations.txt" since this is the name of the file in the actual data folder when we download from the OSF repository.
Have changed the names of the .csv file and the epoch files so that they are consistent. Their base names (without the epoch suffix) match the names of the files we download from the OSF repository.
Have also updated the epoch file names to contain the corresponding sleep stage for easy future access.
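For reference, the stage-suffixed names could be generated from the CSV along these lines. This is only a sketch: the column names `epoch` and `stage`, the prefix, and the sample rows are assumptions for illustration, not the actual CSV layout.

```python
import csv
import io


def stage_suffixed_names(csv_text, prefix):
    """Build one file name per CSV row, ending in the lowercased sleep stage."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"{prefix}_epoch{row['epoch']}_{row['stage'].strip().lower()}.npy"
        for row in reader
    ]


# Hypothetical CSV content with assumed column names.
example_csv = "epoch,stage\n0,W\n1,2\n2,SWS\n"
names = stage_suffixed_names(example_csv, "subj29_l5nap_day1")
```

Lowercasing the stage keeps the suffix consistent with the `*_w.npy` glob pattern suggested earlier.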
