Stanford-STAGES / stanford-stages

Automated sleep staging scoring and narcolepsy identification

Tips to improve performance #22

Closed skjerns closed 3 years ago

skjerns commented 4 years ago

Excellent work, and thanks so much for providing the model weights. Most researchers don't supply the weights, which sometimes makes it impossible to recreate the results. So thanks again for doing so!

We're currently evaluating the model with a dataset we are working on of 100 NC1 patients and 130 controls. However, we're not able to reproduce the results reported in the paper.

Inter-rater reliability of hypnograms is okay-ish. However, I know myself how hard automatic sleep scoring is (mostly due to the broken scoring standard), so I think the results are overall good enough, and probably the best that are readily available on the open-source market.

[Image: rater]

However, the NC1 detection mostly fails for us.

[Image: roc scores]

We're using the following input configuration

{'central3': 'EEG C3',
 'central4': 'EEG C4',
 'occipital': 'EEG Oz',
 'eog_l': 'EOG LOC-A2',
 'eog_r': 'EOG ROC-A1',
 'chin_emg': 'EMG Chin'}

Do you have any tips on how to improve the performance? Should we add more channels (even if not entirely fitting, as Pz for O1?)?

informaton commented 4 years ago

Thanks for the kind words! We are working on newer models for our narcolepsy classifier, but you are right, these results indicate a failure with the classification somewhere in the process.

Here are some things that may be contributing to the poor results:

  1. The configuration you have here shows EEG Oz. Do you not have an O1 or O2? We have not had success trying alternative channels (i.e. I would not expect improvement using Pz for O1).
  2. The EOG channels show that they are referenced with A1/A2 but the EEG channels are not explicitly labeled like this. Have the EEG channels been referenced to A1/A2 as well?
  3. Do your studies begin before the lights have been turned off, i.e. do they include bio calibrations? We have found that the first few minutes of the study are often scored incorrectly in that case, because the models try to fit the calibration sections, which they were not trained on, into one of the valid sleep categories. It is relatively simple to remove these sections from the hypnogram after the fact; however, this information (i.e. the hypnodensity) is sent to the narcolepsy classifier before any such adjustment is made. The 'beta' branch has some options for establishing lights on/off in the configuration file (.json); however, this branch uses a different set of narcolepsy models, which are still being worked on.

skjerns commented 4 years ago

Thanks for the quick response.

1) Unfortunately we do not have O1/O2 in all cases; often only O1 or O2 or Oz, or even only Pz.
2) This is unfortunately not clear to me either: our data is from different sources, so the labels are a bit mixed. What does the network expect? I would think referenced channels should be the input, right?
3) Ah, that is indeed the case, we have quite a bit of Wake before lights off. However, my intuition was that the hypnodensity deals with this quite well, as the accuracy score is above 75-80% (as compared to the F1 of 0.58).
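On point 2: if referenced derivations are what is needed, a channel that was stored unreferenced can be re-referenced offline by subtracting the reference electrode's signal sample by sample. The following is only a minimal sketch of that subtraction, not part of stanford-stages. It assumes both signals were recorded against the same recording reference, share a sampling rate, and are already loaded as equal-length sequences; the function name `rereference` is made up for illustration.

```python
from typing import List, Sequence


def rereference(signal: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Return signal minus reference, sample by sample.

    Assumption: both inputs were recorded against the same recording
    reference and have the same sampling rate and length.
    """
    if len(signal) != len(reference):
        raise ValueError("signal and reference must have the same length")
    return [s - r for s, r in zip(signal, reference)]
```

For example, `rereference(c3, a2)` would give a C3-A2 derivation, matching the contralateral pattern of the `EOG LOC-A2` / `EOG ROC-A1` labels in the configuration above.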

Btw: If you are refactoring the model, it might be very useful to implement it in such a way that the models can be loaded once and reused. I was able to adapt the NarcoApp so that it loads the NC classification models and keeps them in memory for subsequent classifications; however, this was not possible for the hypnodensity model (as the variable names are the same and can't be reassigned that easily in TensorFlow).
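The load-once-and-reuse idea can be sketched generically with a cache keyed on the model path. This is an illustration of the pattern only: `load_model_from_disk` is a hypothetical stand-in for whatever expensive loading step is involved, not a stanford-stages function, and the sketch does not address the TensorFlow variable-name clash mentioned above.

```python
from functools import lru_cache
from typing import Any, Dict, List

# Records every real load so the caching effect can be observed.
load_log: List[str] = []


def load_model_from_disk(path: str) -> Dict[str, Any]:
    """Hypothetical stand-in for an expensive model load."""
    load_log.append(path)
    return {"path": path}


@lru_cache(maxsize=None)
def get_model(path: str) -> Dict[str, Any]:
    """Load a model on first request and return the cached object afterwards."""
    return load_model_from_disk(path)
```

Calling `get_model` repeatedly with the same path across many recordings triggers only one real load; `get_model.cache_clear()` releases the cached models again.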

informaton commented 4 years ago

You are welcome.

  1. That is too bad; we have collaborators who have run into similar challenges, and I have not yet heard of anyone having great success using other, proximal channels.
  2. The network expects referenced channels. We run into the same issue as well, where the montage labels are not always explicit.
  3. The hypnodensity may still be accurate overall, but the beginning of a sleep study has some very strong features that are specific to narcolepsy (e.g. the presence of a SOREM and where sleep onset occurs). We were finding the first minute or two to be scored as REM or N2/N3 sleep in many cases where the lights off/on were not accounted for, which in turn led to a narcolepsy diagnosis. Removing these epochs from the narcolepsy classification portion made a substantial difference in these cases.
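Point 3 amounts to dropping the epochs outside the lights-off/lights-on window before the hypnodensity reaches the narcolepsy classifier. A minimal sketch of that cropping step follows. It assumes the hypnodensity is available as one row of stage probabilities per epoch and that the lights-off/on times are known in seconds from the start of the recording. The function name, its parameters, and the 30-second default epoch length are illustrative assumptions, not the stanford-stages API.

```python
import math
from typing import List, Optional, Sequence


def crop_to_lights(
    hypnodensity: Sequence[Sequence[float]],
    lights_off_sec: float,
    lights_on_sec: Optional[float] = None,
    epoch_sec: float = 30.0,  # assumed epoch length; set to match your data
) -> List[Sequence[float]]:
    """Keep only epochs lying fully between lights off and lights on."""
    n_epochs = len(hypnodensity)
    # Round up so an epoch that starts before lights off is dropped.
    first = math.ceil(lights_off_sec / epoch_sec)
    if lights_on_sec is None:
        last = n_epochs
    else:
        # Round down so an epoch that ends after lights on is dropped.
        last = min(n_epochs, math.floor(lights_on_sec / epoch_sec))
    return list(hypnodensity[first:last])
```

With 30-second epochs and lights off at 45 s, the first two epochs are discarded, since the second one still overlaps the period before lights off.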

Thanks for the tip regarding keeping the models in memory. We are working to make the software more configurable for those who want to examine more than one study at a time, and will keep your thought here in mind.

skjerns commented 4 years ago

3) I'll create new EDFs with the beginning cropped to lights off and run the analysis again - I'll let you know what comes out :) (this will take a while: last time all calculations took ~4-5 days, mainly because of the feature calculation, and on the test machine I unfortunately didn't have enough disk space to save the results)

neergaard commented 4 years ago

@skjerns Just to add a comment re. O1/O2: You do not explicitly need to have both, just one of them. The same applies for the central, right @informaton?

informaton commented 4 years ago

That is correct. Either O1 or O2 on its own is fine for the occipital lead, and either C3 or C4 for the central channel (like you said). Only one central and one occipital channel are fed into the model; the best-quality occipital and central channels are selected whenever there are two to choose from. I don't know that we have fed Oz in, though, or how sensitive the results are to using that or something like Pz in place of O1 or O2.
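For datasets with mixed montages, the one-central-plus-one-occipital rule can be checked per recording before running the pipeline. The sketch below simply picks the first available label from a preference list. It does not reproduce the quality-based selection described above, and the label strings and function names are assumptions modelled on the "EEG C3" style of the configuration posted earlier.

```python
from typing import Dict, Iterable, Optional, Sequence

# Assumed label spellings; adjust to the labels used in your EDF files.
CENTRAL_CANDIDATES = ("EEG C3", "EEG C4")
OCCIPITAL_CANDIDATES = ("EEG O1", "EEG O2")


def first_available(candidates: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the first candidate label present in the recording, else None."""
    present = set(available)
    for label in candidates:
        if label in present:
            return label
    return None


def pick_eeg_channels(available: Iterable[str]) -> Dict[str, Optional[str]]:
    """Pick one central and one occipital label from a recording's channels."""
    labels = list(available)
    return {
        "central": first_available(CENTRAL_CANDIDATES, labels),
        "occipital": first_available(OCCIPITAL_CANDIDATES, labels),
    }
```

A recording that only has Oz or Pz yields `None` for the occipital entry, which makes it easy to flag the studies whose results rest on an untested substitute channel.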
