cbrnr / sleepecg

Sleep stage detection using ECG
BSD 3-Clause "New" or "Revised" License

Included pre-trained classifiers #130

Closed cbrnr closed 1 year ago

cbrnr commented 1 year ago

SleepECG currently includes three pre-trained classifiers trained on 1971 records (and tested on another 1000 records). However, when applying such a classifier to unseen (new) data, would it not make sense to have it trained on all available records (i.e. 2971)? Of course we would not have a performance measure then, but we would have more information available for training.

What is the preferred way (best practice) to distribute pre-trained classifiers?

agramfort commented 1 year ago

You should always have a test set to evaluate a model. That said, maybe you don't need that many test records. In ML, models are often pre-trained on the train set, e.g. on ImageNet and CIFAR, and those train sets are pre-specified.
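
For illustration, a pre-specified split could simply be a record-level partition fixed once with a seed and stored alongside the package, similar in spirit to the fixed ImageNet/CIFAR splits (the record IDs and counts below are made up):

```python
# Sketch only: fix a record-level train/test split once and reuse it.
import random

record_ids = [f"record-{i:04d}" for i in range(2971)]  # hypothetical IDs for all records

rng = random.Random(42)        # fixed seed makes the split reproducible
rng.shuffle(record_ids)

test_ids = record_ids[:1000]   # held-out records, never used for fitting
train_ids = record_ids[1000:]  # remaining 1971 records used for training

# Persisting both lists (e.g. as text files shipped with the package) makes
# the split "pre-specified" in the same sense as the ImageNet/CIFAR splits.
```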

cbrnr commented 1 year ago

Thanks @agramfort! Is it correct that the larger the train set, the better, especially for neural networks? If so, wouldn't it be best to create a final classifier trained on all the data and then apply it to completely unseen (new) data? I understand that it is important to evaluate the model, but does it not make sense to do both (i.e. first evaluate with a train/test split, but then give people the classifier trained on all the data)?

agramfort commented 1 year ago

With deep learning you generally need a validation set for early stopping during training, so refitting on the full data without any held-out records is considered dangerous.
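
As a minimal sketch of why the validation set matters (toy Keras model and random data, not SleepECG's actual architecture): early stopping monitors the loss on records that are neither trained on nor part of the test set, which is exactly what is lost when refitting on all records.

```python
import numpy as np
import tensorflow as tf

# Hypothetical features/labels, split by record rather than by epoch.
X_train, y_train = np.random.rand(800, 10), np.random.randint(0, 4, 800)
X_val, y_val = np.random.rand(200, 10), np.random.randint(0, 4, 200)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 sleep stages (illustrative)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),  # loss on these records decides when to stop
    epochs=50, callbacks=[early_stop], verbose=0,
)
```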


cbrnr commented 1 year ago

So for future classifiers we might consider using a larger train set (and a smaller test set, e.g. 100–500 records instead of 1000).
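
For example, such a split could be drawn once with a fixed seed; the record IDs and test size below are only illustrative:

```python
from sklearn.model_selection import train_test_split

record_ids = [f"record-{i:04d}" for i in range(2971)]  # all available records

train_ids, test_ids = train_test_split(
    record_ids,
    test_size=300,    # e.g. 300 held-out records instead of 1000
    random_state=42,  # fixed seed keeps the split reproducible
)
print(len(train_ids), len(test_ids))  # 2671 300
```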