Closed: cbrnr closed this issue 1 year ago
You should always have a test set to evaluate a model. That said, maybe you don't need that many records. In ML, models are often pretrained on a fixed train set, e.g. on ImageNet or CIFAR, where the train sets are pre-specified.
Thanks @agramfort! Is it correct that the larger the train set, the better, especially for neural networks? If so, wouldn't it be best to create a final classifier from all the data and then apply it to completely unseen (new) data? I understand that it is important to evaluate the model, but does it not make sense to do both (i.e. first evaluate with a train/test split, but then give people the classifier trained on all the data)?
With deep learning you generally need a validation set for early stopping the training, so refitting on the full data without a held-out set is considered dangerous.
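To illustrate why a held-out validation set is needed here: early stopping monitors validation loss during training and stops once it no longer improves for a number of epochs ("patience"). Without held-out data there is nothing to monitor. Below is a minimal, self-contained sketch of that loop; the one-parameter gradient-descent "model" and the data are purely illustrative stand-ins, not SleepECG code.

```python
import random

def train_with_early_stopping(train, val, max_epochs=200, patience=5, lr=0.1):
    # Toy model: a single weight w fit to 1-D data by gradient descent on
    # mean squared error. It stands in for any iteratively trained model.
    w = 0.0
    best_w, best_val_loss, bad_epochs = w, float("inf"), 0

    def mse(w, data):
        return sum((w * x - y) ** 2 for x, y in data) / len(data)

    for epoch in range(max_epochs):
        # One gradient step on the *training* set only.
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
        # Early stopping decision is based on the *validation* set.
        val_loss = mse(w, val)
        if val_loss < best_val_loss:
            best_w, best_val_loss, bad_epochs = w, val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving
    return best_w, best_val_loss

# Synthetic data: y ≈ 2x plus noise, split into train and validation parts.
random.seed(0)
data = [(i / 10, 2 * i / 10 + random.gauss(0, 0.1)) for i in range(30)]
train, val = data[:20], data[20:]
w, loss = train_with_early_stopping(train, val)
```

The point of the sketch is the control flow: the validation split is consumed by the stopping criterion itself, so folding it back into the training data would remove the signal that tells you when to stop.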
So for future classifiers we might consider using a larger train set (and a smaller test set, e.g. 100–500 records instead of 1000).
SleepECG currently includes three pre-trained classifiers trained on 1971 records (and tested on another 1000 records). However, when applying these classifiers to unseen (new) data, would it not make sense to train on all available records (i.e. all 2971)? Of course we would not have a performance measure then, but we would have more information available for training.
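One common compromise is to do both steps in sequence: evaluate on a held-out split first, then refit on everything and report the held-out score as an (approximate) performance estimate for the distributed model. A toy sketch of that workflow, using a hypothetical majority-class "classifier" and fake records rather than actual SleepECG data:

```python
import random

def fit(records):
    # Hypothetical stand-in classifier: just remembers the majority label.
    labels = [y for _, y in records]
    return max(set(labels), key=labels.count)

def accuracy(model, records):
    # The "model" is a single predicted label; score it against true labels.
    return sum(model == y for _, y in records) / len(records)

# Fake dataset of 2971 (id, stage) records, mirroring the counts above.
random.seed(1)
records = [(i, random.choice(["wake", "sleep", "sleep"])) for i in range(2971)]
random.shuffle(records)
train, test = records[:1971], records[1971:]

# Step 1: fit on the training split, measure performance on held-out data.
eval_model = fit(train)
test_acc = accuracy(eval_model, test)

# Step 2: refit on *all* records for distribution; test_acc from step 1 is
# reported as an approximate performance figure for this final model.
final_model = fit(records)
```

Note the caveat from the comment above: this refit-on-everything step is only safe for models whose training procedure does not itself depend on a held-out set (unlike deep networks with early stopping).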
What is the preferred way (best practice) to distribute pre-trained classifiers?