Open JamesCameron7 opened 4 years ago
Use the "Cross validation by feature" option of Test and Score. To use it, you will need to convert your ID variable from String to Categorical (use the Edit Domain widget). Also, because "Cross validation by feature" already performs CV, you need to remove the Data Sampler and connect all your data to Test and Score.
Thanks Marko, in that case is it not possible to 'test on test data' as you aren't splitting the dataset prior to cross-validation?
"Cross validation by feature" is a leave-a-patient-out type of cross validation, which tests on data that was not used in learning, so it should be fine methodologically. Yes, because you cannot set train/test set proportions, time complexity can suffer, but I do not see other drawbacks.
Does that suffice? If not, I am interested to hear what you are trying to do.
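For readers working outside Orange, the "Cross validation by feature" option behaves much like scikit-learn's `LeaveOneGroupOut`: each fold holds out every spectrum belonging to one patient. A minimal sketch with synthetic data (the patient count, classifier, and feature sizes below are illustrative assumptions, not from the thread):

```python
# Leave-one-patient-out CV, analogous to Orange's "Cross validation
# by feature", sketched with scikit-learn's LeaveOneGroupOut.
# All data here is synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_patients, spectra_per_patient, n_features = 10, 9, 20
X = rng.normal(size=(n_patients * spectra_per_patient, n_features))
y = rng.integers(0, 2, size=n_patients * spectra_per_patient)
# groups: the patient ID, repeated once per spectrum
groups = np.repeat(np.arange(n_patients), spectra_per_patient)

# Each fold tests on all 9 spectra of one held-out patient
scores = cross_val_score(LogisticRegression(max_iter=1000),
                         X, y, groups=groups, cv=LeaveOneGroupOut())
print(len(scores))  # one score per patient: 10
```

The key point is that the group label, not the row, is the unit of splitting, so no patient's spectra ever appear on both sides of a fold.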
thanks for clarifying! I don't generally use leave-one-out cross validation but it makes sense to me now.
What I normally do is split the dataset into training and test sets (70/30) and randomly resample the splits a number of times (say 50 iterations). I then predict on each resampled test set, and the average over the model iterations is reported in terms of sensitivity/specificity/accuracy etc.
Therefore it would be cool if there were an option to stratify by patient ID in the Data Sampler widget so I could compare to my previous analysis, but for now I'll try out the cross validation by feature!
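The repeated 70/30 patient-level resampling described above maps onto scikit-learn's `GroupShuffleSplit`, which keeps all rows with the same group label on one side of each split. A hedged sketch with synthetic data (sizes and classifier are assumptions for illustration):

```python
# Repeated 70/30 splits at the patient level, sketched with
# scikit-learn's GroupShuffleSplit; all spectra of a patient
# stay together in either train or test. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(1)
n_patients, per_patient, n_features = 12, 9, 15
X = rng.normal(size=(n_patients * per_patient, n_features))
y = rng.integers(0, 2, size=len(X))
groups = np.repeat(np.arange(n_patients), per_patient)

# 50 random resamples, 70% of *patients* in training each time
splitter = GroupShuffleSplit(n_splits=50, train_size=0.7, random_state=42)
accuracies = []
for train_idx, test_idx in splitter.split(X, y, groups):
    # sanity check: no patient appears on both sides of the split
    assert not set(groups[train_idx]) & set(groups[test_idx])
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accuracies.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

mean_acc = float(np.mean(accuracies))  # average over the 50 resamples
```

Averaging `accuracies` (or per-split sensitivity/specificity) reproduces the reporting scheme described above, with the patient as the sampling unit.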
So, if I understand you, you would need a Data sampler that is aware of feature groups?
Yes I believe so.
Similar to the 'Cross-validation by feature' option, if you had, say, a "split based on feature" option within the Data Sampler widget, where the feature would be the ID category, then I think this could work.
Hello
I'm trying to split my IR spectral dataset into training and test sets for classification models.
I have nine IR spectra per patient (9 per sample), and when I use the Data Sampler widget to split the dataset, spectra from individual patients are being included in both the training and test sets.
Is there a way to stop this and ensure all 9 spectra from one patient stay together in either the training or the test set? I've attached my workflow in case there is anything that I am missing.
Any advice would be great!
Many thanks, James