The classifier with the best overall AUROC will not use the class distribution of the training data; its prior will equal the distribution of the private test data.
However, we don't know the distribution in the private test data, so we have to guess it.
Whether each subject's proportion of preictal segments is above or below the overall proportion seems to be independent between train and publictest: half the time they agree, half the time they disagree (see below).
I think the best prior is either:
1. Assume every subject has the same preictal/interictal distribution, set to the overall distribution in the training data.
2. Assume the complete dataset (train + publictest + privatetest) has the same distribution in every subject, then guess each subject's privatetest distribution from its train and publictest distributions.
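A minimal sketch of how either option could be turned into numbers. This is not necessarily what the post intends: `adjust_for_prior` is the standard odds-ratio correction for a shifted class prior, swapped in here as one plausible way to apply a guessed prior, and `implied_private_prior` is a hypothetical helper for option 2. All segment counts passed to it would have to be supplied or estimated by the reader; none are given in the post.

```python
def adjust_for_prior(p, train_prior, target_prior):
    """Rescale a predicted preictal probability from train_prior to target_prior.

    Standard odds-ratio prior correction (an assumption, not the post's method).
    """
    if not (0.0 < train_prior < 1.0 and 0.0 < target_prior < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")
    pos = p * target_prior / train_prior
    neg = (1.0 - p) * (1.0 - target_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


def implied_private_prior(overall_prior, n_train, pos_train,
                          n_public, pos_public, n_private):
    """Option 2 (hypothetical helper): the preictal fraction privatetest must
    have if the subject's complete dataset matches overall_prior.

    pos_public is itself a guess, since publictest labels are hidden.
    """
    total = n_train + n_public + n_private
    remaining = overall_prior * total - pos_train - pos_public
    # Clip to a valid proportion in case the guesses are inconsistent.
    return min(1.0, max(0.0, remaining / n_private))
```

Within a single subject the correction is monotone, so it leaves that subject's ranking unchanged; it only matters for the overall AUROC because it shifts subjects relative to one another.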
This comes from the AUROC submissions that are all zeros except for one session (which is all ones).
Subject     Train  PublicTest
Dog_1       0.47   0.49
Dog_2       0.48   0.56
Dog_3       0.44   0.47
Dog_4       0.52   0.48
Dog_5       0.49   0.50
Patient_1   0.55   0.49
Patient_2   0.55   0.52
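The AUROC values above can be read as directions. A submission that is 1 for one subject and 0 everywhere else scores 0.5 + (TPR - FPR) / 2, where TPR is the share of all preictal segments that fall in that subject and FPR is the share of all interictal segments that do. A score above 0.5 therefore means the subject has a higher preictal proportion than the dataset overall. The sketch below checks that identity on made-up counts and tallies how often train and publictest point the same way; the function names are illustrative only.

```python
# AUROC scores copied from the table above: subject -> (train, publictest)
AUROC = {
    "Dog_1": (0.47, 0.49),
    "Dog_2": (0.48, 0.56),
    "Dog_3": (0.44, 0.47),
    "Dog_4": (0.52, 0.48),
    "Dog_5": (0.49, 0.50),
    "Patient_1": (0.55, 0.49),
    "Patient_2": (0.55, 0.52),
}


def one_subject_auroc(pos_in, pos_total, neg_in, neg_total):
    """AUROC of an indicator submission: 1 for one subject, 0 elsewhere."""
    tpr = pos_in / pos_total  # share of all preictal segments in the subject
    fpr = neg_in / neg_total  # share of all interictal segments in the subject
    return 0.5 + 0.5 * (tpr - fpr)


def direction(auroc, chance=0.5):
    """'>' if the subject's preictal proportion is above overall, '<' if below."""
    if auroc > chance:
        return ">"
    if auroc < chance:
        return "<"
    return "="


def compare_to_chance(scores):
    """Map each subject to its (train, publictest) directions."""
    return {s: (direction(t), direction(p)) for s, (t, p) in scores.items()}


def tally(directions):
    """Count subjects where the two sets agree, disagree, or one is exactly 0.5."""
    ties = sum(1 for pair in directions.values() if "=" in pair)
    agree = sum(1 for t, p in directions.values() if t == p and t != "=")
    disagree = len(directions) - agree - ties
    return agree, disagree, ties
```

On the seven subjects above, `tally(compare_to_chance(AUROC))` gives 3 agreements, 3 disagreements, and 1 tie (Dog_5, whose publictest score is exactly 0.50).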
Subject preictal proportion compared to overall proportion:
Subject     Train  PublicTest
Dog_1       <      <
Dog_2       <      >
Dog_3       <      <
Dog_4       >      <
Dog_5       <      =
Patient_1   >      <
Patient_2   >      >