What does the cross-validation do when there are no repeats?

ryskina commented 9 months ago

Every condition in my experiment occurs only once, so I get the expected warnings at the start of the run:

/home/ryskina/miniconda3/envs/glmsingle-updated/lib/python3.9/site-packages/glmsingle/glmsingle.py:663: UserWarning: None of your conditions occur in more than one run. Are you sure this is what you intend?
  warnings.warn(msg)
/home/ryskina/miniconda3/envs/glmsingle-updated/lib/python3.9/site-packages/glmsingle/glmsingle.py:668: UserWarning: Since there are no repeats, standard cross-validation usage of <wantglmdenoise> cannot be performed.
  warnings.warn(msg)
/home/ryskina/miniconda3/envs/glmsingle-updated/lib/python3.9/site-packages/glmsingle/glmsingle.py:673: UserWarning: Since there are no repeats, standard cross-validation usage of <wantfracridge> cannot be performed.
  warnings.warn(msg)

However, if I ask for all four model types (A-D), the code still seems to be running some kind of cross-validation:

*** DETERMINING GLMDENOISE REGRESSORS ***

*** CROSS-VALIDATING DIFFERENT NUMBERS OF REGRESSORS ***

chunks:   0%|          | 0/19 [00:00<?, ?it/s]
chunks:   5%|▌         | 1/19 [05:59<1:47:50, 359.47s/it]

Could you please clarify what is being cross-validated here, given that standard cross-validation usage of <wantglmdenoise> cannot be performed? Should I stop at Type B in this case and not run C/D at all?

Thanks!

kendrickkay commented 6 months ago

Hi, sorry for the delay.

The behavior you saw was correct. The code was running a meaningless cross-validation loop: it finds that there is nothing to cross-validate, so the cross-validation scores are all left at "0". As a result, it simply chooses the "first" hyperparameter. For GLMdenoise, that means using 0 PCs. For ridge regression, that means using a fraction of 1 (i.e., no regularization; the full OLS solution).
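As an illustration of the tie-breaking described above, here is a minimal sketch. It is not the GLMsingle source: the candidate lists, the `pick_best` helper, and the argmax-style selection rule are all assumptions made for this example. It only shows why all-zero scores lead to the first candidate being chosen.

```python
import numpy as np

# Hypothetical candidate hyperparameters (not taken from GLMsingle),
# ordered so that the "first" option means no denoising / no regularization.
pc_candidates = np.arange(0, 11)             # 0, 1, ..., 10 PCs
frac_candidates = np.linspace(1.0, 0.05, 20)  # ridge fractions, starting at 1


def pick_best(scores, candidates):
    """Return the candidate with the highest score.

    np.argmax returns the first index among ties, so if every score is
    identical (e.g. all zeros) the first candidate is selected.
    """
    return candidates[int(np.argmax(scores))]


# With no repeats there is nothing to cross-validate, so (in this sketch)
# every score is simply left at 0.
pc_scores = np.zeros(len(pc_candidates))
frac_scores = np.zeros(len(frac_candidates))

best_pcs = pick_best(pc_scores, pc_candidates)
best_frac = pick_best(frac_scores, frac_candidates)

print(best_pcs, best_frac)
```

Under these assumptions the selection lands on 0 PCs and a fraction of 1, which matches the outcome described in the comment above.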

However, to reduce confusion going forward, we have now updated the repository (in commit 87488d7) with a change that automatically sets wantglmdenoise and wantfracridge to 0 when it detects no repeats (as in the scenario you described in your initial message).
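For readers who want a picture of what such a check could look like, here is a rough sketch. It is not the code from the commit: the helper names `conditions_repeat_across_runs` and `adjust_opts` are hypothetical, and "repeat" is assumed to mean "a condition has an onset in more than one run", following the wording of the first warning in the original post. The design is assumed to be a list of (timepoints x conditions) onset matrices, one per run.

```python
import numpy as np


def conditions_repeat_across_runs(design):
    """Return True if any condition has onsets in more than one run.

    design: list of (n_timepoints, n_conditions) arrays, one per run,
    with nonzero entries marking condition onsets (assumed layout).
    """
    # For each run, flag which conditions appear at least once.
    present = np.stack([np.asarray(d).sum(axis=0) > 0 for d in design])
    runs_per_condition = present.sum(axis=0)
    return bool(np.any(runs_per_condition > 1))


def adjust_opts(design, opt):
    """Return a copy of opt with both flags set to 0 when nothing repeats."""
    opt = dict(opt)  # do not modify the caller's dictionary
    if not conditions_repeat_across_runs(design):
        opt["wantglmdenoise"] = 0
        opt["wantfracridge"] = 0
    return opt


# Toy example: two runs, four conditions, each presented exactly once.
run1 = np.zeros((10, 4))
run1[1, 0] = 1
run1[5, 1] = 1
run2 = np.zeros((10, 4))
run2[2, 2] = 1
run2[6, 3] = 1

opts = adjust_opts([run1, run2], {"wantglmdenoise": 1, "wantfracridge": 1})
print(opts)
```

In this toy design no condition appears in more than one run, so both flags end up at 0; if a run were duplicated, the flags would be left unchanged.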

Does this make sense?

Kendrick

ryskina commented 6 months ago

Thank you very much @kendrickkay! I was somewhat confused by the cross-validation taking a long time (with only one hyperparameter option, I assumed the loop would terminate as soon as it had gone through this "first" option). In any case, the new commit resolves my issue, and I will use the updated code from now on!

Thanks, Maria

cvnlab / GLMsingle

What does the cross-validation do when there are no repeats? #129