calico / borzoi

RNA-seq prediction with deep convolutional neural networks.
Apache License 2.0

Clarification of model fold data splits #11

adamyhe closed this issue 10 months ago

adamyhe commented 11 months ago

For model f0, sequences labeled fold0 form the test set and fold1 form validation. For model f1, sequences labeled fold1 form the test set and fold2 form validation. Etc

Originally posted by @davek44 in https://github.com/calico/borzoi/issues/1#issuecomment-1703920369

Hi, I just wanted to clarify the exact splits that were used for each of the model folds. My reading is that the test/val/train splits for each model are set up as:

f0: test=fold0, val=fold1, train=rest
f1: test=fold1, val=fold2, train=rest
f2: test=fold2, val=fold3, train=rest
f3: test=fold3, val=fold4, train=rest

Thanks!

davek44 commented 10 months ago

Yes, this is correct. Here's the code segment that performs the split: https://github.com/calico/basenji/blob/master/bin/basenji_train_folds.py#L397
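For readers following along, the mapping confirmed above can be sketched in a few lines of Python. This is an illustration only, not the code from basenji_train_folds.py: the function name `fold_split`, its parameters, and the returned dictionary layout are hypothetical. The thread does not state the total number of folds, so 8 below is just an example value. The wraparound for the last model (validation going back to fold0) is an assumption the thread does not confirm.

```python
def fold_split(model_index: int, num_folds: int) -> dict:
    """Return test/val/train fold labels for model f<model_index>.

    Hypothetical helper: it mirrors the rule described in this thread
    (test = fold i, validation = fold i+1, train = the rest).
    """
    if num_folds < 3:
        raise ValueError("need at least 3 folds for disjoint test/val/train sets")
    if not 0 <= model_index < num_folds:
        raise ValueError("model_index must be in [0, num_folds)")

    test_fold = model_index
    # Assumption: the validation fold wraps around to fold0 for the last model.
    val_fold = (model_index + 1) % num_folds
    train_folds = [f for f in range(num_folds) if f not in (test_fold, val_fold)]

    return {
        "test": f"fold{test_fold}",
        "val": f"fold{val_fold}",
        "train": [f"fold{f}" for f in train_folds],
    }


if __name__ == "__main__":
    NUM_FOLDS = 8  # illustrative value only; not stated in this thread
    for i in range(4):
        split = fold_split(i, NUM_FOLDS)
        print(f"f{i}: test={split['test']}, val={split['val']}, "
              f"train={','.join(split['train'])}")
```

Called with model indices 0 to 3, the helper gives the same test and validation folds as the four lines listed in the question, with every remaining fold assigned to training.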

adamyhe commented 10 months ago

Awesome. Thanks!