Hi!
Thank you for providing the code to generate a TFRecord dataset for Basenji. I would like to make TFRecords from an epigenetic track for the same train/test/validation sequences that were used in the Basenji/Enformer model (I retrieved them from gs://basenji_barnyard/data/human/sequences.bed).
As far as I understand, the script in preprocess.py randomly divides the genome (or a subset, if specified with the -s option) into train, test and validation sets. Would it be possible to use the preset sets from the sequences.bed file above instead, or did I overlook something and this is already possible?
Thank you in advance!
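To make the request concrete, here is a minimal sketch of the grouping I have in mind. It assumes that the fourth column of sequences.bed holds the split label (for example train, valid or test). The function name read_split_bed and the example coordinates are hypothetical and not part of the Basenji code.

```python
from collections import defaultdict


def read_split_bed(lines):
    """Group BED intervals by the split label assumed to be in column 4."""
    splits = defaultdict(list)
    for line in lines:
        # Skip comment lines and lines without a label column.
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        chrom, start, end, label = fields[:4]
        splits[label].append((chrom, int(start), int(end)))
    return dict(splits)


# Hypothetical example rows in the assumed four-column layout.
example = [
    "chr1\t0\t131072\ttrain",
    "chr1\t131072\t262144\tvalid",
    "chr2\t0\t131072\ttest",
    "chr2\t131072\t262144\ttrain",
]
splits = read_split_bed(example)
for label, intervals in splits.items():
    print(label, len(intervals))
```

With a real file, the same function could be called as read_split_bed(open("sequences.bed")), and each group of intervals would then be written out as its own set of TFRecords instead of being drawn at random.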