Being able to use multiple chromosomes to include in the dataset will help us get closer to training a production model trained on all available data.
This could be done by creating a dictionary of {chromosome names: FASTA paths} in RepeatSequenceDataset, processing all of them, and merging their repeats.
The list of chromosomes to include would be a hyperparameter in this case.
Being able to use multiple chromosomes to include in the dataset will help us get closer to training a production model trained on all available data.
This could be done by creating a dictionary of {chromosome names: FASTA paths} in RepeatSequenceDataset, processing all of them, and merging their repeats.
The list of chromosomes to include would be a hyperparameter in this case.