Closed kohanlee1995 closed 10 months ago
We weren't planning to make these changes, but we certainly welcome any contributions to the repo!
Hi @kohanlee1995,
You are right, there is some functionality in ChromatinProfileDataset which is only available when using the dataset directly rather than through the dataloader.
However, if you use the ChromatinProfileDataset class directly just once with the setting save_liftover=True, the updated version of the DeepSEA dataset files with hg38 coordinates will be created.
It will create the files {train,val,test}_hg38_coords_targets.csv, so the line
coords_target_path = f'{self.data_path}/{split}_{self.ref_genome_version}_coords_targets.csv'
in the dataloader will then work with ref_genome_version = 'hg38'.
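For anyone following the same route, here is a minimal sketch of a sanity check to run after the one-off liftover pass and before switching the dataloader to hg38. It only reuses the file naming pattern from the line quoted above. The helper names (coords_target_path as a function, missing_coords_files) and the SPLITS tuple are hypothetical and not part of the repo.

```python
from pathlib import Path

# Split names taken from the {train,val,test} pattern mentioned above.
SPLITS = ("train", "val", "test")


def coords_target_path(data_path, split, ref_genome_version):
    """Build the per-split CSV path using the dataloader's naming pattern.

    Hypothetical helper: it mirrors the f-string quoted in this thread,
    it is not a function exported by the repo.
    """
    return Path(data_path) / f"{split}_{ref_genome_version}_coords_targets.csv"


def missing_coords_files(data_path, ref_genome_version="hg38"):
    """Return the expected coordinate files that do not exist yet."""
    missing = []
    for split in SPLITS:
        path = coords_target_path(data_path, split, ref_genome_version)
        if not path.is_file():
            missing.append(path)
    return missing
```

If missing_coords_files(your_data_path) returns an empty list, all three hg38 files are in place and the dataloader line should resolve. A non-empty list means the liftover pass with save_liftover=True still needs to be run for those splits.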
Thank you @exnx and @cbirchsy for your response. That's what I did and it worked. Just wanted to make sure I am on the right path.
I am attempting to replicate the validation process for the DeepSEA benchmark. The original DeepSEA version is hg19, while the reference genome is hg38. I've noticed that liftover is available in the source code, specifically within the ChromatinProfileDataset class. However, this liftover functionality seems to be unavailable unless I use ChromatinProfileDataset directly.
Within the class ChromatinProfile, the arguments for ChromatinProfileDataset are:
This code forces the genome version of the reference and dataset to be the same.
My question is whether it's possible to introduce flexibility into the package or provide an updated version of the DeepSEA benchmark that supports the hg38 reference genome?