ML-Bioinfo-CEITEC / genomic_benchmarks

Benchmarks for classification of genomic sequences
Apache License 2.0

Datasets not found #39

Open icyjayd opened 3 months ago


I have installed this package, but I can't load the datasets.

My code is as follows:

from genomic_benchmarks.data_check import list_datasets
from genomic_benchmarks.dataset_getters.pytorch_datasets import get_dataset
from genomic_benchmarks.data_check import info
from genomic_benchmarks.loc2seq import download_dataset

When trying to download, for example, 'demo_coding_vs_intergenomic_seqs', I get FileNotFoundError: Dataset demo_coding_vs_intergenomic_seqs not found.

For completeness, I wrote code that attempts to download each of the datasets.

for dset in list_datasets():
    try:
        get_dataset(dset, split='train')
        print("success!")
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print(dset, "not found")

The output is as follows:

demo_coding_vs_intergenomic_seqs not found
human_enhancers_cohn not found
human_ocr_ensembl not found
demo_human_or_worm not found
human_ensembl_regulatory not found
drosophila_enhancers_stark not found
dummy_mouse_enhancers_ensembl not found
human_enhancers_ensembl not found
human_nontata_promoters not found

The same error occurs with the info and download_dataset functions. Any help on what I'm doing wrong would be appreciated.
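The loop above prints only "not found", which hides the exception type and message. A minimal sketch of a variant that records them is shown here. The helper names `try_load_all` and `report` are hypothetical and not part of genomic_benchmarks. The loader is passed in as a callable, so the sketch makes no assumption about the library beyond the calls already used in this post.

```python
def try_load_all(names, loader):
    """Call loader(name) for every name.

    Returns a dict mapping each name to None on success, or to a
    'ExceptionType: message' string when the loader raised.
    """
    results = {}
    for name in names:
        try:
            loader(name)
            results[name] = None
        except Exception as exc:
            # Keep the exception type and message instead of discarding them.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def report(results):
    """Print one line per dataset with either 'success' or the recorded error."""
    for name, error in results.items():
        print(name, "->", "success" if error is None else error)


# Intended use with the imports from this post (left commented out so the
# sketch runs without the package installed):
# report(try_load_all(list_datasets(),
#                     lambda name: get_dataset(name, split='train')))
```

Having the full message for each dataset should show whether every failure is the same FileNotFoundError or whether some datasets fail for a different reason.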