Open tomginsberg opened 2 years ago
Dear @tomginsberg, thanks for the issue. We are currently working on it but unfortunately, it is not as simple as updating the permissions. As a temporary workaround, you can use the following code to obtain the data:
from genomic_benchmarks.loc2seq import download_dataset
download_dataset('demo_human_or_worm', version=0, use_cloud_cache=False)
This will download the human and worm genomes and create the dataset on your disk. Afterward, you can use your original code to load the dataset from the disk:
from genomic_benchmarks.dataset_getters.pytorch_datasets import DemoHumanOrWorm
dset = DemoHumanOrWorm(split='train')
It seems to be a known issue with gdown: https://github.com/wkentaro/gdown/issues/43
The Google cache was turned off by default (use_cloud_cache=False) in commit 5b02bb9745efb6f9328da98de19c7729ecdefa9e.
You can use your original code to get the dataset, and it will be created for you from the reference genome.
However, if you want to try downloading it from the Google cache, you can do so by manually setting use_cloud_cache=True:
from genomic_benchmarks.dataset_getters.pytorch_datasets import DemoHumanOrWorm
dset = DemoHumanOrWorm(split='train', use_cloud_cache=True)
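In case it helps anyone hitting this, the two options in this thread (cloud cache vs. building from the reference genome) could be combined into a simple fallback. This is only a sketch, not part of genomic_benchmarks: download_with_fallback is a hypothetical helper, and it assumes that a failed cloud download raises an exception, which I have not verified for every gdown version.

```python
from typing import Any, Callable, Tuple


def download_with_fallback(
    download_fn: Callable[..., Any], name: str, version: int = 0
) -> Tuple[Any, bool]:
    """Hypothetical helper: try the cloud cache first, then build locally.

    download_fn is expected to accept (name, version=..., use_cloud_cache=...),
    as download_dataset does in the workaround earlier in this thread.
    Returns (result of download_fn, whether the cloud cache was used).
    """
    try:
        # First attempt: fetch the prepared dataset from the cloud cache.
        return download_fn(name, version=version, use_cloud_cache=True), True
    except Exception as err:
        # Assumption: a failed cloud download surfaces as an exception.
        print(f"Cloud cache download failed ({err!r}); building locally instead.")
        # Second attempt: create the dataset from the reference genomes.
        return download_fn(name, version=version, use_cloud_cache=False), False


# Intended usage (requires genomic_benchmarks to be installed):
#   from genomic_benchmarks.loc2seq import download_dataset
#   download_with_fallback(download_dataset, 'demo_human_or_worm')
```

Passing the download function in as an argument keeps the helper independent of the library, so it can be tried out with a stub before pointing it at the real download_dataset.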
I have restored use_cloud_cache=True as the default (it is the desirable behavior in 99.9% of cases), so I am reopening the issue. We need to examine it more closely. Unfortunately, the error is hard to reproduce.
A possible solution (if gdown does not figure it out soon) might be going back to googleDriveFileDownloader, if possible (the issue appeared after we switched from googleDriveFileDownloader to gdown). I have not found any documentation of what exactly the issue with googleDriveFileDownloader was.
This should be a simple fix: updating the permissions of human or worm to "Anyone with the link".