A dataset not found when I run "python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]"

horacehht commented 11 months ago

It seems that the file at "https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar" really doesn't exist. When I entered this URL in my browser, it also told me that the file doesn't exist.

14:43:55   Downloading https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar to /home/horace/scratch/protein-datasets/alphafold/UP000006548_3702_ARATH_v2.tar
Traceback (most recent call last):
  File "script/pretrain.py", line 50, in <module>
    dataset = core.Configurable.load_config_dict(cfg.dataset)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 269, in load_config_dict
    return cls(**new_config)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 288, in wrapper
    return init(self, *args, **kwargs)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/datasets/alphafolddb.py", line 122, in __init__
    tar_file = utils.download(self.urls[species_id], path, md5=self.md5s[species_id])
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/utils/file.py", line 31, in download
    urlretrieve(url, save_file)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

horacehht commented 11 months ago

Oh, I found that the dataset version on the website is now v4 instead of v2. If I just use the v4 dataset, will it affect the experiments? Additionally, how do I use v4?

Oxer11 commented 11 months ago

I think it's okay to use v4 instead of v2. The pre-training dataset doesn't have a large effect on the final performance.

horacehht commented 11 months ago

> I think it's okay to use v4 instead of v2. The pre-training dataset doesn't have a large effect on the final performance.

I have downloaded the v4 dataset and put it into the correct directory. However, when I ran the command "python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]", the program still started to download the v2 dataset. I don't know how to deal with this.

Oxer11 commented 11 months ago

Sorry for the inconvenience! This is because I set the default files to the v2 datasets instead of the v4 datasets. The easiest way to change this is to inherit from the datasets.AlphaFoldDB class and override the urls and md5s attributes. The class locates the downloaded files by the filenames in urls and then checks their md5 values.
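
A minimal sketch of that approach, not the maintainers' code. It assumes that urls and md5s are lists indexed by species_id (as the traceback suggests) and that the v4 archives keep the same filenames apart from the version suffix. The helper names to_versioned_url, file_md5 and with_versioned_archives are hypothetical, and the md5 values have to be computed from the archives you downloaded yourself.

```python
import hashlib
import re


def to_versioned_url(url, version=4):
    """Rewrite the version suffix of an archive URL, e.g. `_v2.tar` -> `_v4.tar`."""
    return re.sub(r"_v\d+\.tar$", "_v%d.tar" % version, url)


def file_md5(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def with_versioned_archives(base_cls, new_md5s, version=4):
    """Build a subclass of `base_cls` with re-versioned urls and updated md5s.

    `new_md5s` maps a species index to the md5 of the archive you downloaded.
    Indices that are not listed keep the md5 of the base class (assumption:
    `base_cls.urls` and `base_cls.md5s` are lists of equal length).
    """
    urls = [to_versioned_url(u, version) for u in base_cls.urls]
    md5s = list(base_cls.md5s)
    for index, md5 in new_md5s.items():
        md5s[index] = md5
    return type("%sV%d" % (base_cls.__name__, version), (base_cls,),
                {"urls": urls, "md5s": md5s})


# Hypothetical usage (requires torchdrug, so it is left as a comment):
#   from torchdrug import datasets
#   md5 = file_md5("/path/to/downloaded_v4_archive.tar")
#   AlphaFoldDBV4 = with_versioned_archives(datasets.AlphaFoldDB, {0: md5})
```

To select the subclass from the YAML config it would presumably also have to be registered with torchdrug's registry under a name of its own; that step is not shown here.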

Sajib-006 commented 2 months ago

I think this URL issue is resolved in the updated version (0.2.1). Installing the updated torchdrug fixed this. Use: pip install torchdrug==0.2.1

DeepGraphLearning / GearNet

A dataset not found when I run "python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]" #45