google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

Issue with zstd - fetch_databases.py erring out kind of midway #32

Closed by claczny 17 hours ago

claczny commented 17 hours ago

Dear all,

thank you very much for making the code and weights available as well as providing installation instructions etc.

I tried to fetch the databases, but hit a bump in between. As you can see, several databases downloaded and unpacked, but I am getting an error:

Downloading all data to: ../databases
STARTING download bfd-first_non_consensus_sequences.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download mgy_clusters_2022_05.fa.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download pdb_2022_09_28_mmcif_files.tar.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download pdb_seqres_2022_09_28.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download rnacentral_active_seq_id_90_cov_80_linclust.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download uniprot_all_2021_04.fa.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
STARTING download uniref90_2022_05.fa.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases
FINISHED downloading uniref90_2022_05.fa.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
FINISHED downloading bfd-first_non_consensus_sequences.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
STARTING decompressing of bfd-first_non_consensus_sequences.fasta.zst
STARTING decompressing of uniref90_2022_05.fa.zst
FINISHED downloading rnacentral_active_seq_id_90_cov_80_linclust.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
STARTING decompressing of rnacentral_active_seq_id_90_cov_80_linclust.fasta.zst
FINISHED downloading rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
STARTING decompressing of rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta.zst
FINISHED downloading uniprot_all_2021_04.fa.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
STARTING decompressing of uniprot_all_2021_04.fa.zst
FINISHED downloading pdb_seqres_2022_09_28.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
FINISHED downloading pdb_2022_09_28_mmcif_files.tar.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
FINISHED downloading nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
STARTING decompressing of nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta.zst
STARTING decompressing of pdb_2022_09_28_mmcif_files.tar.zst
STARTING decompressing of pdb_seqres_2022_09_28.fasta.zst
FINISHED downloading mgy_clusters_2022_05.fa.zst from https://storage.googleapis.com/alphafold-databases/v3.0 to ../databases.
STARTING decompressing of mgy_clusters_2022_05.fa.zst
Traceback (most recent call last):
  File "fetch_databases.py", line 131, in <module>
    main(sys.argv[1:])
  File "fetch_databases.py", line 119, in main
    DATABASE_FILES,
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "fetch_databases.py", line 80, in download_and_decompress
    check=True,
  File "/usr/lib64/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'zstd': 'zstd'

real    0m0.278s
user    0m0.168s
sys     0m0.154s
Wed Nov 13 21:22:58 CET 2024

This output is from running the fetch_databases.py script a second time, after the original run errored out with the same error.

Kindly let me know how this can be resolved.

Thank you very much again!

Best wishes, Cedric

claczny commented 17 hours ago

Ok, my bad, sorry.

Never mind. I had assumed zstd would be available on the access node, but it wasn't. After switching to a compute node, where zstd is available, the problem was solved.
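
For anyone who lands here with the same traceback: `subprocess.run` raises `FileNotFoundError` when the executable itself cannot be found on `PATH`, which is what `[Errno 2] No such file or directory: 'zstd'` indicates. It does not mean that a database file is missing. Below is a minimal sketch of turning that into a clearer message. `run_tool` is a hypothetical helper used for illustration only; it is not part of fetch_databases.py.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command; report a missing executable with a readable error.

    Hypothetical helper for illustration, not taken from fetch_databases.py.
    """
    try:
        return subprocess.run(
            args,
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except FileNotFoundError as e:
        # Raised before the child process starts, i.e. args[0] is not on PATH.
        raise RuntimeError(
            f"Required program {args[0]!r} was not found on PATH on this machine."
        ) from e


if __name__ == "__main__":
    # Uses the running Python interpreter as a command that is known to exist.
    print(run_tool([sys.executable, "-c", "print('ok')"]).stdout)
```

On a cluster, running such a check on the node where the script will actually execute matters, since login and compute nodes may have different software installed.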

Augustin-Zidek commented 4 hours ago

No worries, and thanks for reporting this. The script should fail with an informative error in such a case. I will add a check at the top of the script that raises an error when curl or zstd is missing from the system:

import shutil

...

if shutil.which('curl') is None:
  raise ValueError('curl is not installed. Please install it and try again.')
if shutil.which('zstd') is None:
  raise ValueError('zstd is not installed. Please install it and try again.')

Augustin-Zidek commented 4 hours ago

Fixed in https://github.com/google-deepmind/alphafold3/commit/1d3e173b25504d70e4f49edee25730cdf20386d5.

The script will now fail with a user-friendly error message immediately after starting if either curl or zstd is missing.
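
As an illustration of the same idea, a pre-flight check can also collect every missing program and report them together, so that a user does not have to fix one and rerun to discover the next. This is only a sketch and does not reproduce the contents of the linked commit; `find_missing_tools` and `require_tools` are hypothetical names.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that shutil.which cannot locate on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools):
    """Raise one error that lists every missing program (hypothetical helper)."""
    missing = find_missing_tools(tools)
    if missing:
        raise RuntimeError(
            "Missing required programs: "
            + ", ".join(missing)
            + ". Please install them and try again."
        )


if __name__ == "__main__":
    # curl and zstd are the two programs named in this thread.
    require_tools(["curl", "zstd"])
```

Calling `require_tools` before any download starts means the failure happens within a fraction of a second rather than after the downloads have finished.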