google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

Is it necessary to continuously update the database? #104

Closed junjunbear closed 7 hours ago

junjunbear commented 7 hours ago

Judging by the names of the downloaded database files, the cutoff date for entries in the PDB snapshot appears to be September 28, 2022. Is that correct? Would continuously updating the databases help improve the success rate and accuracy of predictions?

echo "Start Fetching and Untarring 'pdb_2022_09_28_mmcif_files.tar'"
wget --quiet --output-document=- \
    "${SOURCE}/pdb_2022_09_28_mmcif_files.tar.zst" | \
    tar --use-compress-program=zstd -xf - --directory="${db_dir}" &

Augustin-Zidek commented 7 hours ago

Hi,

yes, you can run AlphaFold with more recent versions of all the databases. We mirror the versions that were used in the paper for reproducibility, but you can download and use the latest versions instead.

It should improve accuracy, but mostly only in cases where the MSA gets deeper as a result (e.g. if you are trying to fold a sequence from a protein family that has only recently been sequenced and added to UniProt). The same applies to templates, although an updated PDB will likely have even less impact than updated genetic databases.
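
For anyone wanting to check how old a given snapshot is before deciding whether to refresh it, the date can be read straight from the archive name. This is only a minimal sketch: it assumes the `YYYY_MM_DD` naming pattern seen in the snippet quoted in the question, and `snapshot_date` is a hypothetical helper, not part of the AlphaFold 3 repository.

```sh
# Hypothetical helper (not part of AlphaFold 3): print the snapshot date
# embedded in a database archive name as YYYY-MM-DD.
# Assumes the name contains a YYYY_MM_DD group; prints nothing otherwise.
snapshot_date() {
  printf '%s\n' "$1" |
    sed -n 's/.*\([0-9]\{4\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\).*/\1-\2-\3/p'
}

snapshot_date "pdb_2022_09_28_mmcif_files.tar.zst"   # prints 2022-09-28
```

A name without a date group produces no output, so an empty result can be treated as "snapshot date unknown".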