aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

download_roda_pdbs.sh doesn't download any files from server #277

Open awaelchli opened 1 year ago

awaelchli commented 1 year ago

The download script download_roda_pdbs.sh has this rsync command:

https://github.com/aqlaboratory/openfold/blob/84659c93ba6f06b8a0a2646d1cf27646c003a0c6/scripts/download_roda_pdbs.sh#L35

It doesn't download anything, perhaps because there aren't any files at the given path on the server. I'm following the README instructions, after downloading the RODA files and running flatten_roda.sh.

Any ideas how to download these files?

jonathanking commented 1 year ago

Hi @awaelchli, I ran into this issue as well. Based on the discussion in this AlphaFold issue, I tried to modify the download server to use the PDBj mirror. Unfortunately, while this downloads data, it does not access the correct snapshot:

rsync --recursive --links --perms --times --compress -v --info=progress2 --delete data.pdbj.org::ftp_data/structures/divided/mmCIF/ $OUT_DIR

Simply switching the server to pdbj via this command seems to also struggle to download.

rsync -rlpt -v -z --delete snapshots.pdbj.org::20220103/pub/pdb/data/structures/divided/mmCIF/ $OUT_DIR

jonathanking commented 1 year ago

@awaelchli I looked into this a bit more. I am fairly confident that this command can serve as a replacement to the original offending command. Of course it would be great if someone else can verify that this is reasonable.

aws s3 cp --no-sign-request s3://pdbsnapshots/20220103/pub/pdb/data/structures/divided/mmCIF $OUT_DIR --recursive 2>&1 > /dev/null

The RCSB started using AWS in April of last year. This command downloads the relevant snapshot used by RODA via the AWS CLI.
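
For anyone who wants to check the source path before committing to the full download, here is a minimal sketch wrapping the command above. The bucket path and snapshot date are taken from that command; the SNAPSHOT variable, the default OUT_DIR, and the --run switch are my own illustrative assumptions, not part of the OpenFold scripts, and I have not verified this against the repository.

```shell
#!/bin/sh
# Sketch only: SNAPSHOT, the OUT_DIR default, and --run are illustrative assumptions.
set -eu

SNAPSHOT="${SNAPSHOT:-20220103}"      # snapshot date from the command above
OUT_DIR="${OUT_DIR:-./pdb_mmCIF}"     # hypothetical default output directory

# Print the S3 URI of the mmCIF tree for a given snapshot date.
snapshot_uri() {
    printf 's3://pdbsnapshots/%s/pub/pdb/data/structures/divided/mmCIF\n' "$1"
}

# Copy the snapshot anonymously; requires the AWS CLI on PATH.
download_snapshot() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws CLI not found; install it first" >&2
        return 1
    fi
    mkdir -p "$OUT_DIR"
    # Drop per-file progress on stdout; errors on stderr stay visible.
    aws s3 cp --no-sign-request --recursive "$(snapshot_uri "$SNAPSHOT")" "$OUT_DIR" > /dev/null
}

# Default is a dry run that only prints source and destination; pass --run to download.
if [ "${1:-}" = "--run" ]; then
    download_snapshot
else
    echo "would copy $(snapshot_uri "$SNAPSHOT") -> $OUT_DIR"
fi
```

Running it with no arguments only prints the source and destination, so nothing is transferred until --run is passed. The --no-sign-request flag is what allows the copy without AWS credentials.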