Open calmasri opened 2 years ago
Are you regenerating PDB alignments? There's no need to do that; we've pre-computed them all. See the RODA repository linked in the README.
Same issue here. I used the same script as outlined in the README. The query sequence is just a regular 237 aa sequence:
python3 scripts/precompute_alignments_mmseqs.py /fasta_dir/query_seqs.fasta \
    data/mmseqs_dbs \
    uniref30_2103_db \
    /fasta_dir \
    /data/MMseqs2/build/bin/mmseqs \
    --hhsearch_binary_path /usr/bin/hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 data/pdb70/pdb70
Hi, @gahdritz, how much space does this data need?
I think the entire thing is around 2TB, but you can download subsets of it.
I was trying to generate new alignments using the `precompute_alignments_mmseqs.py` script, where query_seqs.fasta was generated by `scripts/data_dir_to_fasta.py` and contains almost all the structures in data/pdb_mmcif/mmcif_files (minus ~500-1000 structures).

I'm running on a machine with the following specs:
- GPUs: 4 x Tesla V100
- GPU memory: 64 GB
- CPUs: 32
- Memory: 244 GB
The script has been running for about 5 days now, and I'm not sure whether that's normal. How long should it normally take, and would I need more than 3 TB of storage allocated for the output?
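For anyone wanting to gauge the size of a job like this before or during a run, here is a minimal sketch. It assumes a standard FASTA file in which every record header starts with `>`. The helper names `count_fasta_records` and `free_space_gb` are hypothetical and not part of OpenFold. It only counts the query sequences and reports free disk space on the output volume; it does not predict runtime.

```python
import shutil


def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    count = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def free_space_gb(directory):
    """Return free space (in decimal GB) on the volume holding `directory`."""
    return shutil.disk_usage(directory).free / 1e9
```

Calling `count_fasta_records("/fasta_dir/query_seqs.fasta")` gives the number of queries, and `free_space_gb("/fasta_dir")` shows how much room is left where the output is being written. Checking the latter periodically gives a rough sense of how quickly the output is growing.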