aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

mmseqs inference #152

Open · kchu02 opened this issue 2 years ago

kchu02 commented 2 years ago

There are clear instructions for running AlphaFold with its original sequence alignment protocol, but the instructions for MMseqs are unclear. Is MMseqs supported for both monomer and multimer structures?

If so, should the multimer FASTA file keep the same format as in AlphaFold? What are the corresponding flags or commands, and are precomputed MSAs required?

gahdritz commented 2 years ago

Currently, the stock inference script with complex (multimer) support doesn't run MMseqs alignment automatically. For that, you'll have to use the alignment precomputation scripts in scripts/ and then pass the resulting alignments to the inference script using the --use_precomputed_alignments flag.
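The two-step workflow described above can be sketched in Python as two argument lists run in sequence. This is a hedged sketch, not OpenFold's documented interface: the precompute arguments simply mirror the invocation kchu02 uses below, while the inference script name run_pretrained_openfold.py and its remaining arguments (FASTA, template, and output settings) are assumptions to check against the README of your OpenFold checkout, so they are left as a placeholder.

```python
import shlex
import subprocess
from typing import List, Sequence


def precompute_cmd(fasta: str, mmseqs_db_dir: str, uniref_db: str,
                   alignment_dir: str, mmseqs_binary: str,
                   hhsearch_binary: str, env_db: str, pdb70: str) -> List[str]:
    """Step 1: precompute alignments with MMseqs2 using the repository script."""
    return [
        "python3", "scripts/precompute_alignments_mmseqs.py",
        fasta, mmseqs_db_dir, uniref_db, alignment_dir, mmseqs_binary,
        "--hhsearch_binary_path", hhsearch_binary,
        "--env_db", env_db,
        "--pdb70", pdb70,
    ]


def inference_cmd(alignment_dir: str, other_args: Sequence[str]) -> List[str]:
    """Step 2: run inference, reusing the alignments written by step 1.

    ASSUMPTION: the script name and other_args (FASTA/template/output
    arguments) depend on your OpenFold version; consult its README.
    """
    return (["python3", "run_pretrained_openfold.py"] + list(other_args)
            + ["--use_precomputed_alignments", alignment_dir])


def run_pipeline(steps: Sequence[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute them in order only when dry_run is False."""
    for argv in steps:
        print(" ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    alignment_dir = "alignment_dir"
    steps = [
        precompute_cmd("input.fasta", "data/mmseqs_dbs", "uniref30_2103_db",
                       alignment_dir, "/working/MMseqs2/build/bin/mmseqs",
                       "hhsearch", "colabfold_envdb_202108_db",
                       "data/pdb70/pdb70"),
        inference_cmd(alignment_dir, []),  # hypothetical: add your other args
    ]
    run_pipeline(steps)  # dry run: only prints the commands
```

Keeping each step as an argument list rather than a shell string avoids quoting problems, and check=True stops the pipeline if the alignment step fails, so inference never runs against an incomplete alignment directory.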

kchu02 commented 2 years ago

While trying to run precompute_alignments_mmseqs.py, we ran into another error:

python3 scripts/precompute_alignments_mmseqs.py input.fasta \
    data/mmseqs_dbs \
    uniref30_2103_db \
    alignment_dir \
    /working/MMseqs2/build/bin/mmseqs \
    --hhsearch_binary_path hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 data/pdb70/pdb70

Error:

Traceback (most recent call last):
  File "scripts/precompute_alignments_mmseqs.py", line 175, in <module>
    main(args)
  File "scripts/precompute_alignments_mmseqs.py", line 84, in main
    cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
  File "/working/openfold/lib/conda/envs/openfold_venv/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/working/openfold/lib/conda/envs/openfold_venv/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: 'scripts/colabfold_search.sh'

gahdritz commented 2 years ago

Looks like that script was missing a shebang. Fixed in 1c279f9.
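To see why a missing shebang produces this particular error, here is a minimal, self-contained Python reproduction. It is a sketch assuming a POSIX system with /bin/sh and an executable temporary directory; the script names are hypothetical. subprocess executes the file directly via execve without a shell, and the kernel refuses to run a text file that has no #! line, returning ENOEXEC (errno 8), the same "Exec format error" shown in the traceback above.

```python
import errno
import os
import stat
import subprocess
import tempfile


def write_script(directory, name, body):
    """Write body to a file, mark it executable (like chmod u+x), return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(body)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def run(argv):
    """Run argv without a shell, as the Popen call in the traceback does."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return proc.stdout.decode().strip()
    except OSError as e:
        return "OSError errno %d (%s)" % (e.errno, errno.errorcode.get(e.errno, "?"))


def demo():
    with tempfile.TemporaryDirectory() as d:
        bare = write_script(d, "no_shebang.sh", "echo ok\n")
        fixed = write_script(d, "with_shebang.sh", "#!/bin/sh\necho ok\n")
        return (
            run([bare]),             # no #! line: execve fails with ENOEXEC
            run([fixed]),            # the shebang names the interpreter
            run(["/bin/sh", bare]),  # calling the interpreter explicitly also works
        )


if __name__ == "__main__":
    for result in demo():
        print(result)
```

On Linux this should print the ENOEXEC error for the first case and "ok" for the other two. For a checkout that predates 1c279f9, adding a shebang line to the top of scripts/colabfold_search.sh (and keeping it executable) should have the same effect as the fix.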