KosinskiLab / AlphaPulldown

https://doi.org/10.1093/bioinformatics/btac749
GNU General Public License v3.0

Problem with MSA for Q6DI86 #376

Open poojaparameswaran99 opened 4 days ago

poojaparameswaran99 commented 4 days ago

I am attempting to generate the MSAs as .pkl files for ~6500 accessions. I am running the following command:

(AlphaPulldown) a05-XXXXX@a05-duke01:~/AlphaFastPPi/alphapulldown/scripts$ create_individual_features.py --fasta_paths ../../XXXXXX/fasta/candidates.fasta --data_dir ../../alphafold_non_docker/ --output_dir ../testout/ --max_template_date 2050-04-04 --use_mmseqs2 True --skip_existing True

I have successfully processed ~1500 so far, but I still need the rest, and this is taking an extensive amount of time. Are there any suggestions on how to speed it up and get only the .pkl files for the accessions I need (a download source, perhaps)?
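One generic way to shorten the wall-clock time is to split the input FASTA into several smaller files and run one create_individual_features.py process per file. The sketch below only does the splitting. The function names (read_fasta, split_fasta) and the chunk file naming are hypothetical, and whether parallel runs actually help here is an assumption: if the bottleneck is the MMseqs2 search itself, several concurrent runs may not be faster.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header_line, sequence) tuples."""
    records = []
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq)))
                header, seq = line, []
            elif line and header is not None:
                seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_fasta(fasta_path, out_dir, n_chunks):
    """Distribute records round-robin over up to n_chunks FASTA files.

    Returns the list of chunk files written (empty chunks are skipped).
    The chunk_XXX.fasta naming is an arbitrary choice for this sketch.
    """
    records = read_fasta(fasta_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(n_chunks):
        chunk = records[i::n_chunks]
        if not chunk:
            continue
        out_path = out_dir / f"chunk_{i:03d}.fasta"
        with open(out_path, "w") as fh:
            for header, seq in chunk:
                fh.write(f"{header}\n{seq}\n")
        written.append(out_path)
    return written
```

Each chunk file could then be passed to --fasta_paths in a separate run that writes to the same --output_dir, with --skip_existing True left on so that finished accessions are not recomputed.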

I am running into this issue:

I0703 16:53:13.419163 139630757807936 create_individual_features.py:236] Running MMseqs2 for feature generation...
I0703 16:53:13.427133 139630757807936 objects.py:189] You chose to calculate MSA with mmseq2.
Please also cite: Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold: Making protein folding accessible to all. Nature Methods (2022) doi: 10.1038/s41592-022-01488-1
COMPLETE: 100%|████████████████████████████████████████████████████████████| 150/150 [elapsed: 00:13 remaining: 00:00]
I0703 16:53:55.408864 139630757807936 hhsearch.py:85] Launching subprocess "hhsearch -i /tmp/tmptn3t9qwi/query.a3m -o /tmp/tmptn3t9qwi/output.hhr -maxseq 1000000 -d ../testout/Q6DI86_env/templates_101/pdb70"
I0703 16:53:55.410616 139630757807936 utils.py:36] Started HHsearch query
I0703 16:53:55.558456 139630757807936 utils.py:40] Finished HHsearch query in 0.148 seconds
Traceback (most recent call last):
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/bin/create_individual_features.py", line 372, in <module>
    app.run(main)
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/bin/create_individual_features.py", line 363, in main
    process_sequences_individual_mode()
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/bin/create_individual_features.py", line 293, in process_sequences_individual_mode
    create_and_save_monomer_objects(curr_monomer, pipeline)
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/bin/create_individual_features.py", line 237, in create_and_save_monomer_objects
    monomer.make_mmseq_features(
  File "/home/a05-XXXX/AlphaFastPPi/alphapulldown/objects.py", line 205, in make_mmseq_features
    ) = get_msa_and_templates(
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/colabfold/batch.py", line 785, in get_msa_and_templates
    template_feature = mk_template(
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/colabfold/batch.py", line 136, in mk_template
    hhsearch_result = hhsearch_pdb70_runner.query(a3m_lines)
  File "/home/a05-XXXX/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/alphafold/data/tools/hhsearch.py", line 94, in query
    raise RuntimeError(
RuntimeError: HHSearch failed:
stdout:

stderr:
hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument
- 16:53:55.414 WARNING: In /opt/conda/conda-bld/hhsuite_1717510248446/work/src/ffindexdatabase.cpp:24: FFindexDatabase:

- 16:53:55.414 WARNING:         Could not read index file ../testout/Q6DI86_env/templates_101/pdb70_cs219.ffindex. Is the file empty or corrupted?

hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument
- 16:53:55.414 WARNING: In /opt/conda/conda-bld/hhsuite_1717510248446/work/src/ffindexdatabase.cpp:24: FFindexDatabase:

- 16:53:55.414 WARNING:         Could not read index file ../testout/Q6DI86_env/templates_101/pdb70_a3m.ffindex. Is the file empty or corrupted?

I tried deleting the directory for the relevant accession, Q6DI86_env, and rerunning, but that did not work. I appreciate the help!
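Since hhsearch reports that it could not read the .ffindex files, a quick check is whether those files exist and are non-empty before rerunning. A minimal sketch follows: the two file names are taken from the warnings in the log above, while the function name find_bad_index_files is hypothetical. It only tests for missing or zero-byte files and does not validate the index contents.

```python
import sys
from pathlib import Path

# File names as they appear in the hhsearch warnings in the log above.
EXPECTED_INDEX_FILES = ("pdb70_cs219.ffindex", "pdb70_a3m.ffindex")


def find_bad_index_files(templates_dir, names=EXPECTED_INDEX_FILES):
    """Return the expected index files that are missing or zero bytes long."""
    bad = []
    for name in names:
        path = Path(templates_dir) / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Usage: python check_ffindex.py ../testout/Q6DI86_env/templates_101
    for templates_dir in sys.argv[1:]:
        for path in find_bad_index_files(templates_dir):
            print(f"missing or empty: {path}")
```

If the files turn out to be empty, that would be consistent with the "Is the file empty or corrupted?" warning, though it would not by itself explain why they were written that way.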

dingquanyu commented 3 days ago

Hi,

Unfortunately, this problem seems to stem from MMseqs2 or HH-suite. Could you please open an issue in their repositories?

Yours,
Dingquan