baker-laboratory / RoseTTAFold-All-Atom

BLAST (SS assignment) pipeline fails for certain inputs #31

Open amorehead opened 7 months ago

amorehead commented 7 months ago

Hello.

While testing RFAA on a large protein-ligand complex structure, I received the following (I believe BLAST-related) error.

15:08:33.492 ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/hhalignment.cpp:223: Read:
15:08:33.492 ERROR:   sequence ss_pred contains no residues.

This seems related to an issue other users have reported before. In those cases, however, the issue was resolved by installing BLAST locally, as the RF2 repository suggests. I have already followed those RF2 instructions to install and set up BLAST locally, and the setup works for most inputs I give RFAA. Every few inputs, though, it raises this error unpredictably.

Attached is one such complex for which RFAA produces this error on my end: BLAST_Error_Inputs.zip
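One way to narrow this down is to check whether the secondary-structure file handed to HH-suite really has an empty `ss_pred` record. The sketch below assumes that file is FASTA-like with a `>ss_pred` header, as the error message suggests. The function name `ss_pred_length` and the example file name are hypothetical and not part of RFAA.

```shell
#!/usr/bin/env bash
# Print the number of residues in the ">ss_pred" record of a FASTA-like file.
# Assumption: the file uses ">ss_pred" / ">ss_conf" headers, as implied by the
# hhalignment.cpp error text; the function name is illustrative only.
ss_pred_length() {
    awk '
        /^>/  { in_ss = ($1 == ">ss_pred"); next }          # track the current record
        in_ss { gsub(/[[:space:]]/, ""); n += length($0) }  # count non-blank characters
        END   { print n + 0 }                               # prints 0 if nothing was counted
    ' "$1"
}

# Example (placeholder file name):
#   if [ "$(ss_pred_length my_input.ss2)" -eq 0 ]; then
#       echo "ss_pred is empty; the SS prediction step likely failed" >&2
#   fi
```

A result of 0 would point at the step that produces the prediction (PSIPRED and its BLAST helpers) rather than at HH-suite itself.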

cyangNYU commented 7 months ago

Haha, I got the same issue here, glad to know someone fixed it.

xie-yun-ai commented 7 months ago

Thank you very much. I have also run into this problem and cannot solve it.

Could you share your specific BLAST configuration and path?

GanQiao1990 commented 7 months ago

Also, in [input_prep], in the line "$PIPE_DIR/csblast-2.2.3/bin/csbuild -i $i_a3m -I a3m -D $PIPE_DIR/csblast-2.2.3/data/K4000.crf -o $ID.chk -O chk", "$PIPE_DIR" should be set by the user.

pstansfeld commented 7 months ago

I was having a similar issue with PSIPRED.

To overcome it, I downloaded the legacy BLAST data to the RoseTTAFold-All-Atom directory and made sure the path was exported in the make_ss.sh file.

wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz

export BLASTMAT=~/RoseTTAFold-All-Atom/blast-2.2.26/data/

I also explicitly set the PIPE_DIR paths in make_ss.sh.
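A quick sanity check for the setup above, as a minimal sketch. It assumes the downloaded archive has been unpacked so that the exported `BLASTMAT` directory exists, and that the legacy data directory contains a `BLOSUM62` matrix file. The function name `check_blastmat` is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if BLASTMAT points to a directory that holds the BLOSUM62 matrix.
# Assumption: the legacy BLAST data directory ships a file named BLOSUM62.
check_blastmat() {
    local dir="${BLASTMAT:-}"
    if [ -z "$dir" ]; then
        echo "BLASTMAT is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "BLASTMAT is not a directory: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/BLOSUM62" ]; then
        echo "BLOSUM62 not found in $dir" >&2
        return 1
    fi
    echo "BLASTMAT looks usable: $dir"
}
```

Calling `check_blastmat` in the same shell that runs make_ss.sh shows whether the export is actually visible there.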

Not sure if that will resolve your issue?

Now just waiting for the merge_inputs.py bug to be resolved :)

Thanks for setting up the mamba/conda version @amorehead.

HBioquant commented 7 months ago

When I run the bug example provided by @amorehead, I get a new error:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/zhujt/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main                                  
    runner.infer()
  File "/home/zhujt/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer                                 
    self.parse_inference_config()
  File "/home/zhujt/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 93, in parse_inference_config                 
    raw_data = merge_all(protein_inputs, na_inputs, sm_inputs, residues_to_atomize, deterministic=self.deterministic)              
  File "/home/zhujt/RoseTTAFold-All-Atom/rf2aa/data/merge_inputs.py", line 169, in merge_all                         
    protein_inputs, protein_chain_lengths = merge_protein_inputs(protein_inputs, deterministic=deterministic)                      
  File "/home/zhujt/RoseTTAFold-All-Atom/rf2aa/data/merge_inputs.py", line 48, in merge_protein_inputs               
    a3m_out = expand_multi_msa(a3m_out, unique_hashes, hash_list, unique_lengths_list, lengths_list)                               
  File "/home/zhujt/RoseTTAFold-All-Atom/rf2aa/data/data_loader_utils.py", line 786, in expand_multi_msa             
    assert(a3m['msa'].shape[1]==sum(Ls_in))
AssertionError

HBioquant commented 7 months ago

Oh, I have seen the PR, and the bug has been fixed.

Sue-Fwl commented 2 months ago

Still occurring.

ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/hhalignment.cpp:223: Read:
ERROR:   sequence ss_pred contains no residues.