chg60 / phammseqs

Assort phage protein sequences into phamilies using MMseqs2
GNU General Public License v3.0

mmseqs cluster error #1

Closed k6logc closed 8 months ago

k6logc commented 10 months ago

Dear Christian,

I am looking forward to working with PhaMMseqs (and PhamClust), but I have been running into the error below when using the demo genes.faa file. I would appreciate any guidance. Thank you!

Best, Kathryn

Here is how I installed it:

conda create -n phammseqs-env python=3.9 -y && conda activate phammseqs-env
conda install -c bioconda -c conda-forge mmseqs2=13.45111 clustalo -y
pip3 install phammseqs

And the error:

(phammseqs-env) [kmkauffm@vortex2:/projects/academic/kmkauffm/kauffman/ZZ.daysDir/mangos/data_07_phammseqs_phamclust/04_new_test]$ phammseqs genes.faa
Failed to execute /tmp/phammseqs-zrq6crle/17595342956995715567/cascaded_clustering.sh with error 13.

Traceback (most recent call last):
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/bin/phammseqs", line 8, in <module>
    sys.exit(main())
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 318, in main
    phams = assemble_phams(db=database, seq_params=seq_params,
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 183, in assemble_phams
    raise err
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 128, in assemble_phams
    mmseqs.cluster(seq_db, clu_db, tmp_dir, identity=i, coverage=c,
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/mmseqs.py", line 139, in cluster
    raise MMseqs2Error(f"command failed: {command}")
phammseqs.mmseqs.MMseqs2Error: command failed: mmseqs cluster /tmp/phammseqs-zrq6crle/sequenceDB /tmp/phammseqs-zrq6crle/seqClusterDB /tmp/phammseqs-zrq6crle --min-seq-id 0.35 -c 0.8 -e 0.001 -s 7 --max-seqs 1000 --cluster-mode 0 --cluster-steps 1 --alignment-mode 3 --cov-mode 0  --threads 32 -v 3  --cluster-reassign

chg60 commented 10 months ago

Hi Kathryn,

Thank you for your interest in phammseqs and phamclust, and for reporting this issue!

I have an open branch of phammseqs where I’m implementing a number of important updates and new features, so I hope to incorporate a fix for your issue there as well.

Can you please provide some additional context for the following:

  1. What is reported if you run ‘mmseqs version’ at the command line with the conda environment active?
  2. What version of mamba is that? (I have never used mamba, so maybe now is the time…)
  3. Is the device in question a personal computer, server, or compute node on a larger cluster? 32 threads could be anything these days! 🙂
  4. What is the operating system installed on the device in question?

Once you provide these answers I’ll do my best to reproduce the issue and investigate.

Best,

Christian

k6logc commented 10 months ago

Hi Christian,

It's great that you have an update in the works with new features; I am interested in trying it out once it's ready. Below is the version information you asked about. I am running this on a Linux cluster. Thank you very much for your help!

Best, Kathryn

(phammseqs-env) [kmkauffm@vortex2:/projects/academic/kmkauffm/kauffman/00.mambaforge/bin]$ mmseqs version
13.45111
(phammseqs-env) [kmkauffm@vortex2:/projects/academic/kmkauffm/kauffman/00.mambaforge/bin]$ mamba -V
conda 23.1.0

chg60 commented 10 months ago

Thanks for the additional context. Is it possible that your account doesn’t have write permissions on /tmp since it’s a cluster? Doesn’t seem super likely based on where the error happened, but it’s an easy first guess…

Can you try running ‘touch /tmp/test.txt’ while logged into the node and report back whether it gave you a permission error?
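
A note on the check above: on Linux, errno 13 is EACCES ("Permission denied"). If the "error 13" in the MMseqs2 message is an errno value (an assumption, not confirmed in this thread), the failure could come from being unable to execute a script under /tmp rather than from being unable to write there, and `touch` alone would not reveal that. The sketch below tests both; `probe_directory` is a hypothetical helper written for illustration and is not part of phammseqs or MMseqs2.

```python
import os
import stat
import subprocess
import tempfile


def probe_directory(path):
    """Return (can_write, can_execute) for a script created under `path`."""
    try:
        # Creating the file tests write permission on the directory.
        fd, script = tempfile.mkstemp(suffix=".sh", dir=path)
    except OSError:
        return False, False
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        # Mark the script readable, writable, and executable by its owner.
        os.chmod(script, stat.S_IRWXU)
        try:
            # Running the script directly fails with PermissionError (errno 13)
            # if the filesystem does not allow execution, e.g. a noexec mount.
            subprocess.run([script], check=True)
            return True, True
        except (OSError, subprocess.CalledProcessError):
            return True, False
    finally:
        os.remove(script)


if __name__ == "__main__":
    can_write, can_execute = probe_directory(tempfile.gettempdir())
    print(f"write: {can_write}, execute: {can_execute}")
```

Running this on both the login node and a compute node would show whether the two differ in how /tmp is set up. A result of write True and execute False would point toward an execution restriction rather than a write restriction.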

Additionally, if you re-run the phammseqs command from earlier but add ‘-d’ it will run in debug mode and provide more information about what went wrong…

k6logc commented 10 months ago

When I submit the job (rather than running the test on the head node) it works fine! So, I'm in the clear :) Thanks so much for your quick help on this!

EDIT-1: Just to confirm, my final outputs are two folders (pham_aligns and phamfastas). Are those all the expected products? Is there supposed to also be the tsv file that serves as the input to phamclust, or does the user generate that on their own?

EDIT-2: I see that I can get the tsv if I run the pangenome analysis, so I'm good there too now.

For completeness, regarding the errors when running the test on the head node:

(phammseqs-env) [kmkauffm@vortex1:/projects/academic/kmkauffm/kauffman/ZZ.daysDir/phammseqs]$ phammseqs genes.faa -d
Parsing protein sequences from input files...
Found 893 translations in 1 files...
Creating MMseqs2 database...
createdb /tmp/phammseqs-v3ycp3vm/nr_genes.fasta /tmp/phammseqs-v3ycp3vm/sequenceDB -v 3 

MMseqs Version:         13.45111
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       1
Offset of numeric ids   0
Compressed              0
Verbosity               3

Converting sequences
[
Time for merging to sequenceDB_h: 0h 0m 0s 4ms
Time for merging to sequenceDB: 0h 0m 0s 4ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 23ms

Performing sequence-sequence clustering...
cluster /tmp/phammseqs-v3ycp3vm/sequenceDB /tmp/phammseqs-v3ycp3vm/seqClusterDB /tmp/phammseqs-v3ycp3vm --min-seq-id 0.35 -c 0.8 -e 0.001 -s 7 --max-seqs 1000 --cluster-mode 0 --cluster-steps 1 --alignment-mode 3 --cov-mode 0 --threads 32 -v 3 --cluster-reassign 

MMseqs Version:                         13.45111
Substitution matrix                     nucl:nucleotide.out,aa:blosum62.out
Seed substitution matrix                nucl:nucleotide.out,aa:VTML80.out
Sensitivity                             7
k-mer length                            0
k-score                                 2147483647
Alphabet size                           nucl:5,aa:21
Max sequence length                     65535
Max results per query                   1000
Split database                          0
Split mode                              2
Split memory limit                      0
Coverage threshold                      0.8
Coverage mode                           0
Compositional bias                      1
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask lower case residues                0
Minimum diagonal score                  15
Include identical seq. id.              false
Spaced k-mers                           1
Preload mode                            0
Pseudo count a                          1
Pseudo count b                          1.5
Spaced k-mer pattern                
Local temporary path                
Threads                                 32
Compressed                              0
Verbosity                               3
Add backtrace                           false
Alignment mode                          3
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0.35
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Max reject                              2147483647
Max accept                              2147483647
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Gap open cost                           nucl:5,aa:11
Gap extension cost                      nucl:2,aa:1
Zdrop                                   40
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Cluster mode                            0
Max connected component depth           1000
Similarity type                         2
Single step clustering                  false
Cascaded clustering steps               1
Cluster reassign                        true
Remove temporary files                  false
Force restart with latest tmp           false
MPI runner                          
k-mers per sequence                     21
Scale k-mers per sequence               nucl:0.200,aa:0.000
Adjust k-mer length                     false
Shift hash                              67
Include only extendable                 false
Skip repeating k-mers                   false

Failed to execute /tmp/phammseqs-v3ycp3vm/8581765641845880585/cascaded_clustering.sh with error 13.

Temporary files moved to /user/kmkauffm/phammseqs-v3ycp3vm
Traceback (most recent call last):
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/bin/phammseqs", line 8, in <module>
    sys.exit(main())
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 318, in main
    phams = assemble_phams(db=database, seq_params=seq_params,
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 183, in assemble_phams
    raise err
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 128, in assemble_phams
    mmseqs.cluster(seq_db, clu_db, tmp_dir, identity=i, coverage=c,
  File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/mmseqs.py", line 139, in cluster
    raise MMseqs2Error(f"command failed: {command}")
phammseqs.mmseqs.MMseqs2Error: command failed: mmseqs cluster /tmp/phammseqs-v3ycp3vm/sequenceDB /tmp/phammseqs-v3ycp3vm/seqClusterDB /tmp/phammseqs-v3ycp3vm --min-seq-id 0.35 -c 0.8 -e 0.001 -s 7 --max-seqs 1000 --cluster-mode 0 --cluster-steps 1 --alignment-mode 3 --cov-mode 0  --threads 32 -v 3  --cluster-reassign

chg60 commented 8 months ago

Finally getting around to closing this issue. The user experienced errors when running on the login node of an HTC cluster, presumably due to a cap on per-process CPU utilization on the login node. The same command ran without problems when submitted as a job.
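
For anyone who wants to check the CPU-cap hypothesis on their own cluster, here is a minimal sketch that compares a requested thread count (32 in the logs above) with the number of CPUs the current process is allowed to use. `usable_threads` is a hypothetical helper, not part of phammseqs. It only detects limits imposed through CPU affinity; a cap enforced another way (for example, a cgroup CPU quota) would not show up here.

```python
import os


def usable_threads(requested):
    """Clamp a requested thread count to the CPUs this process may use."""
    try:
        # Linux only: the set of CPUs the current process is allowed to run on.
        allowed = len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total CPU count (may be None).
        allowed = os.cpu_count() or 1
    return max(1, min(requested, allowed))


if __name__ == "__main__":
    print(usable_threads(32))
```

If the printed value is lower on the login node than on a compute node, the login node is restricting the process to fewer CPUs than the 32 threads requested.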