k6logc closed this issue 8 months ago
Hi Kathryn,
Thank you for your interest in phammseqs and phamclust, and for reporting this issue!
I have an open branch of phammseqs where I’m implementing a number of important updates and new features, so I hope to incorporate a fix for your issue as well.
Can you please provide some additional context for the following:
Once you provide these answers I’ll do my best to reproduce the issue and investigate.
Best,
Christian
Hi Christian,
Great that you have an update in the works with new features; I'm interested in trying it out once it's ready. Below is the version info you asked about. I am running this on a Linux cluster. Thank you very much for your help!
Best,
Kathryn
(phammseqs-env) [kmkauffm@vortex2:/projects/academic/kmkauffm/kauffman/00.mambaforge/bin]$ mmseqs version
13.45111
(phammseqs-env) [kmkauffm@vortex2:/projects/academic/kmkauffm/kauffman/00.mambaforge/bin]$ mamba -V
conda 23.1.0
Thanks for the additional context. Is it possible that your account doesn’t have write permissions on /tmp since it’s a cluster? Doesn’t seem super likely based on where the error happened, but it’s an easy first guess…
Can you try running ‘touch /tmp/test.txt’ while logged into the node and report back whether it gave you a permission error?
Additionally, if you re-run the phammseqs command from earlier but add ‘-d’ it will run in debug mode and provide more information about what went wrong…
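The `touch` test only shows whether files can be created in /tmp. The debug log in this thread shows MMseqs2 also running a generated script (cascaded_clustering.sh) from inside the temporary directory, so it can help to check execution as well. Below is a minimal sketch of such a check, assuming Python 3 and a POSIX shell at /bin/sh; `probe_directory` is a hypothetical helper written for illustration and is not part of phammseqs or MMseqs2.

```python
import os
import stat
import subprocess
import tempfile


def probe_directory(path):
    """Report whether `path` allows creating files and executing scripts.

    Hypothetical helper for illustration only; not part of phammseqs.
    """
    result = {"writable": False, "executable": False}
    try:
        # Rough equivalent of `touch`: create a new file in the directory.
        fd, script = tempfile.mkstemp(dir=path, suffix=".sh")
    except OSError:
        return result
    result["writable"] = True
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        os.chmod(script, stat.S_IRWXU)
        try:
            # Running the script directly raises PermissionError when
            # execution is not allowed from this location.
            subprocess.run([script], check=True)
            result["executable"] = True
        except (OSError, subprocess.CalledProcessError):
            pass
    finally:
        os.remove(script)
    return result


if __name__ == "__main__":
    print(probe_directory(tempfile.gettempdir()))
```

If the result shows the directory as writable but not executable, the problem is more likely a restriction on running programs from that location than a plain write-permission issue.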
When I submit the job (rather than running the test on the head node) it works fine! So, I'm in the clear :) Thanks so much for your quick help on this!
EDIT-1: Just to confirm, my final outputs are two folders (pham_aligns and phamfastas) - are those all the expected products? Is there supposed to also be the tsv file that serves as the input to phamclust, or does the user generate that on their own?
EDIT-2: I see that I can get the tsv if I run the pangenome analysis, so I'm good there too now.
For completeness, regarding the errors when running the test on the head node:
(phammseqs-env) [kmkauffm@vortex1:/projects/academic/kmkauffm/kauffman/ZZ.daysDir/phammseqs]$ phammseqs genes.faa -d
Parsing protein sequences from input files...
Found 893 translations in 1 files...
Creating MMseqs2 database...
createdb /tmp/phammseqs-v3ycp3vm/nr_genes.fasta /tmp/phammseqs-v3ycp3vm/sequenceDB -v 3
MMseqs Version: 13.45111
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Converting sequences
Time for merging to sequenceDB_h: 0h 0m 0s 4ms
Time for merging to sequenceDB: 0h 0m 0s 4ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 23ms
Performing sequence-sequence clustering...
cluster /tmp/phammseqs-v3ycp3vm/sequenceDB /tmp/phammseqs-v3ycp3vm/seqClusterDB /tmp/phammseqs-v3ycp3vm --min-seq-id 0.35 -c 0.8 -e 0.001 -s 7 --max-seqs 1000 --cluster-mode 0 --cluster-steps 1 --alignment-mode 3 --cov-mode 0 --threads 32 -v 3 --cluster-reassign
MMseqs Version: 13.45111
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 7
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max sequence length 65535
Max results per query 1000
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0.8
Coverage mode 0
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Spaced k-mer pattern
Local temporary path
Threads 32
Compressed 0
Verbosity 3
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.35
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max reject 2147483647
Max accept 2147483647
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Single step clustering false
Cascaded clustering steps 1
Cluster reassign true
Remove temporary files false
Force restart with latest tmp false
MPI runner
k-mers per sequence 21
Scale k-mers per sequence nucl:0.200,aa:0.000
Adjust k-mer length false
Shift hash 67
Include only extendable false
Skip repeating k-mers false
Failed to execute /tmp/phammseqs-v3ycp3vm/8581765641845880585/cascaded_clustering.sh with error 13.
Temporary files moved to /user/kmkauffm/phammseqs-v3ycp3vm
Traceback (most recent call last):
File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/bin/phammseqs", line 8, in <module>
sys.exit(main())
File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 318, in main
phams = assemble_phams(db=database, seq_params=seq_params,
File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 183, in assemble_phams
raise err
File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/__main__.py", line 128, in assemble_phams
mmseqs.cluster(seq_db, clu_db, tmp_dir, identity=i, coverage=c,
File "/projects/academic/kmkauffm/kauffman/00.mambaforge/envs/phammseqs-env/lib/python3.9/site-packages/phammseqs/mmseqs.py", line 139, in cluster
raise MMseqs2Error(f"command failed: {command}")
phammseqs.mmseqs.MMseqs2Error: command failed: mmseqs cluster /tmp/phammseqs-v3ycp3vm/sequenceDB /tmp/phammseqs-v3ycp3vm/seqClusterDB /tmp/phammseqs-v3ycp3vm --min-seq-id 0.35 -c 0.8 -e 0.001 -s 7 --max-seqs 1000 --cluster-mode 0 --cluster-steps 1 --alignment-mode 3 --cov-mode 0 --threads 32 -v 3 --cluster-reassign
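For anyone reading the log above, the number in "Failed to execute ... with error 13" can be looked up with Python's standard library. This is a small sketch that assumes MMseqs2 is reporting the operating system's errno value here, which the log itself does not confirm.

```python
import errno
import os

# Assumption: the "error 13" in the MMseqs2 message is an OS errno value.
code = 13
print(errno.errorcode[code], "-", os.strerror(code))
```

On Linux, errno 13 is EACCES ("Permission denied"). Under the assumption above, the failure would point to the script not being allowed to run from its location under /tmp.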
Finally getting around to closing this issue. The user hit this error when running on the login node of an HTC cluster, presumably due to a cap on per-process CPU utilization there. The same command ran fine when submitted as a job.
Dear Christian,
I am looking forward to working with PhaMMseqs (and PhamClust) but have been running into the error below using the demo genes.faa file. I would appreciate any guidance. Thank you!
Best,
Kathryn
Here was my install:
And the error: