Carrion-lab / bacLIFE


Error in rule clustering #7

Open vishnukumar200102 opened 4 months ago

vishnukumar200102 commented 4 months ago

(bacLIFE_environment) jr@jr-HP-Z220-CMT-Workstation:~/bacLIFE$ snakemake -j 24
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 24
Rules claiming more threads will be scaled down.
Job stats:
job                        count    min threads    max threads
EGGNOG                         1              1              1
KEGG_COG                       1              1              1
clustering                     1              1              1
dbCAN                          1              1              1
final                          1              1              1
pfam                           1              1              1
process_annotations            1              1              1
process_hmm_annotations        1              1              1
rename_MEGAMATRIX              1              1              1
total                          9              1              1

Select jobs to execute...

[Thu Apr 4 12:09:34 2024]
rule clustering:
    input: intermediate_files/combined_proteins/combined_proteins.fasta
    output: intermediate_files/clustering/binary_matrix.txt, intermediate_files/clustering/protein_cluster
    jobid: 22
    resources: tmpdir=/tmp

intermediate_files/clustering/mmseqDB exists and will be overwritten
createdb intermediate_files/combined_proteins/combined_proteins.fasta intermediate_files/clustering/mmseqDB

MMseqs Version: 13-45111+ds-2
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3

Converting sequences
[24140] 0s 45ms
Time for merging to mmseqDB_h: 0h 0m 0s 5ms
Time for merging to mmseqDB: 0h 0m 0s 13ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 98ms
cluster intermediate_files/clustering/mmseqDB intermediate_files/clustering/mmseqDB_clu intermediate_files/clustering/mmseqDB_temp --min-seq-id 0.95 --cov-mode 0 -c 0.8

MMseqs Version: 13-45111+ds-2
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 4
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max sequence length 65535
Max results per query 20
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0.8
Coverage mode 0
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Spaced k-mer pattern
Local temporary path
Threads 8
Compressed 0
Verbosity 3
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max reject 2147483647
Max accept 2147483647
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Single step clustering false
Cascaded clustering steps 3
Cluster reassign false
Remove temporary files false
Force restart with latest tmp false
MPI runner
k-mers per sequence 21
Scale k-mers per sequence nucl:0.200,aa:0.000
Adjust k-mer length false
Shift hash 67
Include only extendable false
Skip repeating k-mers false

Set cluster sensitivity to -s 1.000000
Set cluster mode SET COVER
Set cluster iterations to 1
intermediate_files/clustering/mmseqDB_clu.dbtype exists already!
[Thu Apr 4 12:09:35 2024]
Error in rule clustering:
    jobid: 0
    output: intermediate_files/clustering/binary_matrix.txt, intermediate_files/clustering/protein_cluster

RuleException:
CalledProcessError in line 159 of /home/jr/bacLIFE/Snakefile:
Command 'set -euo pipefail; mmseqs cluster intermediate_files/clustering/mmseqDB intermediate_files/clustering/mmseqDB_clu intermediate_files/clustering/mmseqDB_temp --min-seq-id 0.95 --cov-mode 0 -c 0.8' returned non-zero exit status 1.
  File "/home/jr/bacLIFE/Snakefile", line 159, in __rule_clustering
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/jr/bacLIFE/.snakemake/log/2024-04-04T120933.621550.snakemake.log

2024-04-04T120933.621550.snakemake.log

gguerr001 commented 4 months ago

I see MMseqs2 gives an error because some of its databases already exist from a previous run. As a quick workaround, you can delete the whole folder 'intermediate_files/clustering/' and rerun snakemake. This will create the MMseqs2 databases again and should avoid the error.
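
A minimal sketch of that cleanup, assuming it is run from the bacLIFE checkout (the directory that contains the Snakefile). The helper name `clean_clustering` and its optional path argument are only illustrative and are not part of bacLIFE:

```shell
# Hypothetical helper (not part of bacLIFE): delete the clustering folder so
# that stale MMseqs2 files such as mmseqDB_clu.dbtype are gone before the rerun.
clean_clustering() {
    # Default to the path shown in the log above; another path can be passed as $1.
    dir="${1:-intermediate_files/clustering}"
    rm -rf -- "$dir"
}

# Removes intermediate_files/clustering relative to the current directory.
clean_clustering
```

Once the folder is gone, rerun the workflow with the same command as before (`snakemake -j 24`). Note that `rm -rf` deletes everything under that folder, including binary_matrix.txt if it was already produced.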

vishnukumar200102 commented 4 months ago

2024-04-05T111750.295898.snakemake.log

Same error after deleting the clustering file.

gguerr001 commented 3 months ago

I cannot see any error in the last log.