Carrion-lab / bacLIFE


Error in rule antismash #19

Open · margarett opened 1 month ago

margarett commented 1 month ago

I've seen the previous issue #15, but I can't fix my issue even with -j 1.

I am running 5 files for analysis (I don't actually know what I'm analyzing; I'm just helping a friend who has little experience with Linux and the command line...) and I always get this error:

Activating conda environment: antismash_bacLIFE
[Thu Oct  3 21:39:36 2024]
Error in rule antismash:
    jobid: 24
    input: intermediate_files/annot/rn6390_X00005_O/Saureus_rn6390_X00005_O.gbk
    output: intermediate_files/antismash/Saureus_rn6390_X00005_O/Saureus_rn6390_X00005_O.gbk
    conda-env: antismash_bacLIFE
    shell:
        antismash --cb-general --cb-knownclusters --cb-subclusters --output-dir intermediate_files/antismash/Saureus_rn6390_X00005_O/ --asf --pfam2go --genefinding-tool prodigal --smcog-trees intermediate_files/annot/rn6390_X00005_O/Saureus_rn6390_X00005_O.gbk
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

antismash is installed in the environment antismash_bacLIFE, and the folder "intermediate_files/antismash/Saureus_rn6390_X00005_O" is temporarily created (and removed after the error).

I am now trying to run a single file at a time. I also edited the Snakefile to add --debug to the shell command.

margarett commented 1 month ago

I ran the files one by one, and they all fail at the same point. Here is the log:

Set cluster sensitivity to -s 1.000000
Set cluster mode SET COVER
Set cluster iterations to 1
intermediate_files/clustering/mmseqDB_clu.dbtype exists already!
RuleException:
CalledProcessError in file /home/bruno/bacLIFE/Snakefile, line 158:
Command 'set -euo pipefail;  mmseqs cluster intermediate_files/clustering/mmseqDB intermediate_files/clustering/mmseqDB_clu intermediate_files/clustering/mmseqDB_temp --min-seq-id 0.95 --cov-mode 0 -c 0.8' returned non-zero exit status 1.
[Thu Oct  3 23:58:31 2024]
Error in rule clustering:
    jobid: 0
    input: intermediate_files/combined_proteins/combined_proteins.fasta
    output: intermediate_files/clustering/binary_matrix.txt, intermediate_files/clustering/protein_cluster

Exiting because a job execution failed. Look above for error message
WorkflowError:
At least one job did not complete successfully.
[Thu Oct  3 23:58:31 2024]
Error in rule clustering:
    jobid: 6
    input: intermediate_files/combined_proteins/combined_proteins.fasta
    output: intermediate_files/clustering/binary_matrix.txt, intermediate_files/clustering/protein_cluster

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
gguerr001 commented 1 month ago

Hi,

Does antismash fail to run when you run it separately within the antismash_bacLIFE environment? If so, it's a problem with antismash itself; recreating this conda env using conda instead of mamba helped me fix a similar issue I had.

gguerr001 commented 1 month ago

The second error is a different issue. Remove the intermediate_files/clustering folder; sometimes leftover files from a previous MMseqs2 run make the pipeline stop.
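The suggested cleanup can be sketched as follows, using the paths from the logs above (the first two lines only simulate the leftover file from the log for illustration):

```shell
# Simulate the situation from the log: a leftover MMseqs2 database file
# ("mmseqDB_clu.dbtype exists already!") from an interrupted run.
mkdir -p intermediate_files/clustering
touch intermediate_files/clustering/mmseqDB_clu.dbtype

# Remove the stale clustering output so MMseqs2 can recreate it from scratch.
rm -rf intermediate_files/clustering
ls intermediate_files/clustering 2>/dev/null || echo "clustering folder removed"
```

Once the folder is gone, rerunning snakemake should re-execute the clustering rule, since its outputs are missing.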

margarett commented 1 month ago

Hi @gguerr001, and thank you for your reply. I'm sorry it took so long to test this.

I restarted my WSL and redid the whole process, this time creating all environments with conda instead of mamba. (Just a side note: bacLIFE downloads an insane amount of data before one can even start testing it...) Unfortunately I got a similar (though not identical) error:

(bacLIFE_environment) bruno@DESKTOP-ISHAQIB:~/bacLIFE$ Rscript src/rename_genomes.R data/ names_equivalence.txt
[1] TRUE TRUE TRUE TRUE TRUE
(bacLIFE_environment) bruno@DESKTOP-ISHAQIB:~/bacLIFE$ ls
CITATION.cff  README.md  Snakefile        classifier_src  data      download  intermediate_files     src
ENVS          Shiny_app  app_example.zip  config.json     data_ori  images    names_equivalence.txt
(bacLIFE_environment) bruno@DESKTOP-ISHAQIB:~/bacLIFE$ snakemake -j 2 --use-conda
Assuming unrestricted shared filesystem usage.
host: DESKTOP-ISHAQIB
(...  removed  ...)
[Wed Oct 16 21:55:33 2024]
localrule directories:
(...  removed, to get to the error and notice how long it's been running ...)

[Wed Oct 16 23:05:24 2024]
localrule antismash:
    input: intermediate_files/annot/cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk
    output: intermediate_files/antismash/Saureus_cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk
    jobid: 26
    reason: Missing output files: intermediate_files/antismash/Saureus_cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk; Input files updated by another job: intermediate_files/annot/cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk
    wildcards: genus=Saureus, species=cbs2016-05, str=X00001, replicon=O
    resources: tmpdir=/tmp

Activating conda environment: antismash_bacLIFE
[Wed Oct 16 23:12:53 2024]
Error in rule antismash:
    jobid: 26
    input: intermediate_files/annot/cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk
    output: intermediate_files/antismash/Saureus_cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk
    conda-env: antismash_bacLIFE
    shell:
        antismash --cb-general --cb-knownclusters --cb-subclusters --output-dir intermediate_files/antismash/Saureus_cbs2016-05_X00001_O/ --asf --pfam2go --genefinding-tool prodigal --smcog-trees intermediate_files/annot/cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

However, there is now a difference. The process "hung" at this point: it didn't drop back to the command line immediately as it did before, there was no "Shutting down, this might take some time" message, and the cursor was still blinking. My task manager showed that WSL was still working hard, so I waited. Unfortunately, after ~10 minutes it actually failed:

[Wed Oct 16 23:22:47 2024]
Finished job 22.
12 of 31 steps (39%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-16T215533.893112.snakemake.log
WorkflowError:
At least one job did not complete successfully.

Does antismash fail to run when running separately within the antismash_bacLIFE environment? I tried this (not sure if it's supposed to work outside the script):

(bacLIFE_environment) bruno@DESKTOP-ISHAQIB:~/bacLIFE$ conda deactivate
(base) bruno@DESKTOP-ISHAQIB:~/bacLIFE$ conda activate antismash_bacLIFE
(antismash_bacLIFE) bruno@DESKTOP-ISHAQIB:~/bacLIFE$ antismash --cb-general --cb-knownclusters --cb-subclusters --output-dir intermediate_files/antismash/Saureus_cbs2016-05_X00001_O/ --asf --pfam2go --genefinding-tool prodigal --smcog-trees intermediate_files/annot/cbs2016-05_X00001_O/Saureus_cbs2016-05_X00001_O.gbk

It "hung" again for several minutes (cursor blinking, nothing happening, but the CPU was working hard). After about 12-14 minutes the command line came back without errors, so I guess it worked fine?

margarett commented 1 month ago

I'm separating my comments for clarity, since these are different issues. After the previous success I restarted the process with snakemake.

The process was clearly different: it went through the "antismash" job quickly (I could see it briefly) and started to show many messages in very quick progression (unlike before, where everything was very slow). Unfortunately, it threw another error:

Total time = 4.517s
Reported 13924 pairwise alignments, 13924 HSPs.
3699 queries aligned.
RuleException:
CalledProcessError in file /home/bruno/bacLIFE/Snakefile, line 173:
Command 'set -euo pipefail;  grep "^>" intermediate_files/clustering/unaligned.fasta  > intermediate_files/clustering/unaligned_headers.txt' returned non-zero exit status 1.
[Wed Oct 16 23:38:30 2024]
Error in rule clustering:
    jobid: 0
    input: intermediate_files/combined_proteins/combined_proteins.fasta
    output: intermediate_files/clustering/binary_matrix.txt, intermediate_files/clustering/protein_cluster

Exiting because a job execution failed. Look above for error message
WorkflowError:
At least one job did not complete successfully.
[Wed Oct 16 23:38:30 2024]
Error in rule clustering:
    jobid: 14
    input: intermediate_files/combined_proteins/combined_proteins.fasta
    output: intermediate_files/clustering/binary_matrix.txt, intermediate_files/clustering/protein_cluster

The process again "hung" for several minutes, with the task manager still showing intense activity (though not as high as before), and it finally failed after ~11 minutes:

[Wed Oct 16 23:49:47 2024]
Finished job 23.
5 of 18 steps (28%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-16T233757.225931.snakemake.log
WorkflowError:
At least one job did not complete successfully.

I did check line 173 in the Snakefile, but I have no idea where to go from there. Restarting the process brings back the error I posted in my second comment, so I deleted the folder as suggested and restarted again, and yet another error popped up (this time related to the "bigscape" environment)...

Edit: ah, it turns out this is issue #13. I can now confirm that deleting the "intermediate_files/clustering" folder and restarting the process is not really a solution; it just leads to several other errors. Maybe only some individual files should be deleted?
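One possible explanation for the grep failure at Snakefile line 173, sketched under the assumption that intermediate_files/clustering/unaligned.fasta ended up empty: grep exits with status 1 whenever it finds no matching lines, and Snakemake's bash strict mode (set -euo pipefail) turns that non-zero status into a rule failure even when an empty result might be legitimate.

```shell
# Simulate the failing rule with an empty unaligned.fasta (assumption:
# the real file contained no FASTA headers when the rule ran).
touch unaligned.fasta

# grep exits 1 when no line matches; under `set -euo pipefail` this aborts the rule.
if grep "^>" unaligned.fasta > unaligned_headers.txt; then
    echo "headers found"
else
    echo "grep exit status: $?"
fi

# A common workaround in Snakemake shell commands: tolerate "no matches"
# so an empty header file does not kill the whole pipeline.
grep "^>" unaligned.fasta > unaligned_headers.txt || true
echo "rule would continue even with an empty result"
```

If that is what happened here, the underlying question is why unaligned.fasta was empty, which may point back to an earlier step in the clustering rule.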

Kanomble commented 2 weeks ago

Hi, I receive the same error in the antismash rule when running snakemake. Running snakemake within the antismash_bacLIFE environment is not possible, as snakemake is not part of that environment; antismash, however, is installed in the antismash_bacLIFE environment. It seems that snakemake is not using the correct environment. Any solutions?
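For what it's worth, that split is expected: snakemake runs from the main environment, and with --use-conda it activates each rule's declared env (antismash_bacLIFE for the antismash rule) only for that rule's shell command; the "Activating conda environment: antismash_bacLIFE" lines in the logs above show this happening. A minimal sketch, assuming the environment names used in this thread:

```shell
# Run snakemake from the environment where it is installed
# (bacLIFE_environment in this thread), not from antismash_bacLIFE.
conda activate bacLIFE_environment

# --use-conda tells Snakemake to activate each rule's declared conda env
# (e.g. antismash_bacLIFE for the antismash rule) before running its shell command.
snakemake -j 1 --use-conda
```

If the rule still fails after the env is activated, the error is inside antismash itself rather than in the environment selection.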