dbeisser / Natrix2

Open-source bioinformatics pipeline for the preprocessing of raw amplicon sequencing / metabarcoding data.
MIT License
Error in cluster sorting #22

Open dmgr90 opened 3 weeks ago

dmgr90 commented 3 weeks ago

Hi again,

I have been getting the following error and haven't found a solution. Among the things I tried was running the command with the --ignore-incomplete flag, to no avail.

The error is the following:

Error in rule cluster_sorting:
    jobid: 1829
    output: Brazil_12s/assembly/VW1A1O3_A/VW1A1O3_A.dereplicated.fasta
    conda-env: /ictstr01/home/haicu/daniel.gygax/Brazil/Natrix2/.snakemake/conda/2afa7833

Removing temporary output file Brazil_12s/assembly/VA2BI3_A/VA2BI3_A_cdhit.fasta.clstr.
[Wed Aug 28 13:30:22 2024]
Finished job 689.
55 of 369 steps (15%) done
Select jobs to execute...
RuleException:
CalledProcessError in line 32 of /ictstr01/home/haicu/daniel.gygax/Brazil/Natrix2/rules/dereplication.smk:
Command 'source /home/haicu/daniel.gygax/miniforge3/bin/activate '/ictstr01/home/haicu/daniel.gygax/Brazil/Natrix2/.snakemake/conda/2afa7833'; set -euo pipefail; python /ictstr01/home/haicu/daniel.gygax/Brazil/Natrix2/.snakemake/scripts/tmpka_mbvmq.dereplication.py' returned non-zero exit status 1.
  File "/home/haicu/daniel.gygax/miniforge3/envs/natrix/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2347, in run_wrapper
  File "/ictstr01/home/haicu/daniel.gygax/Brazil/Natrix2/rules/dereplication.smk", line 32, in __rule_cluster_sorting
  File "/home/haicu/daniel.gygax/miniforge3/envs/natrix/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/home/haicu/daniel.gygax/miniforge3/envs/natrix/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/haicu/daniel.gygax/miniforge3/envs/natrix/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/haicu/daniel.gygax/miniforge3/envs/natrix/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2359, in run_wrapper

The medaka consensus.fasta files (which I believe are the input for this step) are empty for the samples causing the error. Is there any way around this? On each rerun it takes about an hour before the error shows up.

Once more, thank you in advance for your support

Find the logs attached.

2024-08-28T125011.440909.snakemake.log 2024-08-28T141832.202922.snakemake.log 2024-08-28T155506.836311.snakemake.log

dusti1n commented 3 weeks ago

Hi @dmgr90,

Thanks for sharing the logs and the details. I will take a closer look at the issue. Thank you for your patience!

Best, Dustin

dmgr90 commented 3 weeks ago

Hi Dustin,

Thank you very much for looking into it. In case you are interested, I managed to work around this by removing the empty medaka consensus.fasta files and rerunning.
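
For anyone hitting the same error, the workaround above could be scripted roughly as follows. This is only a sketch and not part of Natrix2: the function names `find_empty_files` and `remove_files` are made up for illustration, and the `*consensus.fasta` pattern is an assumption about how the medaka output files are named, so adjust it to your output folder. It defaults to a dry run that only lists the files.

```python
from pathlib import Path


def find_empty_files(root, pattern="*consensus.fasta"):
    """Return sorted paths under `root` that match `pattern` and are zero bytes.

    The default pattern is an assumption about the medaka output naming.
    """
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


def remove_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only report what would go."""
    for p in paths:
        print(("would remove " if dry_run else "removing ") + str(p))
        if not dry_run:
            p.unlink()
```

Calling `remove_files(find_empty_files("Brazil_12s"))` would only print the empty files found under that folder; passing `dry_run=False` deletes them before the pipeline is rerun. Note that this only checks for zero-byte files, not for files that contain whitespace but no sequences.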

dusti1n commented 3 weeks ago

Great, thanks for the additional information. I will definitely have a look at the problem anyway. Have a nice day.

Best, Dustin

dusti1n commented 2 weeks ago

Hello @dmgr90,

You mentioned that you manually deleted the empty consensus.fasta files, and then the pipeline ran successfully. After running the pipeline again, could you check if your consensus.fasta files are still empty?

It might also be helpful to look at the files created in the previous steps, as the cluster_sorting rule depends on your consensus.fasta files.

Please check the Racon files in your output folder under the following path: output_folder/read_correction/racon/_racon_x.fasta, where x is a placeholder for a number. Look at the FASTA file with the highest number. For example, if Racon is set to 4 in the configuration file, the path would be: output_folder/read_correction/racon/_racon_4.fasta.

This is currently the default value in the Nanopore.yaml configuration file.
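
The check described above could be sketched like this. It is an illustration under assumptions, not a Natrix2 utility: the helper names `last_racon_rounds` and `count_fasta_records` are hypothetical, and the code assumes the files in the racon folder are named `<prefix>_racon_<n>.fasta`, with whatever prefix (possibly none) your run produced. For each prefix it picks the file with the highest round number and counts its FASTA records, so a count of 0 points to a sample that was already empty before medaka.

```python
import re
from pathlib import Path

# Assumed naming scheme: <prefix>_racon_<n>.fasta (prefix may be empty).
RACON_RE = re.compile(r"^(?P<prefix>.*)_racon_(?P<n>\d+)\.fasta$")


def last_racon_rounds(racon_dir):
    """Map each file prefix to (round_number, path) of its highest racon round."""
    best = {}
    for p in Path(racon_dir).glob("*_racon_*.fasta"):
        m = RACON_RE.match(p.name)
        if not m:
            continue
        n = int(m.group("n"))  # compare numerically so round 10 beats round 9
        prefix = m.group("prefix")
        if prefix not in best or n > best[prefix][0]:
            best[prefix] = (n, p)
    return best


def count_fasta_records(path):
    """Count FASTA header lines (lines starting with '>') in a file."""
    with open(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def report(racon_dir):
    """Print the record count of the last racon round for every prefix."""
    for prefix, (n, path) in sorted(last_racon_rounds(racon_dir).items()):
        print(f"{path} (round {n}): {count_fasta_records(path)} records")
```

Running `report("output_folder/read_correction/racon")` would print one line per prefix; any line reporting 0 records marks a sample worth inspecting in the earlier steps.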

It's quite possible that the error is caused by empty files produced in earlier processing steps. Another option is to run the workflow with test data to see whether it completes without issues. Sometimes errors are also caused by specific samples.

If you have any further questions, please feel free to contact us. We will be happy to help you!

Best, Dustin