dbeisser / Natrix2

Open-source bioinformatics pipeline for the preprocessing of raw amplicon sequencing / metabarcoding data.
MIT License

error in cluster_sorting #25

Open dmgr90 opened 2 weeks ago

dmgr90 commented 2 weeks ago

Hi,

With a new Nanopore dataset, I have been running into an error in the cluster_sorting step. The log is attached for reference: 2024-10-31T111904.682386.snakemake.log

I have checked the input files (L2COA212S_A_cdhit.fasta and L2COA212S_A_cdhit.fasta.clstr). I traced back through the earlier steps, and consensus.fasta looks fine. The file that is empty is rep_consensus.fasta. Everything else, including the original fastq.gz files, looks normal and has a decent number of reads.

Do you have any clue where the problem might be coming from?

Thank you very much in advance

dusti1n commented 2 weeks ago

Hi @dmgr90,

I looked into your issue, and the empty rep_consensus.fasta file is the likely cause. The cluster_sorting rule depends on this file indirectly, because it contributes to the data used in the clustering process.

You might try adjusting some parameters in the configuration file, especially the quality, length, and clustering filters. These settings control which reads are included in the consensus generation and could help ensure rep_consensus.fasta is populated.

Also, check the previous pipeline steps and logs to confirm they ran without errors. After adjusting the parameters, try re-running the affected steps.
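
As a minimal sketch of that check (this is not part of Natrix2; the function names are hypothetical, and the file paths are whatever intermediate FASTA files your run produced), the following Python script counts the records in each file, in the order given, so the first empty one stands out:

```python
import sys
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def first_empty(paths):
    """Return the first path that is missing or holds zero records, else None."""
    for path in paths:
        p = Path(path)
        if not p.is_file() or count_fasta_records(p) == 0:
            return str(path)
    return None


if __name__ == "__main__":
    # Pass the files in pipeline order, e.g. consensus.fasta before rep_consensus.fasta.
    for path in sys.argv[1:]:
        p = Path(path)
        if p.is_file():
            print(f"{path}\t{count_fasta_records(p)} records")
        else:
            print(f"{path}\tmissing")
    empty = first_empty(sys.argv[1:])
    if empty is not None:
        print(f"First empty or missing file: {empty}")
```

Listing the files in the order the pipeline creates them shows which step first produced no sequences, which narrows down the parameters worth adjusting.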

To make sure the pipeline is working correctly overall, you might also consider running it with test data to verify that everything functions as expected.

Hope this helps! Let me know if you have any other questions.

Best regards,
dustin