genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Process read_correction is applied to a subset of reads from the fasta file #73

Open mszimmermann opened 2 years ago

mszimmermann commented 2 years ago

Hi, I managed to run the test run of NanoCLUST and got one classification result in the output file. Is this expected? Now I'm trying to run it on my data, and so far it looks the same. I noticed that the read_correction process uses canu to correct only a subset of the reads in the original file:

From main.nf:
Line 325: head -n\$(( $count*4 )) $reads > subset.fastq
Line 326: canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=${params.avg_amplicon_size} stopOnLowCoverage=1 minInputCoverage=2 minReadLength=500 minOverlapLength=200

And corrected_reads.corrected_reads.fastq contains about 50 sequences. Is it supposed to be like that, and if yes, why is only a subset of the original reads used?
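
To make the subsetting in line 325 concrete, here is a minimal sketch of what that head command appears to do. It assumes each FASTQ record spans exactly four lines, so count*4 lines correspond to the first count reads. The function name first_n_reads and the file demo.fastq are hypothetical and are not part of NanoCLUST. In main.nf the \$ most likely just escapes the dollar sign so that bash, rather than Nextflow, evaluates the arithmetic.

```shell
# Hypothetical helper (not part of NanoCLUST): print the first N records of a
# FASTQ file, assuming every record is exactly four lines long.
# Usage: first_n_reads COUNT FILE
first_n_reads() {
  head -n "$(( $1 * 4 ))" "$2"
}

# Build a tiny three-read FASTQ file for the demonstration.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > demo.fastq

# Keep the first two reads, as line 325 would with count=2.
first_n_reads 2 demo.fastq > subset.fastq

# Report how many records ended up in the subset (lines divided by four).
echo "$(( $(wc -l < subset.fastq) / 4 )) reads in subset"
```

If this reading is right, only the first count reads of the input ever reach canu, and the number of corrected sequences cannot exceed count.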