connor-lab / ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019
GNU Affero General Public License v3.0

Add minReadsPerBarcode parameter for Nanopore workflow #57

Closed kjsanger closed 4 years ago

kjsanger commented 4 years ago

The barcode<n*> directories of demultiplexed FASTQ input are currently filtered to exclude any containing fewer than 5 files (Guppy creates every barcode directory, whether or not the barcode was used).

This patch changes the filter to scan the files and count the records they contain, in order to 1) avoid excluding barcodes where many reads are in fewer than 5 files and 2) avoid including barcodes where very few reads are spread across 5 or more files.

The read-count threshold can be changed with the new optional parameter minReadsPerBarcode, which defaults to 100 reads.
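
To illustrate the idea outside the pipeline, here is a minimal Python sketch of a read-count filter. It is not the code in this patch (the pipeline itself is Nextflow); the function names, the accepted file extensions, the four-lines-per-record FASTQ layout, and the choice of `>=` for the comparison are all assumptions made for the example.

```python
import gzip
from pathlib import Path

# Assumed set of FASTQ file name endings; adjust to match the actual input.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def count_fastq_records(path):
    """Count records in one FASTQ file, assuming four lines per record."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def barcodes_with_enough_reads(run_dir, min_reads_per_barcode=100):
    """Return names of barcode* directories holding at least the minimum reads.

    The number of files in a directory is ignored; only the total number
    of records across its FASTQ files is compared with the threshold.
    """
    kept = []
    for barcode_dir in sorted(Path(run_dir).glob("barcode*")):
        if not barcode_dir.is_dir():
            continue
        total = sum(
            count_fastq_records(f)
            for f in barcode_dir.iterdir()
            if f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
        )
        if total >= min_reads_per_barcode:
            kept.append(barcode_dir.name)
    return kept
```

With this sketch, a directory holding 120 reads in two files is kept, while one holding six files of a single read each is dropped, which is the behaviour the two points above describe.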

I expect that this default may not be appropriate. It is somewhat arbitrary, as I'm not sure how many reads the original 5-file cutoff corresponded to (more than 100, I suspect).