checkIfEnoughReads too stringent

angelovangel commented 6 months ago

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

v1.2.0-g2c04b9d

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-clone-validation --fastq fastq_pass --sample_sheet samplesheet.csv --threads 16

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

The fastcat filtering step in checkIfEnoughReads is too aggressive; for larger plasmids in particular, it can leave almost no reads. For a 14 kb plasmid, the filtering command is:

fastcat -s sample1.interim -a 7000 -b 21000 input.fastq.gz | bgzip -@ 15 > interim.fastq.gz

Although the sequence data is good, this leaves 3 reads out of 2000, and the assembly fails. Does it make sense to have this step at all? I would guess the assembly also works with much shorter reads.

https://github.com/epi2me-labs/wf-clone-validation/blob/2c04b9d884b7715905e6ea45a77a07f3870234ac/main.nf#L37
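To check how many reads would survive a length filter like the one above before running the workflow, the read lengths can be counted directly. The following is a minimal sketch, not the workflow's own code: the function names are hypothetical, it assumes plain four-line FASTQ records, and it assumes that the -a and -b values act as inclusive lower and upper length bounds, as the 7000 and 21000 values for a 14 kb plasmid suggest.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_lengths(fastq_path: str) -> Iterator[int]:
    """Yield the sequence length of every record in a FASTQ file.

    Assumes four lines per record (no wrapped sequences); files ending
    in .gz are opened with gzip.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of each four-line record.
            if line_number % 4 == 1:
                yield len(line.strip())


def count_in_window(lengths: Iterable[int], min_len: int, max_len: int) -> Tuple[int, int]:
    """Return (kept, total) for reads with min_len <= length <= max_len."""
    kept = 0
    total = 0
    for length in lengths:
        total += 1
        if min_len <= length <= max_len:
            kept += 1
    return kept, total


# Example call (file name taken from the command in this report):
# kept, total = count_in_window(read_lengths("input.fastq.gz"), 7000, 21000)
# print(f"{kept} of {total} reads fall inside the length window")
```

A very low kept count on otherwise good data would point to a read-length distribution that sits mostly outside the window, for example fragmented reads much shorter than the full plasmid.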

Relevant log output

STATUS="Failed due to insufficient reads"

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

sarahjeeeze commented 6 months ago

Hi, is approx_size set correctly for your data? The step keeps reads between 0.5x and 1.5x of the approximate size, so it doesn't seem that stringent. You could use the --large_construct parameter, which will effectively skip this step and consider all your reads for the assembly.
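The 0.5x to 1.5x rule can be written out as a small sketch to see which length bounds a given approx_size produces. The function name and the rounding via int() are assumptions made for illustration; the workflow may compute the bounds differently.

```python
from typing import Tuple


def length_window(approx_size: int, lower: float = 0.5, upper: float = 1.5) -> Tuple[int, int]:
    """Return the (min_len, max_len) bounds for a given approximate construct size.

    The 0.5 and 1.5 factors come from the reply above; truncating with
    int() is an assumption of this sketch.
    """
    return int(approx_size * lower), int(approx_size * upper)


# A 14 kb plasmid gives the bounds seen in the reported fastcat command.
print(length_window(14000))  # (7000, 21000)
```

With approx_size at 14000, only reads between 7000 and 21000 bases are kept, so a run dominated by shorter fragments loses most of its reads even when the data quality is good.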

angelovangel commented 6 months ago

Thanks, --large_construct solves my issue.
