Closed: angelovangel closed this issue 6 months ago
Hi, is the approx_size set correctly for the data? It filters for 0.5x-1.5x the approx size, so it doesn't seem that stringent. You could use the --large_construct parameter, which will effectively skip this step and consider all your reads for the assembly.
Thanks, --large_construct solves my issue.
Operating System
Ubuntu 22.04
Other Linux
No response
Workflow Version
v1.2.0-g2c04b9d
Workflow Execution
Command line
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-clone-validation --fastq fastq_pass --sample_sheet samplesheet.csv --threads 16
Workflow Execution - CLI Execution Profile
standard (default)
What happened?
The fastcat filtering step in checkIfEnoughReads is too aggressive, especially for bigger plasmids, where it leaves almost no reads. For a 14 kb plasmid, the filtering logic is:
fastcat -s sample1.interim -a 7000 -b 21000 input.fastq.gz | bgzip -@ 15 > interim.fastq.gz
Although the sequence data is good, this leaves 3 reads out of 2000 and the assembly fails. Does it make sense to even have this step? I guess the assembly would work with much shorter reads.
https://github.com/epi2me-labs/wf-clone-validation/blob/2c04b9d884b7715905e6ea45a77a07f3870234ac/main.nf#L37
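To make the length window concrete, here is a minimal Python sketch of the arithmetic behind the -a 7000 -b 21000 bounds for a 14 kb approx_size (0.5x and 1.5x). It is not the workflow's code: the function names and the example read lengths are hypothetical, and treating both bounds as inclusive is an assumption about how fastcat applies them.

```python
def length_window(approx_size, low=0.5, high=1.5):
    """Return (min_len, max_len) for a 0.5x-1.5x window around approx_size."""
    return int(approx_size * low), int(approx_size * high)


def count_in_window(read_lengths, approx_size):
    """Count reads whose length falls inside the window.

    Assumption: both bounds are inclusive.
    """
    min_len, max_len = length_window(approx_size)
    return sum(min_len <= n <= max_len for n in read_lengths)


# Hypothetical read lengths for illustration only (not taken from the real run).
lengths = [1200, 3500, 6999, 7000, 14000, 21000, 21001]

window = length_window(14000)
kept = count_in_window(lengths, 14000)
print(f"window={window}, kept {kept} of {len(lengths)} reads")
```

Running the same count on the real read lengths would show whether the reads are mostly shorter than half the expected plasmid size, which is the case where this filter discards nearly everything.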
Relevant log output
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
No response