Closed zihengluo closed 1 year ago
Hi, that's interesting. Does your insert contain any repetitive sections or homopolymers? And what happens if you increase coverage with the param --assm_coverage (default is 60)? Nothing has changed between the versions that would impact the assembly.
Oh, also maybe change the param --approx_size to match your data. It's currently set to 7000, so maybe try 10000.
I'm having the same issue. I noticed that the number of reads differs significantly between the different versions (on the histogram), so I think some tools might have changed their default values. Maybe a cutoff is being passed somewhere by default. I dunno.
I saw these differences between latest version 0.2.12 and 0.2.8.
Thanks for pointing this out. We have changed the cutoff for the data we report in the raw data QC tab. We will look into either restoring this or updating the changelog to explain the change.
The actual assembly method has stayed the same, so it shouldn't impact the output assemblies. Are you working with multiple barcodes? If so, are they all approximately the same size? How different are your output assemblies between versions?
They're the same, it was just the plot.
Hi, we have since updated the workflow to restore the raw data QC plots, showing just the raw unfiltered data.
Closing, assuming the issue is resolved.
What happened?
Hi, I ran the workflow six times (three times with v0.2.1, three times with v0.2.4) on the same input data, and the six output assemblies are all different. I wonder what factors caused the inconsistency?
Here is the command I used: nextflow run epi2me-labs/wf-clone-validation -profile conda --fastq {input_path} --db_directory wf-clone-validation-db --out_dir {output_path}
Here is an overview of the input data:
I mapped the six output assemblies against the reference sequence and found that the assemblies failed to fully recover the reference and were missing different sequences in different runs.
However, the read mapping results show that the reference is fully covered by reads. I used minimap2 without secondary mapping.
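For anyone who wants to pin down exactly which reference regions each assembly is missing, here is a minimal Python sketch. It assumes the assembly was aligned to the reference with minimap2 and the output kept in PAF format (minimap2's default), where columns 6 to 9 hold the target name, target length, target start and target end. The function name uncovered_regions, the sequence names and the sample PAF lines are made up for illustration and are not taken from the workflow or from the runs described above.

```python
def uncovered_regions(paf_lines, ref_name):
    """Return (ref_length, gaps) for one reference sequence.

    gaps is a list of (start, end) reference intervals that no alignment
    touches. Coordinates are 0-based and end-exclusive, as in PAF.
    Returns (None, []) if no alignment to ref_name is found.
    """
    ref_len = None
    intervals = []
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        # PAF has 12 mandatory columns; column 6 (index 5) is the target name.
        if len(fields) < 12 or fields[5] != ref_name:
            continue
        ref_len = int(fields[6])                            # target length
        intervals.append((int(fields[7]), int(fields[8])))  # target start, end

    if ref_len is None:
        return None, []

    gaps = []
    covered_to = 0  # everything before this position is covered or already reported
    for start, end in sorted(intervals):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < ref_len:
        gaps.append((covered_to, ref_len))
    return ref_len, gaps


# Hypothetical PAF records: an assembly aligned in two blocks to a 7000 bp reference.
example_paf = [
    "assembly_run1\t6500\t0\t3000\t+\tref\t7000\t0\t3000\t2990\t3000\t60",
    "assembly_run1\t6500\t3000\t6500\t+\tref\t7000\t3500\t7000\t3480\t3500\t60",
]

ref_len, gaps = uncovered_regions(example_paf, "ref")
for start, end in gaps:
    print(f"missing from assembly: ref:{start}-{end} ({end - start} bp of {ref_len})")
```

Running this once per assembly and comparing the reported intervals gives a quick view of whether the lost sequence falls in the same place across runs. Note that it treats the reference as linear, so a gap at the very start or end may only reflect where a circular construct was cut.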
Operating System
ubuntu 20.04
Workflow Execution
Command line
Workflow Execution - EPI2ME Labs Versions
No response
Workflow Execution - Execution Profile
Conda
Workflow Version
0.2.1 & 0.2.4
Relevant log output