Assembly Result Shows Shorter Length than Original Plasmid Despite High-Quality Read Distribution

FankangMeng commented 11 months ago

Title: Assembly Result Shows Shorter Length than Original Plasmid Despite High-Quality Read Distribution

Description: I've been attempting de novo assembly of a 13 kb synthetic plasmid using EPI2ME. However, every attempt produces an assembly that is significantly shorter than the original plasmid, even though the read length distribution appears to be of high quality. I've attached my logs and report for reference.

Interestingly, when I deactivate the 'deconcatenate' step in deconcatenate.py, the assembly result appears correct. However, I'm concerned that disabling this step could negatively affect the assembly of smaller plasmids.
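For context, deconcatenation means collapsing an assembly that contains several head-to-tail copies of a plasmid back to a single copy. The sketch below is a toy illustration of that idea only, under the assumption that the copies are exact; it is not the logic in the workflow's deconcatenate.py, and the function name `collapse_exact_multimer` is hypothetical. Real assemblies contain errors, so a practical tool would need alignment-based comparison rather than exact string matching.

```python
def collapse_exact_multimer(seq: str) -> str:
    """Return one copy of the shortest unit whose exact tandem repeat is seq.

    Hypothetical helper for illustration; sequences that are not exact
    multimers are returned unchanged.
    """
    if not seq:
        return seq
    # If seq consists of two or more exact copies of a unit, seq reappears
    # inside (seq + seq) at an offset equal to the unit length. Otherwise
    # the first reappearance is at offset len(seq).
    period = (seq + seq).find(seq, 1)
    return seq[:period]


if __name__ == "__main__":
    monomer = "ATGCGTACCTGA"
    dimer = monomer * 2
    print(len(dimer), "->", len(collapse_exact_multimer(dimer)))
    print(len(monomer), "->", len(collapse_exact_multimer(monomer)))
```

A step like this helps when an assembler emits a dimer of a small plasmid, which is why switching it off entirely could affect smaller constructs.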

Does anyone have a solution or suggestions for this issue? I'd greatly appreciate any help!

N E X T F L O W ~ version 23.04.1
Launching `/Users/mengfankang/epi2melabs/workflows/epi2me-labs/wf-clone-validation/main.nf` [thirsty_lichterman] DSL2 - revision: 7a2123c22d
wf-clone-validation v0.4.0

Core Nextflow options
  runName        : thirsty_lichterman
  containerEngine: docker
  launchDir      : /Users/mengfankang/epi2melabs/instances/wf-clone-validation_e76845ad-d386-4315-8008-c15d5392ee08
  workDir        : /Users/mengfankang/epi2melabs/instances/wf-clone-validation_e76845ad-d386-4315-8008-c15d5392ee08/work
  projectDir     : /Users/mengfankang/epi2melabs/workflows/epi2me-labs/wf-clone-validation
  userName       : mengfankang
  profile        : standard
  configFiles    : /Users/mengfankang/epi2melabs/workflows/epi2me-labs/wf-clone-validation/nextflow.config
Input Options
  fastq          : /Users/mengfankang/Downloads/barcode24/barcode24.fastq
  approx_size    : 13716
  primers        : /Users/mengfankang/epi2melabs/workflows/epi2me-labs/wf-clone-validation/data/primers.tsv
Sample Options
  min_barcode    : 0
  max_barcode    : 192
Output Options
  out_dir        : /Users/mengfankang/epi2melabs/instances/wf-clone-validation_e76845ad-d386-4315-8008-c15d5392ee08/output
!! Only displaying parameters that differ from the pipeline defaults !!

If you use epi2me-labs/wf-clone-validation for your analysis please cite:

The nf-core framework https://doi.org/10.1038/s41587-020-0439-x

This is epi2me-labs/wf-clone-validation v0.4.0.

Checking fastq input.
[39/96d3aa] Submitted process > fastcat (1)
[fb/cc74a8] Submitted process > pipeline:getParams
[01/b7cb52] Submitted process > pipeline:medakaVersion
[32/75ed2e] Submitted process > pipeline:lookup_medaka_model (1)
[a1/ca1d22] Submitted process > pipeline:getVersions
[61/810eed] Submitted process > pipeline:checkIfEnoughReads (1)
[9d/09a816] Submitted process > pipeline:assembleCore (1)
[6e/2a7db7] Submitted process > pipeline:downsampledStats (1)
[0e/86786b] Submitted process > pipeline:medakaPolishAssembly (1)
[04/21c67c] Submitted process > output (1)
[6b/06db2d] Submitted process > pipeline:assembly_qc (1)
[1d/7c06ab] Submitted process > pipeline:runPlannotate (1)
[a9/5ab26f] Submitted process > pipeline:findPrimers (1)
[38/4147cb] Submitted process > pipeline:inserts
[ec/199a86] Submitted process > pipeline:report (1)
[a9/f55238] Submitted process > output (2)
[74/f5add7] Submitted process > output (3)
[04/4cfc4d] Submitted process > output (5)
[f9/3ec40e] Submitted process > output (4)
[b6/4d754c] Submitted process > output (7)
[31/9ab3d6] Submitted process > output (6)

FankangMeng commented 11 months ago

sample.fastq.gz

I have attached the FASTQ file here in case someone wants to test it. Thank you!

sarahjeeeze commented 10 months ago

Hi, thanks for sharing this. We will look into this and see if it makes sense to parameterise the deconcat step.

sarahjeeeze commented 10 months ago

Hi, I ran the workflow 6 times with various parameters and seemed to get the 13,000 bp assembly each time, except when I set the assembly size to 14000 bp, which oddly seemed to give a 26000 bp assembly (4x). What OS are you running it on? And did you provide all of your data? My raw read count was only 1495. We will investigate further, as we have had similar reports from other users.

FankangMeng commented 10 months ago

Hi, thanks for your help. I am using an Ubuntu 22.04.3 LTS system. The data I provided was subsampled to about 1500 reads using SeqKit to reduce the file size (the GitHub upload limit is 25 MB).
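The subsampling itself was done with SeqKit. As an illustration of what drawing a fixed number of reads from a FASTQ file involves, here is a minimal sketch using reservoir sampling. It assumes plain four-line FASTQ records; the helper names `read_fastq` and `reservoir_sample` are hypothetical, and this is not the command that was actually run.

```python
import io
import random
from typing import Iterable, Iterator, List, TextIO, Tuple

# One FASTQ record: header, sequence, separator, and quality lines.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a text handle, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        yield header, handle.readline(), handle.readline(), handle.readline()


def reservoir_sample(records: Iterable[Record], n: int, seed: int = 11) -> List[Record]:
    """Pick up to n records uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            # Record i replaces a kept record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                sample[j] = rec
    return sample


if __name__ == "__main__":
    # Tiny in-memory FASTQ stands in for a real file.
    text = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
    for rec in reservoir_sample(read_fastq(io.StringIO(text)), 3):
        print(rec[0].strip())
```

For a gzipped file, the same functions can be fed a handle from `gzip.open(path, "rt")`.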

I am using the EPI2ME software to run the workflow. Even when I set the approx size parameter to 13000, the final assembly is still 6-7 kb, and the result varies from run to run.

May I ask, are you using the EPi2ME software or the command line version? Are there any differences?

Here I have attached another FASTQ file (expected size about 12,900 bp), for which I always get a 5-6 kb assembly with the EPI2ME assembly workflow. You may want to try it. Thanks!

FAW84106_pass_barcode21_6cf084e1_813701d9_0.fastq.gz

sarahjeeeze commented 10 months ago

Thanks for this, will give it another go and get back to you. Sorry for the delayed response.

sarahjeeeze commented 10 months ago

If you are using this workflow via the latest EPI2ME software from https://labs.epi2me.io/downloads/, there are no differences between that and the command-line version. However, the older EPI2ME software does not have the latest version of the workflow.

sarahjeeeze commented 10 months ago

Hi, we managed to recreate this problem and we have a fix coming soon. Thanks again for pointing it out, will let you know when it's ready.

sarahjeeeze commented 9 months ago

Sorry for the delay, this is coming very soon.

FankangMeng commented 9 months ago

> Sorry for the delay, this is coming very soon.

That's very cool! Thank you!

sarahjeeeze commented 9 months ago

This has now been released in v0.5.2. Please let me know if it works better with your data. There is also a new dot plot, which should help with QC of any unexpected repeats.
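To show how a self dot plot exposes repeats, here is a minimal sketch that lists positions where identical k-mers occur twice in one sequence. It is an illustration only, not the plotting code in the workflow report; the function name `self_dotplot_hits` and the default `k` are assumptions. Plotting the returned pairs gives off-diagonal lines wherever a stretch of the assembly is repeated.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def self_dotplot_hits(seq: str, k: int = 8) -> List[Tuple[int, int]]:
    """Return sorted (i, j) pairs, i < j, where the k-mers at i and j match."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    hits: List[Tuple[int, int]] = []
    for locs in positions.values():
        # Every pair of occurrences of the same k-mer is one dot.
        for a in range(len(locs)):
            for b in range(a + 1, len(locs)):
                hits.append((locs[a], locs[b]))
    return sorted(hits)


if __name__ == "__main__":
    unit = "ATGCGTACCTGA"
    # A duplicated unit yields a run of dots at a constant offset of len(unit).
    for i, j in self_dotplot_hits(unit * 2):
        print(i, j, j - i)
```

In an assembly that is an unintended dimer, the dots line up at an offset equal to the monomer length, which makes the duplication easy to spot by eye.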

sarahjeeeze commented 8 months ago

Closing due to lack of response.
