rimjhimroy closed 4 years ago
Hi @rimjhimroy,
It looks like joins were made in the assembly, but very few. I don't think this is the cause, but double-check that you don't have an empty graph file (*original.gv).
I'd suggest sweeping on a few of the parameters. For example:
You could also plot the distribution of the barcode multiplicity (that multiplicity file is an output of the arks-make pipeline) to double-check that your specified multiplicity range includes the bulk of your data.
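As a rough sketch of that check, the snippet below summarizes a multiplicity file without needing a plotting library. It assumes the file has two comma-separated columns (barcode, then multiplicity); that layout and the function names are assumptions for illustration, so adjust the parsing if your file differs.

```python
import csv
from collections import Counter


def load_multiplicities(path):
    """Read multiplicities from a file assumed to hold 'barcode,multiplicity' rows."""
    counts = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            try:
                counts.append(int(row[1]))
            except ValueError:
                continue  # skip a header row, if one is present
    return counts


def fraction_in_range(counts, low, high):
    """Fraction of barcodes whose multiplicity lies in [low, high]."""
    if not counts:
        return 0.0
    kept = sum(1 for c in counts if low <= c <= high)
    return kept / len(counts)


def log2_histogram(counts):
    """Bucket by floor(log2(m)): bin 0 holds 1, bin 1 holds 2-3, bin 2 holds 4-7, ..."""
    hist = Counter(c.bit_length() - 1 for c in counts if c > 0)
    return dict(sorted(hist.items()))
```

Calling `fraction_in_range(counts, 25, 20000)` would report how many barcodes an `m=25-20000` setting keeps, and `log2_histogram(counts)` gives a coarse view of where the bulk of the distribution sits.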
Hopefully one of those suggestions improves your resulting contiguity! If needed, you could also look at lowering c and l.
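One way to organize a sweep over c and l is to generate one arks-make command per parameter combination and run them in turn. The sketch below only builds the command strings; the base arguments mirror the command from the original post, while the grid values and the `sweep_commands` helper are illustrative assumptions, not recommended settings.

```python
from itertools import product


def sweep_commands(base_args, grid):
    """Yield one arks-make command string per combination of values in grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        args = dict(base_args)
        args.update(zip(names, values))  # swept values override or extend the base
        yield "arks-make arks " + " ".join(f"{k}={v}" for k, v in args.items())


if __name__ == "__main__":
    # Base arguments taken from the command in the original post.
    base = {
        "draft": "mref.tigmint", "m": "25-20000", "z": 1000, "k": 60,
        "reads": "lfq", "j": 0.5, "t": 30, "a": 0.5,
    }
    # Hypothetical values to try; choose ones suited to your data.
    grid = {"c": [2, 3, 5], "l": [2, 3, 5]}
    for command in sweep_commands(base, grid):
        print(command)
```

Printing the commands first makes it easy to review them before launching anything, and comparing the scaffold statistics from each run shows which combination helps.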
I should note that ARKS will work best with a more contiguous assembly, and your assembly isn't particularly contiguous right now. Have you considered assembling your chromium data with Supernova? We've seen good results with running Supernova, and then scaffolding with ARKS.
Hope that helps! Lauren
Hi Lauren
Thank you for your quick reply. I don't have an empty *original.gv file. Looking into the barcode multiplicity file, I find that most of my barcodes are in the range 5-1000.
I also have 350433 out of 1623642 barcodes with multiplicity = 2. My read lengths are 250bp on average. Does this sound concerning to you?
I am running Supernova on my data, but I am still waiting for it to finish after 20 days.
Thanks again, Rimjhim
Hi Rimjhim,
Good to know that the gv file isn't empty and your barcode multiplicity seems in the right ballpark.
It is expected that a large number of the reads will have barcodes with a low multiplicity -- that's just because the barcode is in read 1 (and then clipped out by longranger basic), so it is possible to get base errors there. Those barcodes are likely just due to these errors.
Your chromium reads are 250bp? Huh, I have only come across chromium reads that are 2x150bp (128bp/150bp after longranger basic), and I was under the impression that that was standard for the 10x Genomics tools. Based on the 10x Genomics website, Supernova expects 2x150bp reads: https://support.10xgenomics.com/de-novo-assembly/sequencing/doc/specifications-sequencing-requirements-for-de-novo-assembly
Was this a bespoke library construction process? Certainly 20 days is a very long time for Supernova -- I'd expect a human-sized assembly to finish within a week at most.
Lauren
Closing this issue due to inactivity -- feel free to re-open if you still have questions.
Hi,
I have produced a draft genome assembly of a ~1Gbp plant genome with MaSuRCA based on Illumina paired-end reads, and I additionally have ~50X 10x Genomics data which I wanted to use to scaffold the draft genome.
I first used longranger basic (longranger v2.2.2) to produce the interleaved barcoded fastq.gz files, the stats for which are:
The barcoded fastq.gz file given by longranger basic is in the following format:
I first used the default tigmint parameters and then the following parameters to run arks:
arks-make arks draft=mref.tigmint m=25-20000 z=1000 k=60 reads=lfq j=0.5 t=30 a=0.5
Here are some stats for: a) my original assembly, b) the tigmint-split assembly, and c) the final assembly from the arks-links pipeline:
I am not sure why I am not getting an improvement in scaffolding using my data.
I was wondering if you could please help me by letting me know if I am doing something wrong and how I can improve it?
Arks log file: run_arkslinks.txt
Thanks a lot,
Best, Rimjhim