Hi @francicco,
The `arks-long` (and `tigmint-long`) steps in the LongStitch pipeline work by preprocessing the long reads to generate 'pseudo-linked reads'. You'll see this step called `long-to-linked-pe`. It lets our Tigmint and ARCS tools, which were originally developed for linked reads, also work with long reads. The pseudo-linked reads are then mapped to the draft assembly and the ARKS step proceeds as normal. Hence, you'll see those log messages.
Let me know if you have any other questions - thank you for your interest in LongStitch! Lauren
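To make the 'pseudo-linked reads' idea concrete, here is a minimal Python sketch of the concept, not the actual `long-to-linked-pe` implementation. It assumes each long read is cut into fixed-size fragments, each fragment is split into a read pair, and every pair carries the same barcode so that downstream tools can treat the long read like a linked-read molecule. The fragment length, the naming scheme and the `BX:Z:` tag in the read name are assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def pseudo_linked_pairs(read_id: str, seq: str, barcode: str, frag_len: int = 500):
    """Yield (name, mate1, mate2) tuples for one long read.

    Hypothetical scheme: tile the read with non-overlapping fragments of
    `frag_len` bases, use the first half of each fragment as mate 1 and the
    reverse complement of the second half as mate 2. All pairs from the same
    long read share one barcode. A trailing piece shorter than `frag_len`
    is dropped.
    """
    half = frag_len // 2
    for index, start in enumerate(range(0, len(seq) - frag_len + 1, frag_len)):
        fragment = seq[start:start + frag_len]
        name = f"{read_id}_f{index} BX:Z:{barcode}"
        yield name, fragment[:half], reverse_complement(fragment[half:])
```

Because every pair from one long read shares a barcode, a tool that reads linked-read FASTQ files will report barcodes even though the input was long reads.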
Hi @lcoombe,
Thank you very much. That makes sense.
A follow-up question: with the command I sent you, the final assembly seems to have better contiguity, but the BUSCO score drops significantly. It goes from 96.8% complete single-copy/2.6% missing to 82.5% and 17%, respectively. Clearly something is not working as it should. Any idea why?
Thanks a lot Francesco
Hi @francicco,
It would be helpful to look at the statistics at each stage of LongStitch (e.g. post-Tigmint, post-ntLink, post-ARKS). That could help pinpoint where the BUSCO drop is happening. Sometimes Tigmint can over-cut the assembly, for example.
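One way to compare the stages is to compute the same contiguity statistics on the FASTA file written after each step. The Python sketch below is a generic helper, not part of LongStitch; the function names are made up for illustration, and which intermediate files to feed it depends on your run.

```python
def fasta_sequences(lines):
    """Yield each sequence from an iterable of FASTA lines."""
    chunks, started = [], False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if started:
                yield "".join(chunks)
            chunks, started = [], True
        elif line and started:
            chunks.append(line)
    if started:
        yield "".join(chunks)


def nx(lengths, fraction):
    """Length of the sequence at which the running total, taken from the
    longest sequence down, first reaches `fraction` of the assembly size."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0


def assembly_stats(lines):
    """Summarise one assembly: counts, sizes, N50 and N90."""
    seqs = list(fasta_sequences(lines))
    lengths = [len(s) for s in seqs]
    return {
        "n_sequences": len(lengths),
        "total_length": sum(lengths),
        "length_without_N": sum(len(s) - s.upper().count("N") for s in seqs),
        "longest": max(lengths, default=0),
        "N50": nx(lengths, 0.5),
        "N90": nx(lengths, 0.9),
    }
```

Calling `assembly_stats(open(path))` on each stage's output and comparing `n_sequences` and `N50` between stages shows where sequences are being cut. A sharp rise in `n_sequences` right after Tigmint would be consistent with over-cutting.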
Yes, that was my plan, my feeling was exactly that.
What would be the command to just do the scaffolding? Also, is there any gap-filling procedure already implemented in LongStitch?
Thanks a lot F
Good morning :)
I had a couple of queries linked to this thread so thought I would post them here rather than adding a new query. Disclaimer, I am also new to the old genome assembly so I apologise for any stupid/noob queries.
@lcoombe Firstly, this tool looks fantastic. I have a nice, clean de novo genome assembled from HiFi reads for my non-model organism (N50 ~30 million, L50 8, length ~500 million, which is what is expected). I am at the scaffolding stage, but resources for my organism are limited (I'm currently seeing if I can swing some Hi-C sequencing). Could I theoretically use your tool to take my assembly and then input the original HiFi reads to do the scaffolding? I have been googling this and am not sure whether this is a big no-no or an acceptable approach.
@francicco how was your input draft assembly assembled? Is it the one assembled from HiFi reads or an older version (kind of linking this back to the query above)? I do have a really, really, really bad old reference genome for the species I work on, so I may try to see if I can use my HiFi reads to improve this one.
Thank you for the help in advance, I appreciate any and all comments :).
Ben
Hi @benyoung93,
Yes, HiFi data assembled with hifiasm. F
Hi @francicco
> What would be the command to just do the scaffolding?

To only run the scaffolding (without Tigmint misassembly correction), you can just replace your target (currently `tigmint-ntLink-arks`) with `ntLink-arks`.
> Also, is there any gap-filling procedure already implemented in LongStitch?

The ntLink scaffolding step can perform gap-filling - you can turn this on with `gap_fill=True`. Just make sure that you are using LongStitch v1.0.3+ and ntLink v1.2.0+.
A couple of notes about the gap-filling feature - this will only attempt to fill gaps that ntLink itself creates through scaffolding. Also, it will currently fill the gaps with raw read sequence, so you may consider downstream polishing, although it's less of a concern with your accurate HiFi reads.
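Putting the two answers together, the scaffolding-only run with gap-filling might be assembled as in the Python sketch below. The helper `longstitch_command` is hypothetical and only builds the argument list. The target and `gap_fill=True` come from the replies above, while the `draft`, `reads`, `G` and `longmap` values are copied from the command posted in this thread and should be replaced with your own.

```python
import shlex


def longstitch_command(target, draft, reads, genome_size, **options):
    """Build a LongStitch command line as a list of arguments.

    Hypothetical helper: it only formats `key=value` pairs and does not
    check that LongStitch accepts them.
    """
    args = ["longstitch", target, f"draft={draft}", f"reads={reads}", f"G={genome_size}"]
    args += [f"{key}={value}" for key, value in options.items()]
    return args


# Scaffolding only (no Tigmint step), with ntLink gap-filling switched on.
cmd = longstitch_command(
    "ntLink-arks",
    draft="DraftGenome",
    reads="m64147e_230220_093703.hifi_reads",
    genome_size=200000000,
    longmap="hifi",
    gap_fill=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where LongStitch is installed. Other parameters from the original command can be passed as extra keyword arguments in the same way.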
Hi @benyoung93,
Thank you for your kind words!
> Could I theoretically use your tool to take my assembly and then input the original HiFi reads to do the scaffolding?

Yes, this is indeed one of the most common uses of LongStitch - to improve upon a long read assembly with the same long reads! We have an example in our LongStitch paper, where we assemble human long reads using Shasta and improve upon that Shasta assembly with the same long reads.
Thank you for your interest in LongStitch - and feel free to open new issues if you have any questions! I'm always happy to help, and enjoy hearing from our users. Lauren
I am currently reading the paper lol. Super interesting stuff.
I was actually just trying to use ntLink, and then I realised that it is part of the LongStitch pipeline, so I am going to look at the difference in outputs between ntLink alone, LongStitch with ntLink, and LongStitch with ntLink + ARKS.
I was having some issues with the old conda installation, but I see from another issue that it is a Python version problem, so hopefully that will fix everything.
Thank you so much for the quick reply, I really appreciate it :).
Ben
@benyoung93 - Awesome!! Yes, don't hesitate if you have any questions about the different options for using our tools or if you have lingering installation issues!
Hi @lcoombe,
I tried without tigmint, the results are very good actually:
Parsing DraftGenome.k32.w100.ntLink-arks.longstitch-scaffolds.fa
N50: 9,174,343
N90: 2,216,306
Number of contigs : 68
Longest contig : 16713kb
Genome Length (GL) : 130321118
GL without N/X : 128812966
I wonder if I should skip the Tigmint part.
What would you do? Cheers F
Ok great!
It really depends on the situation. If you are finding that running Tigmint is in the end detrimental to the assembly (ie. it's cutting too much), then I think it's just fine to keep it out of the equation. The downside is that you will not cut at any putative misassemblies, but in this case, that may be OK given that you have a high % complete BUSCO in your baseline assembly.
You could also specify more stringent parameters (e.g. `span=2`) for the Tigmint step, which should reduce the number of cuts, but I know that playing around with these things takes time, which you might want to avoid.
Thanks a lot! I'll give it a few tries! Cheers F
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your interest in LongStitch!
Hi,
I'm testing LongStitch with PacBio data. This is how I run the analysis:
longstitch tigmint-ntLink-arks draft=DraftGenome reads=m64147e_230220_093703.hifi_reads G=200000000 w=150 k_ntLink=24 longmap=hifi
At some point ARCS says:
=>Reading Chromium FASTQ file(s)...
Is this normal, or do I have to specify other options? It seems to find barcodes:
which is a bit odd...
Any help? Thanks a lot F