Open polchan opened 6 months ago
I have exactly the same question.
@lh3
@zhaotao1987 @polchan Sorry for the late reply; I was quite busy this month. The short answer is no. UL reads are mainly used to resolve repeats and fill gaps rather than for scaffolding, so it would be better to have all data from the same sample.
Dear Haoyu, @chhylp123 @polchan Thanks very much for your response. I think 'scaffolding' may be confusing here, so I would like to restate my question. By 'fill gaps', do you mean one of the following?

Scenario 1: UL reads help resolve more repetitive regions of the string graph (built from the HiFi reads), so that these regions can be joined correctly and, as a result, the number of gaps is reduced (thus 'fill gaps').

Scenario 2: At some stage, UL reads are used to fill gaps in the conventional sense, by finding overlaps with contig ends and filling in the missing sequence.

The difference is that in the first scenario the UL sequences themselves are not used; they serve only for alignment and for finding the correct path. In the second scenario, the actual sequences of the UL reads are integrated into the assembly as well. I guess you probably mean the first scenario? In the gfa file that leads to the final assembly, it seems to me that only the CCS reads are included, without any UL reads. I also think UL reads are error-prone, and hifiasm does not seem to error-correct UL reads before integrating them. So if it is scenario 1, I think using ONT reads, or even a HiFi-read assembly, from a closely related species may be feasible, just to resolve the repetitive regions of the string graph built from the sample's own HiFi reads.
I hope I made myself clear, thanks very much!
Best, Tao
Hifiasm actually takes advantage of UL reads in both scenarios, but in practice scenario 2 is rare.
Thanks very much for your response. We have a new question! We have identified sequences not only from CCS but also from ONT (SRR24941509) in the gfa file. However, we are unsure where the sequences labeled 'scaf' originate from, and why all of these 'scaf' sequences are 116 base pairs in length.
A h2tg000002l 31694941 - scaf 0 116 id:i:2720614 HG:A:m
A h2tg000003l 19927517 + SRR24941509.98734 0 580 id:i:2720615 HG:A:m
A h2tg000009l 21858274 + scaf 0 116 id:i:2720629 HG:A:m
A h2tg000016l 20021719 + scaf 0 116 id:i:2720618 HG:A:a
A h2tg000017l 12012600 + scaf 0 116 id:i:2720638 HG:A:m
A h2tg000262c 48300 - SRR24941509.81037 0 14119 id:i:2720650 HG:A:m
Hi, I have the same question as polchan above.
This scaffolding was generated from UL read alignments rather than by the --dual-scaf option. Such cases often occur when UL alignments are inconsistent with HiFi read overlaps. As a result, hifiasm cannot accurately determine the length of the small scaffolding gap and falls back to the predefined length of 116 for the scaffold.
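For anyone who wants to check how many of these placeholder entries their own graph contains, here is a minimal Python sketch. It assumes the A-line column order shown in the excerpt above (contig, start on contig, strand, read name, read start, read end, then tags) and reuses the six lines posted in this thread as input. The function names are hypothetical and are not part of hifiasm; a real gfa file is tab-separated, which `split()` also handles.

```python
from collections import Counter

# The six A-lines quoted earlier in this thread (used here as toy input).
A_LINES = """\
A h2tg000002l 31694941 - scaf 0 116 id:i:2720614 HG:A:m
A h2tg000003l 19927517 + SRR24941509.98734 0 580 id:i:2720615 HG:A:m
A h2tg000009l 21858274 + scaf 0 116 id:i:2720629 HG:A:m
A h2tg000016l 20021719 + scaf 0 116 id:i:2720618 HG:A:a
A h2tg000017l 12012600 + scaf 0 116 id:i:2720638 HG:A:m
A h2tg000262c 48300 - SRR24941509.81037 0 14119 id:i:2720650 HG:A:m
"""


def parse_a_line(line):
    """Parse one A-line into a dict; return None for any other line type."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "A":
        return None
    return {
        "contig": fields[1],
        "contig_start": int(fields[2]),
        "strand": fields[3],
        "read": fields[4],
        "read_start": int(fields[5]),
        "read_end": int(fields[6]),
    }


def scaf_summary(lines):
    """Collect the 'scaf' placeholder entries and count their lengths."""
    records = [r for r in map(parse_a_line, lines) if r is not None]
    scaf = [r for r in records if r["read"] == "scaf"]
    lengths = Counter(r["read_end"] - r["read_start"] for r in scaf)
    return scaf, lengths


scaf_records, scaf_lengths = scaf_summary(A_LINES.splitlines())
for rec in scaf_records:
    print(rec["contig"], rec["contig_start"], rec["strand"])
print(dict(scaf_lengths))
```

To run this on a real graph, pass an open file handle instead of `A_LINES.splitlines()`; lines that are not A-lines are skipped.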
I see. So is this scaffolded region also filled with "Ns"? If so, how does it differ from the other "Ns" scaffolds?
As far as I remember, it should be filled with "Ns", but you can double-check by looking at the contigs. This kind of scaffold might be more reliable than other "Ns" scaffolds.
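One way to do that double check is to list the runs of N in the contig sequences. The sketch below is only an illustration: it assumes the contigs have already been exported to FASTA, and the two toy records (including the 116-N run) are made up for the example rather than taken from a real hifiasm output.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(sequence):
    """Return (0-based start, length) for every run of N or n."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


# Hypothetical toy contigs: one with a 116-bp N gap, one without gaps.
toy_fasta = [
    ">demo_with_gap",
    "ACGT" + "N" * 116 + "ACGT",
    ">demo_no_gap",
    "ACGTACGT",
]

gaps = {name: n_runs(seq) for name, seq in read_fasta(toy_fasta)}
for name, runs in gaps.items():
    print(name, runs)
```

Comparing the reported start positions with the contig coordinates of the 'scaf' entries in the A-lines should show whether a given N run corresponds to one of these UL-derived gaps.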
I see. Thanks so much!
Hello, Cheng!
I would like to ask about the strategy for integrating ultra-long ONT reads. Is the purpose of the ONT sequence to assist with scaffolding, or will it be merged into the final sequence? I have published ONT data for the same species. Can I use this data to assist in assembling another sample of the same species? The two samples differ only in cultivar.
Thank you!