Open sdws1983 opened 1 year ago
Hi @sdws1983, are you showing the graphs that seqwish emits, right? or the output of smoothxg? Would you please explain more about why anchorwave's alignments are better? The anchorwave graph looks strongly underaligned.
Hi @sdws1983, are you showing the graphs that seqwish emits, right? or the output of smoothxg? Would you please explain more about why anchorwave's alignments are better? The anchorwave graph looks strongly underaligned.
I used the output of smoothxg.
For the alignment quality, I calculated the total Number of residue matches and Alignment block length for both wfmash and anchorwave (The sum of the ninth and tenth columns in the paf file): for anchorwave, the total matches is 42654922 and alignment block length is 202495300, which slightly higher than the wfmash (41712802 and 88998748).
In fact, the anchorwave graph looks strongly underaligned is so strange, because I noticed some perfect alignments (only a few mismatches) in paf were underaligned in the final gfa graph.
Could you show the output of seqwish, instead? For both wfmash's PAF and anchorwave's PAF.
Sure! Here are the outputs:
wfmash:
anchorwave:
Thank you! I was not expecting so much white on the top-right! Any chance to share the FASTA input?
What is the length of each graph (from seqwish or smoothxg)? odgi stats -S should provide this.
On Mon, Feb 20, 2023, 20:01 Andrea Guarracino @.***> wrote:
Thank you! I was not expecting so much white on the top-right! Any chance to share the FASTA input?
— Reply to this email directly, view it on GitHub https://github.com/ekg/seqwish/issues/109#issuecomment-1437757748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEP3Z6ALNK6DGBCN3GTWYQOY5ANCNFSM6AAAAAAVBZPZCU . You are receiving this because you are subscribed to this thread.Message ID: @.***>
By the way, when I merged the adjacent alignment blocks in anchorwave results, it seems OK now:
What is the length of each graph (from seqwish or smoothxg)? odgi stats -S should provide this. … On Mon, Feb 20, 2023, 20:01 Andrea Guarracino @.> wrote: Thank you! I was not expecting so much white on the top-right! Any chance to share the FASTA input? — Reply to this email directly, view it on GitHub <#109 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEP3Z6ALNK6DGBCN3GTWYQOY5ANCNFSM6AAAAAAVBZPZCU . You are receiving this because you are subscribed to this thread.Message ID: @.>
for anchorwave:
#length nodes edges paths steps
262082885 188761 257675 2 257231
for wfmash:
#length nodes edges paths steps
227029223 2326424 3171930 2 3234227
@AndreaGuarracino @ekg So I wonder why the abnormal graph disappears when I merge the alignment intervals to make them larger. Is there any way that seqwish can handle PAF files with smaller alignment intervals?
Note that the anchorwave graph is 40mbp smaller than the wfmash one.
That suggests a significant amount of difference in the alignments. It's not clear which is true but in general a smaller graph indicates a better input alignment.
On Wed, Feb 22, 2023, 20:22 Yumin Huang @.***> wrote:
@AndreaGuarracino https://github.com/AndreaGuarracino @ekg https://github.com/ekg So I wonder why the abnormal graph disappears when I merge the alignment intervals to make them larger. Is there any way that seqwish can handle PAF files with smaller alignment intervals?
— Reply to this email directly, view it on GitHub https://github.com/ekg/seqwish/issues/109#issuecomment-1441139141, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEPY2KFLQX5BIXLCB43WY3CW3ANCNFSM6AAAAAAVBZPZCU . You are receiving this because you were mentioned.Message ID: @.***>
Note that the anchorwave graph is 40mbp smaller than the wfmash one. That suggests a significant amount of difference in the alignments. It's not clear which is true but in general a smaller graph indicates a better input alignment. … On Wed, Feb 22, 2023, 20:22 Yumin Huang @.> wrote: @AndreaGuarracino https://github.com/AndreaGuarracino @ekg https://github.com/ekg So I wonder why the abnormal graph disappears when I merge the alignment intervals to make them larger. Is there any way that seqwish can handle PAF files with smaller alignment intervals? — Reply to this email directly, view it on GitHub <#109 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEPY2KFLQX5BIXLCB43WY3CW3ANCNFSM6AAAAAAVBZPZCU . You are receiving this because you were mentioned.Message ID: @.>
Yes, I agree with your point. I used anchorwave even though it may lose some alignments, it simplifies the graph and makes downstream analysis more convenient.
What I want to say here is that I fetched the PAF file from anchorwave, and although the dot plot shows extensive aligned regions (as shown below), these alignments disappeared after running with seqwish.
@sdws1983 Did you keep the CIGAR in the PAF from AnchorWave
? What I tried before in the tetraploid potato, the AnchorWave-based graph is similar to yours. But AnchorWave
intend to create a large block with a large gap (benefits of gene anchor for repeat-rich genome), but it was not natively supported PAF output (basic format covert). Have you tried you use another CIGAR-based SV caller for AnchorWave alignment (eg. SyRI) to compare the variant callset?
@sdws1983 Did you keep the CIGAR in the PAF from
AnchorWave? What I tried before in the tetraploid potato, the AnchorWave-based graph is similar to yours. But
AnchorWave` intend to create a large block with a large gap (benefits of gene anchor for repeat-rich genome), but it was not natively supported PAF output (basic format covert). Have you tried you use another CIGAR-based SV caller for AnchorWave alignment (eg. SyRI) to compare the variant callset?
@baozg I keep the CIGAR from the sam file generated from maf-convert
. So you think we should use another aligner for CIGAR generation based on the AnchorWave blocks? I do not fully understand your advice.
For the CIGAR inspection, you could try caller. But I am not sure of the difference between these, I think seqwish
also builds graphs relying on the CIGAR. The default output of AnchorWave
MAF output, I guess you follow the MAF->SAM-> PAF
workflow. Ideally, it should be the same, but I am not sure if all these convert keep all the information or AnchorWave
use the same pattern of CIGAR with other aligners (large gap blocks introduced by the gene synteny). I will try AnchorWave
again for building graphs and update the information if I have found something new.
For the CIGAR inspection, you could try caller. But I am not sure of the difference between these, I think
seqwish
also builds graphs relying on the CIGAR. The default output ofAnchorWave
MAF output, I guess you follow theMAF->SAM-> PAF
workflow. Ideally, it should be the same, but I am not sure if all these convert keep all the information orAnchorWave
use the same pattern of CIGAR with other aligners (large gap blocks introduced by the gene synteny). I will tryAnchorWave
again for building graphs and update the information if I have found something new.
Thanks, Zhigui. I have also been thinking about these issues recently. I thought it might be related to the anchorwave
's alignment mechanism which induced many gaps in its CIGAR characters. Would this cause seqwish
to consider the alignment quality to be low and delete these alignments? To address this, I tried lowering the -k
parameter in seqwish
to retain all the fine alignments, but the resulting graph still had almost no alignments.
On the other hand, when I used the maf
file generated by anchorwave
for the whole blocks and converted it to paf
, I did not encounter the problem of large-scale mismatches when constructing the graph. This is very strange because the large block is composed of all small alignment results strung together. I have not yet found a reason to explain these issues, but I will continue to try to solve them. I hope we can continue to communicate.
If you want to use AnchorWave
for the population-scale assembly variant calling with plant genomes, their solution was converting it into GVCF and merging (https://www.biorxiv.org/content/10.1101/2023.03.02.530873v1).
Hello!
I am using seqwish for pan-genome construction, and I have a question about the alignment result.
Typically, according to the pggb pipeline, the wfmash tool is used for mapping and the resulting PAF file is input into seqwish for construction. However, I have used anchorwave instead of wfmash for mapping, and after some format conversion, I have obtained a PAF-format alignment result. I have input this PAF file into seqwish and followed the pggb pipeline for subsequent steps (including smoothxg and odgi). However, the resulting 2D graph appears to be mostly composed of abnormal alignments, even though the anchorwave alignment results are better than those obtained using wfmash (including longer total alignment length). I suspect that the cause of the abnormal 2D graph is the shorter average alignment length using anchorwave (around 25k versus over 400k with wfmash). However, I am not sure if my understanding is correct and how to modify the seqwish parameters to avoid abnormal 2D graphs. Can anyone provide me with some suggestions? Thank you!
anchorwave graph:
wfmash graph:
Here are some of the commands I used:
Thanks!
Yumin