baoxingsong / AnchorWave

Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism and whole-genome duplication variation
MIT License

there is no match anchor found in the input sam file #28

Open peterdfields opened 2 years ago

peterdfields commented 2 years ago

Hi,

I'm trying to run AnchorWave on a pair of non-plant genomes. Whether I map with minimap2 or gmap, when I run the anchorwave genoAli function I see the following:

AVX2 is enabled
reading reference sam begin
there is no match anchor found in the input sam file

There are definitely alignments in the respective sam files. Is there anything I can do to diagnose what might be going wrong? Thank you for your time and assistance.

peterdfields commented 2 years ago

In case it's useful to diagnose the issue here, when I run anchorwave proali the above error doesn't seem to occur.

baoxingsong commented 2 years ago

Thanks for trying AnchorWave. Are the chromosomes from the reference genome and query genome named in the same way, please? Chromosomes with the same name would be aligned using the genoAli command.

peterdfields commented 2 years ago

@baoxingsong Thank you for creating AnchorWave! In the present case, the chromosomes are not named the same as we don't presently have a complete idea about which chromosomal scaffolds are homologous. We're hoping to use AnchorWave in part to determine this relationship. Should homology be determined with proali before using genoAli?

baoxingsong commented 2 years ago

To use genoAli, you need chromosome-level assemblies, and the chromosomes in the reference genome and the query genome must be named in the same way. Moreover, genoAli assumes there is no translocation.

proali is more flexible and might be OK for your genome alignment.

If you are comparing two assemblies of the same species, proali should be helpful for identifying homologous chromosomes.
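
A quick way to confirm the naming requirement above before running genoAli is to compare the sequence names in the two FASTA files. The following is a minimal sketch, not part of AnchorWave: the helper names `fasta_names` and `compare_names` and the toy records are hypothetical, and it assumes the sequence name is the first whitespace-delimited word after `>`.

```python
def fasta_names(lines):
    """Return sequence names (first word after '>') from an iterable of FASTA lines."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def compare_names(ref_names, query_names):
    """Report which names are shared and which occur in only one genome."""
    ref, query = set(ref_names), set(query_names)
    return {
        "shared": sorted(ref & query),
        "ref_only": sorted(ref - query),
        "query_only": sorted(query - ref),
    }


# Toy input standing in for real files (hypothetical names).
ref_fasta = [">chr1 some description", "ACGT", ">chr2", "ACGT"]
query_fasta = [">chr1", "ACGT", ">scaffold_7", "ACGT"]

report = compare_names(fasta_names(ref_fasta), fasta_names(query_fasta))
print(report)
```

With real data, an open file handle can be passed in place of the lists, for example `fasta_names(open("ref.fa"))`, where the path is a placeholder. Any name listed under `ref_only` or `query_only` has no same-named partner in the other genome.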

goeckeritz commented 1 year ago

Hi baoxingsong,

Cool tool, thanks for making this!! I have the same issue peterdfields is describing, and I've named the chromosomes in the same way. Here are the first lines of my ref.sam and query.sam:

ref.sam:
@SQ SN:chr4 LN:25843236
@PG ID:minimap2 PN:minimap2 VN:2.20-r1061 CL:minimap2 -x splice -t 10 -k 12 -a -p 0.4 -N 20 ../P_persica_chr4.fasta ../cds/subA_cds.fasta
Pcer_022906-RA 0 chr4 805109 60 456M * 0 0 ATGAAATCCTTCATGCTTTTCCTAATGCTTGCCATGCTAATGGCTTCAGCCATCACTACTCTTTCTGCGATACCAGACGAAGAAGAATCATTCCTCAACGAGGAAAACAATAATGATGCAAATGACGAAACCAAAAGCCAGTTGGAAAAAAGTACTTCTCTGAGGGGAAGAAGCCGCTTCCTTGCCTCCCGGCCGCCCACGATGACTTGCGACAGATACCCTAAGGTTTGTGGGGCGTCGGGCAGCGCAGGGCCAGATTGCTGCAAGAAGAAATGTGTGGACACGAACACAGACAGAGCAAACTGTGGCAAGTGTGGGAGGAAATGCAAGTACGCAGAGATATGCTGCAAAGGTAAGTGTGTGAATCCGAGGTCGGACAAGAAAAACTGCGGCAGCTGCAACAACAAATGCAAGAAAGGCAGCTCATGTGCGTATGGGATGTGCAGCTATGCATGA * NM:i:11 ms:i:423 AS:i:423 nn:i:0 tp:A:P cm:i:112 s1:i:412 s2:i:161 de:f:0.0241 rl:i:0
Pcer_022907-RA 0 chr4 805133 1 116M2501N1I40M5D3M2D2M5D252M * 0 0 ATGTTTGCCATGCTAATAGCTTCAGCCATCACCACTCTCTCTGCAATACCAAACGAAGAAGAATCATTCTTCAACGAGGAAAACAATACTGATACAAATGACGAAACCAAAAACCAGTTTGGAAAAAGCACTTCTCTAAGAAGCCGCTTCCTTGCCTCCGTGACTTGTGACAAAAACCCTAAGGTTTGTCAGGCGTATGGCAGCGCAAAGCCGGATTGCTGCAACAAGAAATGTGTGGACAGAAACACAGACACAGCAAACTGCGGCAAGTGTGGGAAGAAATGCAATTACGCAGAGATTTGCTGCGAAGGTAAGTGTGTGAATCCGATGTCGGACAAGGAAAACTGCGGCAGCTGCAACAACAAGTGCAAGAAAGGCACTTCATGTGTGTTTGGAATGTGCAGCTATGCATGA * NM:i:36 ms:i:323 AS:i:287 nn:i:0 ts:A:+ tp:A:P cm:i:64 s1:i:265 s2:i:282 de:f:0.0647 rl:i:0

query.sam:
@SQ SN:chr4 LN:31254986
@PG ID:minimap2 PN:minimap2 VN:2.20-r1061 CL:minimap2 -x splice -t 10 -k 12 -a -p 0.4 -N 20 ../subA_chr4.fasta ../cds/subA_cds.fasta
Pcer_022906-RA 0 chr4 6188 44 456M * 0 0 ATGAAATCCTTCATGCTTTTCCTAATGCTTGCCATGCTAATGGCTTCAGCCATCACTACTCTTTCTGCGATACCAGACGAAGAAGAATCATTCCTCAACGAGGAAAACAATAATGATGCAAATGACGAAACCAAAAGCCAGTTGGAAAAAAGTACTTCTCTGAGGGGAAGAAGCCGCTTCCTTGCCTCCCGGCCGCCCACGATGACTTGCGACAGATACCCTAAGGTTTGTGGGGCGTCGGGCAGCGCAGGGCCAGATTGCTGCAAGAAGAAATGTGTGGACACGAACACAGACAGAGCAAACTGTGGCAAGTGTGGGAGGAAATGCAAGTACGCAGAGATATGCTGCAAAGGTAAGTGTGTGAATCCGAGGTCGGACAAGAAAAACTGCGGCAGCTGCAACAACAAATGCAAGAAAGGCAGCTCATGTGCGTATGGGATGTGCAGCTATGCATGA * NM:i:0 ms:i:456 AS:i:456 nn:i:0 tp:A:P cm:i:148 s1:i:452 s2:i:427 de:f:0 rl:i:12
Pcer_022906-RA 256 chr4 956553 0 456M * 0 0 * * NM:i:7 ms:i:435 AS:i:435 nn:i:0 tp:A:S cm:i:124 s1:i:427 de:f:0.0154 rl:i:12
Pcer_022907-RA 0 chr4 9043 60 414M * 0 0 ATGTTTGCCATGCTAATAGCTTCAGCCATCACCACTCTCTCTGCAATACCAAACGAAGAAGAATCATTCTTCAACGAGGAAAACAATACTGATACAAATGACGAAACCAAAAACCAGTTTGGAAAAAGCACTTCTCTAAGAAGCCGCTTCCTTGCCTCCGTGACTTGTGACAAAAACCCTAAGGTTTGTCAGGCGTATGGCAGCGCAAAGCCGGATTGCTGCAACAAGAAATGTGTGGACAGAAACACAGACACAGCAAACTGCGGCAAGTGTGGGAAGAAATGCAATTACGCAGAGATTTGCTGCGAAGGTAAGTGTGTGAATCCGATGTCGGACAAGGAAAACTGCGGCAGCTGCAACAACAAGTGCAAGAAAGGCACTTCATGTGTGTTTGGAATGTGCAGCTATGCATGA * NM:i:0 ms:i:414 AS:i:414 nn:i:0 tp:A:P cm:i:150 s1:i:412 s2:i:353 de:f:0 rl:i:0

The genoAli command I am using:

anchorwave genoAli -i ../P_persica_chr4.gff3 -as ../cds/subA_cds.fasta -r ../P_persica_chr4.fasta -a subA_query.sam -ar peach_ref_subA.sam -s ../subA_chr4.fasta -v subA_to_peach.vcf -n subA.anchors -o subA.maf -f subA.f.maf > subA.log

I should also probably mention I get the same error when I use proali too. Any ideas as to what is going wrong?? Thanks in advance for any help you can offer.

Kindly, Charity
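
When both SAM files clearly contain alignments, one sanity check is to see whether the same CDS names have primary alignments in both files, and on same-named chromosomes. The sketch below is only a diagnostic under stated assumptions: it does not reproduce AnchorWave's anchor selection, the helper names (`primary_hits`, `shared_cds`, `rec`) and toy records are hypothetical, and it assumes tab-delimited SAM records with the standard 11 mandatory fields.

```python
def primary_hits(sam_lines):
    """Map each query (CDS) name to the reference names of its primary alignments."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800 or rname == "*":
            continue
        hits.setdefault(qname, set()).add(rname)
    return hits


def shared_cds(ref_hits, query_hits):
    """Return CDS names aligned in both files, and those on a same-named chromosome."""
    both = sorted(set(ref_hits) & set(query_hits))
    same_chr = [name for name in both if ref_hits[name] & query_hits[name]]
    return both, same_chr


def rec(qname, flag, rname):
    """Build a toy tab-delimited SAM record (hypothetical values)."""
    return "\t".join([qname, str(flag), rname, "1", "60", "10M",
                      "*", "0", "0", "ACGTACGTAC", "*"])


ref_sam = ["@SQ\tSN:chr4\tLN:100", rec("cds1", 0, "chr4"),
           rec("cds2", 0, "chr4"), rec("cds3", 0, "chr4")]
query_sam = ["@SQ\tSN:chr4\tLN:100", rec("cds1", 0, "chr4"),
             rec("cds1", 256, "chr4"), rec("cds2", 0, "scaffold_9"),
             rec("cds3", 4, "*")]

both, same_chr = shared_cds(primary_hits(ref_sam), primary_hits(query_sam))
print("aligned in both files:", both)
print("on a same-named chromosome:", same_chr)
```

If the first list is empty, the two SAM files were probably produced from different CDS sets; if only the second is empty, chromosome naming is the more likely culprit.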

xzhoubayer commented 1 year ago

I got the same error. Both the reference and query genomes have the same chromosome IDs: for example, >chr1 for the reference and >chr1 for the query, too.

gnxsf commented 1 year ago

I also receive the same error when running genoAli, despite having matching chromosomes between reference and query.

gnxsf commented 1 year ago

For those referencing this later, the issue was that the chromosome names in the GFF file must also be renamed to match the chromosome names in the reference and query FASTA files. genoAli runs successfully once the chromosome names in all three files match.
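
The three-file requirement described above can be checked before running genoAli. This is a minimal sketch rather than anything AnchorWave ships: the function names and toy inputs are hypothetical, and it assumes the GFF seqid is the first tab-delimited column and the FASTA name is the first word after `>`.

```python
def gff_seqids(lines):
    """Collect the seqid (column 1) of every feature line in a GFF/GFF3 file."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):    # embedded sequence section: stop here
            break
        if line.startswith("#") or not line.strip():
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def fasta_ids(lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].split()}


def names_missing(gff_ids, ref_ids, query_ids):
    """List GFF seqids that are absent from the reference or the query FASTA."""
    return {
        "not_in_reference": sorted(gff_ids - ref_ids),
        "not_in_query": sorted(gff_ids - query_ids),
    }


# Toy input (hypothetical names): the GFF uses 'Pp04' while both FASTA files use 'chr4'.
gff = ["##gff-version 3",
       "Pp04\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
       "Pp04\tsrc\tCDS\t1\t100\t.\t+\t0\tParent=g1"]
ref_fa = [">chr4", "ACGT"]
query_fa = [">chr4", "ACGT"]

problems = names_missing(gff_seqids(gff), fasta_ids(ref_fa), fasta_ids(query_fa))
print(problems)
```

Both lists being empty means every seqid used in the annotation has a same-named sequence in each genome.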

baoxingsong commented 3 months ago

Hello, there are errors when I use proali or genoAli: "there is no match anchor found in the input sam file". The chromosomes are all named chr01-chr12.

Please share all your input files and commands so that we can help you.