Open peterdfields opened 2 years ago
In case it's useful to diagnose the issue here, when I run anchorwave proali the above error doesn't seem to occur.
Thanks for trying AnchorWave. Are the chromosomes from the reference genome and query genome named in the same way, please? Chromosomes with the same name would be aligned using the genoAli command.
@baoxingsong Thank you for creating AnchorWave! In the present case, the chromosomes are not named the same as we don't presently have a complete idea about which chromosomal scaffolds are homologous. We're hoping to use AnchorWave in part to determine this relationship. Should homology be determined with proali before using genoAli?
To use genoAli, you need a chromosome-level assembly, and the chromosomes from the reference genome and query genome must be named in the same way. Moreover, genoAli assumes there is no translocation. proali is more flexible and might be OK for your genome alignment.
If you are comparing two assemblies of the same species, proali should be helpful to investigate homologous chromosomes.
Hi baoxingsong,
Cool tool, thanks for making this!! I have the same issue peterdfields is describing, and I've named the chromosomes in the same way. Here are the first lines of my ref.sam and query.sam:
ref.sam:
@SQ SN:chr4 LN:25843236
@PG ID:minimap2 PN:minimap2 VN:2.20-r1061 CL:minimap2 -x splice -t 10 -k 12 -a -p 0.4 -N 20 ../P_persica_chr4.fasta ../cds/subA_cds.fasta
Pcer_022906-RA 0 chr4 805109 60 456M * 0 0 ATGAAATCCTTCATGCTTTTCCTAATGCTTGCCATGCTAATGGCTTCAGCCATCACTACTCTTTCTGCGATACCAGACGAAGAAGAATCATTCCTCAACGAGGAAAACAATAATGATGCAAATGACGAAACCAAAAGCCAGTTGGAAAAAAGTACTTCTCTGAGGGGAAGAAGCCGCTTCCTTGCCTCCCGGCCGCCCACGATGACTTGCGACAGATACCCTAAGGTTTGTGGGGCGTCGGGCAGCGCAGGGCCAGATTGCTGCAAGAAGAAATGTGTGGACACGAACACAGACAGAGCAAACTGTGGCAAGTGTGGGAGGAAATGCAAGTACGCAGAGATATGCTGCAAAGGTAAGTGTGTGAATCCGAGGTCGGACAAGAAAAACTGCGGCAGCTGCAACAACAAATGCAAGAAAGGCAGCTCATGTGCGTATGGGATGTGCAGCTATGCATGA * NM:i:11 ms:i:423 AS:i:423 nn:i:0 tp:A:P cm:i:112 s1:i:412 s2:i:161 de:f:0.0241 rl:i:0
Pcer_022907-RA 0 chr4 805133 1 116M2501N1I40M5D3M2D2M5D252M * 0 0 ATGTTTGCCATGCTAATAGCTTCAGCCATCACCACTCTCTCTGCAATACCAAACGAAGAAGAATCATTCTTCAACGAGGAAAACAATACTGATACAAATGACGAAACCAAAAACCAGTTTGGAAAAAGCACTTCTCTAAGAAGCCGCTTCCTTGCCTCCGTGACTTGTGACAAAAACCCTAAGGTTTGTCAGGCGTATGGCAGCGCAAAGCCGGATTGCTGCAACAAGAAATGTGTGGACAGAAACACAGACACAGCAAACTGCGGCAAGTGTGGGAAGAAATGCAATTACGCAGAGATTTGCTGCGAAGGTAAGTGTGTGAATCCGATGTCGGACAAGGAAAACTGCGGCAGCTGCAACAACAAGTGCAAGAAAGGCACTTCATGTGTGTTTGGAATGTGCAGCTATGCATGA * NM:i:36 ms:i:323 AS:i:287 nn:i:0 ts:A:+ tp:A:P cm:i:64 s1:i:265 s2:i:282 de:f:0.0647 rl:i:0
query.sam:
@SQ SN:chr4 LN:31254986
@PG ID:minimap2 PN:minimap2 VN:2.20-r1061 CL:minimap2 -x splice -t 10 -k 12 -a -p 0.4 -N 20 ../subA_chr4.fasta ../cds/subA_cds.fasta
Pcer_022906-RA 0 chr4 6188 44 456M * 0 0 ATGAAATCCTTCATGCTTTTCCTAATGCTTGCCATGCTAATGGCTTCAGCCATCACTACTCTTTCTGCGATACCAGACGAAGAAGAATCATTCCTCAACGAGGAAAACAATAATGATGCAAATGACGAAACCAAAAGCCAGTTGGAAAAAAGTACTTCTCTGAGGGGAAGAAGCCGCTTCCTTGCCTCCCGGCCGCCCACGATGACTTGCGACAGATACCCTAAGGTTTGTGGGGCGTCGGGCAGCGCAGGGCCAGATTGCTGCAAGAAGAAATGTGTGGACACGAACACAGACAGAGCAAACTGTGGCAAGTGTGGGAGGAAATGCAAGTACGCAGAGATATGCTGCAAAGGTAAGTGTGTGAATCCGAGGTCGGACAAGAAAAACTGCGGCAGCTGCAACAACAAATGCAAGAAAGGCAGCTCATGTGCGTATGGGATGTGCAGCTATGCATGA * NM:i:0 ms:i:456 AS:i:456 nn:i:0 tp:A:P cm:i:148 s1:i:452 s2:i:427 de:f:0 rl:i:12
Pcer_022906-RA 256 chr4 956553 0 456M * 0 0 * * NM:i:7 ms:i:435 AS:i:435 nn:i:0 tp:A:S cm:i:124 s1:i:427 de:f:0.0154 rl:i:12
Pcer_022907-RA 0 chr4 9043 60 414M * 0 0 ATGTTTGCCATGCTAATAGCTTCAGCCATCACCACTCTCTCTGCAATACCAAACGAAGAAGAATCATTCTTCAACGAGGAAAACAATACTGATACAAATGACGAAACCAAAAACCAGTTTGGAAAAAGCACTTCTCTAAGAAGCCGCTTCCTTGCCTCCGTGACTTGTGACAAAAACCCTAAGGTTTGTCAGGCGTATGGCAGCGCAAAGCCGGATTGCTGCAACAAGAAATGTGTGGACAGAAACACAGACACAGCAAACTGCGGCAAGTGTGGGAAGAAATGCAATTACGCAGAGATTTGCTGCGAAGGTAAGTGTGTGAATCCGATGTCGGACAAGGAAAACTGCGGCAGCTGCAACAACAAGTGCAAGAAAGGCACTTCATGTGTGTTTGGAATGTGCAGCTATGCATGA * NM:i:0 ms:i:414 AS:i:414 nn:i:0 tp:A:P cm:i:150 s1:i:412 s2:i:353 de:f:0 rl:i:0
The genoAli command I am using:
anchorwave genoAli -i ../P_persica_chr4.gff3 -as ../cds/subA_cds.fasta -r ../P_persica_chr4.fasta -a subA_query.sam -ar peach_ref_subA.sam -s ../subA_chr4.fasta -v subA_to_peach.vcf -n subA.anchors -o subA.maf -f subA.f.maf > subA.log
I should also probably mention I get the same error when I use proali too. Any ideas as to what is going wrong?? Thanks in advance for any help you can offer.
Kindly, Charity
I got the same error. Both the reference and query genomes have the same chromosome IDs, for example >chr1 in the reference and >chr1 in the query.
I also receive the same error when running genoAli, despite having matching chromosomes between reference and query.
For those referencing this later, the issue was that the chromosome names in the GFF file must also be renamed to match the chromosome names in the reference and query FASTA files. genoAli runs successfully once the chromosome names in all three files match.
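To check for the mismatch described above before running genoAli, a small script can compare the sequence names in the three files. This is a minimal sketch, not part of AnchorWave: the function names are made up for illustration, and it assumes plain-text (uncompressed) FASTA and GFF3 files in which the sequence name is the first word after ">" and the first tab-separated column, respectively.

```python
def fasta_names(path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gff_seqids(path):
    """Return the set of values in column 1 (seqid) of a GFF3 file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section follows; stop reading features
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def report_mismatches(ref_fasta, query_fasta, gff):
    """List names that appear in one file but not in the file it should match."""
    ref = fasta_names(ref_fasta)
    qry = fasta_names(query_fasta)
    ann = gff_seqids(gff)
    return {
        "gff_not_in_ref": sorted(ann - ref),
        "ref_not_in_query": sorted(ref - qry),
        "query_not_in_ref": sorted(qry - ref),
    }
```

Calling report_mismatches with the paths of the reference FASTA, query FASTA and GFF3 file returns three lists. If all three are empty, the names agree across the files; any name listed under gff_not_in_ref is a candidate for the renaming described above.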
Hello, I get errors when I use proali or genoAli: it reports that no matching anchor was found in the input SAM file. The chromosomes are all named chr01 to chr12.
Please share all your input files and commands so that we can help you.
Hi,
I'm trying to run AnchorWave on a pair of non-plant genomes. Whether I use minimap2 or gmap, when I run the anchorwave genoAli function I see the following:
There are definitely alignments in the respective sam files. Is there anything I can do to diagnose what might be going wrong? Thank you for your time and assistance.
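One way to confirm what a SAM file actually contains is to list the sequence names declared in its header and count the mapped records per reference name. The sketch below is an illustration only, not an AnchorWave feature: summarize_sam is a hypothetical helper, and it assumes an uncompressed, tab-separated SAM file such as the one minimap2 writes with -a.

```python
from collections import Counter


def summarize_sam(path):
    """Return (@SQ names, mapped-record counts per reference name, unmapped count)."""
    header_names = []
    mapped = Counter()
    unmapped = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("@"):
                # Header line; only @SQ lines carry reference sequence names.
                if line.startswith("@SQ"):
                    for field in line.rstrip("\n").split("\t")[1:]:
                        if field.startswith("SN:"):
                            header_names.append(field[3:])
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 11:
                continue  # not a complete alignment record
            flag = int(cols[1])
            if flag & 0x4 or cols[2] == "*":
                unmapped += 1  # FLAG bit 0x4 marks an unmapped read
            else:
                mapped[cols[2]] += 1  # column 3 is the reference name (RNAME)
    return header_names, mapped, unmapped
```

Running this on both the reference and query SAM files shows which sequence names the alignments use, so they can be compared with the names in the FASTA and GFF files.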