PoonLab / sierra-local

Retrieve HIVdb algorithm as XML and apply locally to HIV sequences
GNU General Public License v3.0

Incorrect NA sequence from nucamino #69

Closed WilliamZekaiWang closed 1 year ago

WilliamZekaiWang commented 1 year ago

The NA sequence aligned by Sierrapy doesn't match the one from sierra-local.

For RT AY802669.CA9954.B.53, the nucleotides are incorrect:

Sierrapy 
CCCATTAGTCCTATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAGAAAATAAAAGCATTAATGGAAATTTGTGCATTTCTGGAAGAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATACAACACTCCAATATTTGCCATAAAGAAAAAAGGYGGTACTAAATGGAGAAAAWTAGTAGATTTCAGAGAACTTAATAAGAAAACKCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAGATCAGTAACAGTATTGGATGTGGGGGATGCATATTTTTCAATTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTCACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGYAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAAACATAGTTATCTATCAATACGTGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCARCATAGAGCAAAAATAGAGGAACTGAGACAACAYCTGTGGAAGTGGGGGTTTTACACACCAGACGAMAAACATCAGAAAGAACCTCCATTCCATTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAARGATAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGAAAAYTRAATTGGGCAAGTCAGATTTAYGCAGGGATTAAAGTAAAGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAACAGAYGTTATACCACTAACAAAAGAACAAGAGCTAGAA

Sierralocal
CCCATTAGTCCTATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAGAAAATAAAAGCATTAATGGAAATTTGTGCATTTCTGGAAGAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATACAACACTCCAATATTTGCCATAAAGAAAAAAGGYGGTACTAAATGGAGAAAAWTAGTAGATTTCAGAGAACTTAATAAGAAAACKCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAGATCAGTAACAGTATTGGATGTGGGGGATGCATATTTTTCAATTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTCACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGYAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAAACATAGTTATCTATCAATACGTGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCARCATAGAGCAAAAATAGAGGAACTGAGACAACAYCTGTGGAAGTGGGGGTTTTACACACCAGACGAMAAACATCAGAAAGAACCTCCATTCCATTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAARGATAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGAAAAYTRAATTGGGCAAGTCAGATTTAYGCAGGGATTAAAGTAAAGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAACAGAYGTTATACCACTAACAAAAGAAGCAGCTAGA

And for RT AF009410.92RW026.C.19, the indels are incorrect:

Sierrapy
CCAATTAGTCCCATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAGGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAACAGAAATTTGTAGAGAAATGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATATAACACTCCAGTATTTGCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAATTAGTAGACTTCAGGGAACTCAATAAAAGAACTCAAGACTTTTGGGAAGTTCAGTTAGGGATACCGCACCA-GCAGGTCTAAAAAAGAAGAAATCAGTAACAGTACTAGATGTGGGGGATGCATATTTCTCAGTTCCTTTAGATGAAG--GTTTAGGAATATACTGCATTCACCATACCTAGTATAAACAATGAAACACCAGGGATTAGATATCAGTATAATGTGCTTCCACAGGGATGGAAAGGATCACCATCAATATTCCAGAGTAGCATGACAAAAATTTTAGAGCCCTTTAGGGCACAAAACCCAGAAATGGTTATCTATCAATATATGGATGACTTGTATGTAGGATCTGACTTAGAAATAGGGCAACATAGAGCAAAAATAGAGGAGTTAAGAGGACATTTATTGAAGTGGGGATTTACCACACCAGACAAGAAACATCAGAAAGAACCCCCATTTCTTTGGATGGGGTATGAACTCCATCCTGACAAATGGACAGTACAGCCTATACA-CTGCCAGAGAAGGATAGCTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTGGGCAAGTCAGATTTACCCAGGGATTAAGGTAAAGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAACAGAAATAGTATCACTGACTGAA

Sierralocal
CCAATTAGTCCCATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAGGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAACAGAAATTTGTAGAGAAATGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATATAACACTCCAGTATTTGCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAATTAGTAGACTTCAGGGAACTCAATAAAAGAACTCAAGACTTTTGGGAAGTTCAGTTAGGGATACCGCAC-CAGCAGGTCTAAAAAAGAAGAAATCAGTAACAGTACTAGATGTGGGGGATGCATATTTCTCAGTTCCTTTAGATGAAG-GTTTAGGAA-TATACTGCATTCACCATACCTAGTATAAACAATGAAACACCAGGGATTAGATATCAGTATAATGTGCTTCCACAGGGATGGAAAGGATCACCATCAATATTCCAGAGTAGCATGACAAAAATTTTAGAGCCCTTTAGGGCACAAAACCCAGAAATGGTTATCTATCAATATATGGATGACTTGTATGTAGGATCTGACTTAGAAATAGGGCAACATAGAGCAAAAATAGAGGAGTTAAGAGGACATTTATTGAAGTGGGGATTTACCACACCAGACAAGAAACATCAGAAAGAACCCCCATTTCTTTGGATGGGGTATGAACTCCATCCTGACAAATGGACAGTACAGCCTATAC-ACTGCCAGAGAAGGATAGCTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTGGGCAAGTCAGATTTACCCAGGGATTAAGGTAAAGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAACAGAAATAGTATCACTGACTGAA

ArtPoon commented 1 year ago

The difference between the first pair of sequences seems to be at the end:

GAACAAGAGCTAGAA
GAAGCAGCTAGA

Unfortunately I can't see how one can go from one to another simply by dropping nucleotides!
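
One way to check this is to test whether the shorter tail is a subsequence of the longer one, i.e. whether it can be produced by deletions alone. The following is a minimal Python sketch that uses only the two tails quoted above; the helper names `common_prefix_len` and `is_subsequence` are illustrative and are not part of sierra-local, sierrapy, or nucamino.

```python
def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting characters only."""
    remaining = iter(long)
    # `ch in remaining` advances the iterator until it finds ch (or runs out),
    # so matches are forced to occur in left-to-right order.
    return all(ch in remaining for ch in short)


sierrapy_tail = "GAACAAGAGCTAGAA"
sierralocal_tail = "GAAGCAGCTAGA"

# Index of the first mismatch between the two tails
print(common_prefix_len(sierrapy_tail, sierralocal_tail))
# Can the sierra-local tail be reached from the Sierrapy tail by deletions only?
print(is_subsequence(sierralocal_tail, sierrapy_tail))
```

For these two tails the subsequence check returns `False`, which agrees with the observation that dropping nucleotides alone cannot turn one into the other.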

ArtPoon commented 1 year ago

The second set of sequences is identical. Is this a copy-paste error?

ArtPoon commented 1 year ago

This may be related to #67 (not using the most recent nucamino binaries).

WilliamZekaiWang commented 1 year ago

I've updated the second sequence; it was a copy-paste error.

Sierrapy
CCAATTAGTCCCATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAGGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAACAGAAATTTGTAGAGAAATGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATATAACACTCCAGTATTTGCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAATTAGTAGACTTCAGGGAACTCAATAAAAGAACTCAAGACTTTTGGGAAGTTCAGTTAGGGATACCGCACCA-GCAGGTCTAAAAAAGAAGAAATCAGTAACAGTACTAGATGTGGGGGATGCATATTTCTCAGTTCCTTTAGATGAAG--GTTTAGGAATATACTGCATTCACCATACCTAGTATAAACAATGAAACACCAGGGATTAGATATCAGTATAATGTGCTTCCACAGGGATGGAAAGGATCACCATCAATATTCCAGAGTAGCATGACAAAAATTTTAGAGCCCTTTAGGGCACAAAACCCAGAAATGGTTATCTATCAATATATGGATGACTTGTATGTAGGATCTGACTTAGAAATAGGGCAACATAGAGCAAAAATAGAGGAGTTAAGAGGACATTTATTGAAGTGGGGATTTACCACACCAGACAAGAAACATCAGAAAGAACCCCCATTTCTTTGGATGGGGTATGAACTCCATCCTGACAAATGGACAGTACAGCCTATACA-CTGCCAGAGAAGGATAGCTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTGGGCAAGTCAGATTTACCCAGGGATTAAGGTAAAGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAACAGAAATAGTATCACTGACTGAA

Sierralocal
CCAATTAGTCCCATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAGGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAACAGAAATTTGTAGAGAAATGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATATAACACTCCAGTATTTGCCATAAAAAAGAAGGACAGTACTAAGTGGAGAAAATTAGTAGACTTCAGGGAACTCAATAAAAGAACTCAAGACTTTTGGGAAGTTCAGTTAGGGATACCGCAC-CAGCAGGTCTAAAAAAGAAGAAATCAGTAACAGTACTAGATGTGGGGGATGCATATTTCTCAGTTCCTTTAGATGAAG-GTTTAGGAA-TATACTGCATTCACCATACCTAGTATAAACAATGAAACACCAGGGATTAGATATCAGTATAATGTGCTTCCACAGGGATGGAAAGGATCACCATCAATATTCCAGAGTAGCATGACAAAAATTTTAGAGCCCTTTAGGGCACAAAACCCAGAAATGGTTATCTATCAATATATGGATGACTTGTATGTAGGATCTGACTTAGAAATAGGGCAACATAGAGCAAAAATAGAGGAGTTAAGAGGACATTTATTGAAGTGGGGATTTACCACACCAGACAAGAAACATCAGAAAGAACCCCCATTTCTTTGGATGGGGTATGAACTCCATCCTGACAAATGGACAGTACAGCCTATAC-ACTGCCAGAGAAGGATAGCTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTAAACTGGGCAAGTCAGATTTACCCAGGGATTAAGGTAAAGCAACTGTGTAAACTCCTTAGGGGAGCCAAAGCACTAACAGAAATAGTATCACTGACTGAA

in case it fails again


Sierrapy
GCACCA-GCAGGTCTAAAAAAGAAGAAATCAGTAACAGTACTAGATGTGGGGGATGCATATTTCTCAGTTCCTTTAGATGAAG--GTTTAGGAATAT
Sierra-local
GCAC-CAGCAGGTCTAAAAAAGAAGAAATCAGTAACAGTACTAGATGTGGGGGATGCATATTTCTCAGTTCCTTTAGATGAAG-GTTTAGGAA-TAT
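
To see whether two such excerpts differ in their nucleotides or only in where the gaps were placed, one can strip the gap characters and compare what remains. The sketch below does this in Python for two short fragments cut from the excerpts above (the start of the region and its end); `ungap` and `gap_positions` are hypothetical helper names, not functions from sierra-local or sierrapy.

```python
def ungap(seq, gap="-"):
    """Return the sequence with all gap characters removed."""
    return seq.replace(gap, "")


def gap_positions(seq, gap="-"):
    """Return the 0-based alignment columns that contain a gap."""
    return [i for i, ch in enumerate(seq) if ch == gap]


# (Sierrapy fragment, sierra-local fragment), shortened from the excerpts above
pairs = [
    ("GCACCA-GCAGG", "GCAC-CAGCAGG"),
    ("GATGAAG--GTTTAGGAATAT", "GATGAAG-GTTTAGGAA-TAT"),
]

for sierrapy_frag, sierralocal_frag in pairs:
    same_bases = ungap(sierrapy_frag) == ungap(sierralocal_frag)
    print(same_bases,
          gap_positions(sierrapy_frag),
          gap_positions(sierralocal_frag))
```

In both fragments the ungapped nucleotides are the same and only the gap columns differ, so for this region the two tools disagree on gap placement rather than on the bases themselves.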