PoonLab / sierra-local

Retrieve HIVdb algorithm as XML and apply locally to HIV sequences
GNU General Public License v3.0

Plan return object (data structure) from scoring script #6

ArtPoon closed this issue 6 years ago.

ArtPoon commented 7 years ago

The return object should be a dictionary. We can base its structure on the XML output of the HIVdb algorithm, for example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<DrugResistance_Interpretation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://hivdb.stanford.edu/DR/schema/sierra.xsd">
    <algorithmName>HIVDB</algorithmName>
    <algorithmVersion>8.4</algorithmVersion>
    <webServiceVersion>2.0</webServiceVersion>
    <schemaVersion>1.1</schemaVersion>
    <submissionName/>
    <dateTime>2017-08-22 18:19:47.978</dateTime>
    <result>
        <success>true</success>
        <inputSequence>
            <md5sum>f8a9b4e6cfb7de1b4be1cb0eabd1a1cd</md5sum>
            <name>userinput unamed sample: 1</name>
            <sequence>CCTCAGGTCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGATTTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAG</sequence>
        </inputSequence>
        <GAHypermutated>false</GAHypermutated>
        <geneData>
            <gene>PR</gene>
            <present>true</present>
            <consensus>PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF</consensus>
            <alignedNASequence>CCTCAGGTCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTT</alignedNASequence>
            <alignedAASequence>PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF</alignedAASequence>
            <firstAA>1</firstAA>
            <lastAA>99</lastAA>
            <subtype>
                <type>B (0.00%)</type>
                <percentSimilarity>100</percentSimilarity>
            </subtype>
            <mutation>
                <classification>OTHER</classification>
                <type>mutation</type>
                <mutationString>I3V</mutationString>
                <wildType>I</wildType>
                <position>3</position>
                <nucleicAcid>GTC</nucleicAcid>
                <translatedNA>V</translatedNA>
            </mutation>
            <mutation>
                <classification>OTHER</classification>
                <type>mutation</type>
                <mutationString>N37S</mutationString>
                <wildType>N</wildType>
                <position>37</position>
                <nucleicAcid>AGT</nucleicAcid>
                <translatedNA>S</translatedNA>
            </mutation>
            <quality/>
        </geneData>
        <geneData>
            <gene>RT</gene>
            <present>true</present>
            <consensus>PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL</consensus>
            <alignedNASequence>CCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGATTTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAG</alignedNASequence>
            <alignedAASequence>PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAE</alignedAASequence>
            <firstAA>1</firstAA>
            <lastAA>300</lastAA>
            <subtype>
                <type>B (0.00%)</type>
                <percentSimilarity>100</percentSimilarity>
            </subtype>
            <mutation>
                <classification>OTHER</classification>
                <type>mutation</type>
                <mutationString>K122E</mutationString>
                <wildType>K</wildType>
                <position>122</position>
                <nucleicAcid>GAA</nucleicAcid>
                <translatedNA>E</translatedNA>
            </mutation>
            <mutation>
                <classification>OTHER</classification>
                <type>mutation</type>
                <mutationString>F214L</mutationString>
                <wildType>F</wildType>
                <position>214</position>
                <nucleicAcid>CTT</nucleicAcid>
                <translatedNA>L</translatedNA>
            </mutation>
            <mutation>
                <classification>OTHER</classification>
                <type>mutation</type>
                <mutationString>A272P</mutationString>
                <wildType>A</wildType>
                <position>272</position>
                <nucleicAcid>CCA</nucleicAcid>
                <translatedNA>P</translatedNA>
            </mutation>
            <mutation>
                <classification>OTHER</classification>
                <type>mutation</type>
                <mutationString>K277R</mutationString>
                <wildType>K</wildType>
                <position>277</position>
                <nucleicAcid>AGG</nucleicAcid>
                <translatedNA>R</translatedNA>
            </mutation>
            <quality/>
        </geneData>
        <geneData>
            <gene>IN</gene>
            <present>false</present>
        </geneData>
        <sequenceQualityCounts>
            <insertions>0</insertions>
            <deletions>0</deletions>
            <ambiguous>0</ambiguous>
            <stops>0</stops>
            <frameshifts>0</frameshifts>
        </sequenceQualityCounts>
        <drugScore>
            <drugCode>ATV/r</drugCode>
            <genericName>atazanavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>DRV/r</drugCode>
            <genericName>darunavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>FPV/r</drugCode>
            <genericName>fosamprenavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>IDV/r</drugCode>
            <genericName>indinavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>LPV/r</drugCode>
            <genericName>lopinavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>NFV</drugCode>
            <genericName>nelfinavir</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>SQV/r</drugCode>
            <genericName>saquinavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>TPV/r</drugCode>
            <genericName>tipranavir/r</genericName>
            <type>PI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>ABC</drugCode>
            <genericName>abacavir</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>AZT</drugCode>
            <genericName>zidovudine</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>D4T</drugCode>
            <genericName>stavudine</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>DDI</drugCode>
            <genericName>didanosine</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>FTC</drugCode>
            <genericName>emtricitabine</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>3TC</drugCode>
            <genericName>lamivudine</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>TDF</drugCode>
            <genericName>tenofovir</genericName>
            <type>NRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>EFV</drugCode>
            <genericName>efavirenz</genericName>
            <type>NNRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>ETR</drugCode>
            <genericName>etravirine</genericName>
            <type>NNRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>NVP</drugCode>
            <genericName>nevirapine</genericName>
            <type>NNRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <drugScore>
            <drugCode>RPV</drugCode>
            <genericName>rilpivirine</genericName>
            <type>NNRTI</type>
            <score>0</score>
            <resistanceLevel>1</resistanceLevel>
            <resistanceLevelText>Susceptible</resistanceLevelText>
            <threeStepResistanceLevel>S</threeStepResistanceLevel>
        </drugScore>
        <scoreTable>
            <scoreRow>
                <score value="PI"/>
                <score value="ATV/r"/>
                <score value="DRV/r"/>
                <score value="FPV/r"/>
                <score value="IDV/r"/>
                <score value="LPV/r"/>
                <score value="NFV"/>
                <score value="SQV/r"/>
                <score value="TPV/r"/>
            </scoreRow>
            <scoreRow>
                <score value="Total:"/>
                <score class="PI" drug="ATV/r" value="0"/>
                <score class="PI" drug="DRV/r" value="0"/>
                <score class="PI" drug="FPV/r" value="0"/>
                <score class="PI" drug="IDV/r" value="0"/>
                <score class="PI" drug="LPV/r" value="0"/>
                <score class="PI" drug="NFV" value="0"/>
                <score class="PI" drug="SQV/r" value="0"/>
                <score class="PI" drug="TPV/r" value="0"/>
            </scoreRow>
        </scoreTable>
        <scoreTable>
            <scoreRow>
                <score value="NRTI"/>
                <score value="ABC"/>
                <score value="AZT"/>
                <score value="D4T"/>
                <score value="DDI"/>
                <score value="FTC"/>
                <score value="3TC"/>
                <score value="TDF"/>
            </scoreRow>
            <scoreRow>
                <score value="Total:"/>
                <score class="NRTI" drug="ABC" value="0"/>
                <score class="NRTI" drug="AZT" value="0"/>
                <score class="NRTI" drug="D4T" value="0"/>
                <score class="NRTI" drug="DDI" value="0"/>
                <score class="NRTI" drug="FTC" value="0"/>
                <score class="NRTI" drug="3TC" value="0"/>
                <score class="NRTI" drug="TDF" value="0"/>
            </scoreRow>
        </scoreTable>
        <scoreTable>
            <scoreRow>
                <score value="NNRTI"/>
                <score value="EFV"/>
                <score value="ETR"/>
                <score value="NVP"/>
                <score value="RPV"/>
            </scoreRow>
            <scoreRow>
                <score value="Total:"/>
                <score class="NNRTI" drug="EFV" value="0"/>
                <score class="NNRTI" drug="ETR" value="0"/>
                <score class="NNRTI" drug="NVP" value="0"/>
                <score class="NNRTI" drug="RPV" value="0"/>
            </scoreRow>
        </scoreTable>
    </result>
</DrugResistance_Interpretation>
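Here is a rough sketch of the dictionary idea. It assumes Python and the standard-library `xml.etree.ElementTree`. An XML result shaped like the one above is loaded into nested dicts. Tags that can repeat under one parent (`geneData`, `mutation`, `drugScore`, `scoreTable`, `scoreRow`, `score`) are always collected into lists, so one mutation and several mutations have the same shape. The helper names `xml_to_dict` and `parse_hivdb_xml` are hypothetical, and so is the set of repeated tags, which is inferred from this one sample. None of them come from sierra-local or the HIVdb schema.

```python
import xml.etree.ElementTree as ET

# (parent tag, child tag) pairs that can repeat in the sample output above
# (assumption); these are always stored as lists.
REPEATED = {
    ("result", "geneData"),
    ("result", "drugScore"),
    ("result", "scoreTable"),
    ("geneData", "mutation"),
    ("scoreTable", "scoreRow"),
    ("scoreRow", "score"),
}


def xml_to_dict(elem):
    """Recursively convert an Element into nested dicts, lists and strings."""
    children = list(elem)
    if not children:
        # Leaf: attribute-only elements (e.g. <score drug="..." value="..."/>)
        # keep their attributes; everything else keeps its stripped text.
        if elem.attrib:
            return dict(elem.attrib)
        return (elem.text or "").strip()
    out = {}
    for child in children:
        value = xml_to_dict(child)
        if (elem.tag, child.tag) in REPEATED:
            out.setdefault(child.tag, []).append(value)
        else:
            # Non-repeated tags map to a single value.
            out[child.tag] = value
    return out


def parse_hivdb_xml(text):
    """Parse an HIVdb XML report (str or bytes) into {root_tag: dict}."""
    if isinstance(text, str):
        # Encode so the <?xml ... encoding="UTF-8"?> declaration is honoured.
        text = text.encode("utf-8")
    root = ET.fromstring(text)
    return {root.tag: xml_to_dict(root)}
```

All leaf values remain strings (`"true"`, `"0"`), and empty elements such as `<quality/>` become `""`. Converting them to booleans or numbers would be a separate step.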
ArtPoon commented 7 years ago

Proposed key-value pairs:

ArtPoon commented 7 years ago

We should output JSON, as sierra-client does.
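One caveat for JSON output: every value parsed from the XML is a string, whereas JSON has native booleans and numbers. The following is a minimal sketch using Python's standard `json` module on a nested dict like the one described above. The key sets are guesses taken from the sample XML tags. `coerce` and `results_to_json` are hypothetical helpers, and the sketch does not attempt to reproduce sierra-client's actual JSON schema.

```python
import json

# Keys whose values look boolean or numeric in the sample HIVdb XML (assumption).
BOOL_KEYS = {"success", "present", "GAHypermutated"}
NUMERIC_KEYS = {
    "position", "firstAA", "lastAA", "percentSimilarity", "score",
    "resistanceLevel", "insertions", "deletions", "ambiguous", "stops",
    "frameshifts",
}


def coerce(obj, key=None):
    """Recursively turn "true"/"false" and numeric strings into JSON-native types."""
    if isinstance(obj, dict):
        return {k: coerce(v, k) for k, v in obj.items()}
    if isinstance(obj, list):
        # List items inherit the key of the list they belong to.
        return [coerce(v, key) for v in obj]
    if isinstance(obj, str):
        if key in BOOL_KEYS and obj in ("true", "false"):
            return obj == "true"
        if key in NUMERIC_KEYS:
            for cast in (int, float):
                try:
                    return cast(obj)
                except ValueError:
                    pass
    # Anything else (sequences, names, codes) is left unchanged.
    return obj


def results_to_json(results, indent=2):
    """Serialize a list of per-sequence result dicts to a JSON string."""
    return json.dumps([coerce(r) for r in results], indent=indent)
```

Restricting coercion to known keys avoids accidentally converting fields such as an all-digit `md5sum` into a number.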

jzpero commented 7 years ago

Progress on the output (which is deeply nested):

Progress on the program:

Other to-dos:

ArtPoon commented 7 years ago

We cannot exactly reproduce the JSON output of sierra-client, because NucAmino does not return aligned sequences. In addition, we do not know how the subtype classifications are generated.
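This example shows what the aligned sequences are used for. In the sample XML, the PR `mutation` entries (I3V, N37S) are exactly the positions where `alignedAASequence` differs from `consensus`. The sketch below assumes Python and an amino-acid sequence that is already aligned to the consensus, with no insertions, starting at `firstAA`. `call_mutations` is a hypothetical helper, not the method used by HIVdb, sierra-client or NucAmino. It reports substitutions only and skips gaps and ambiguous residues.

```python
def call_mutations(consensus, aligned_aa, first_aa=1):
    """List substitutions of an aligned amino-acid sequence relative to consensus.

    Hypothetical helper: assumes `aligned_aa` is aligned to `consensus`
    (no insertions) and that its first residue is at 1-based `first_aa`.
    Gaps ('-') and ambiguous residues ('X') are skipped, not reported.
    """
    mutations = []
    for offset, aa in enumerate(aligned_aa):
        position = first_aa + offset
        wild_type = consensus[position - 1]
        if aa == wild_type or aa in ("-", "X"):
            continue
        # Keys mirror the <mutation> child tags in the HIVdb XML output.
        mutations.append({
            "wildType": wild_type,
            "position": position,
            "translatedNA": aa,
            "mutationString": f"{wild_type}{position}{aa}",
        })
    return mutations
```

Run on the first 40 PR residues of `consensus` and `alignedAASequence` from the sample output, this reports `I3V` and `N37S`, which match the two PR `mutation` entries.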