Illumina / ExpansionHunter

A tool for estimating repeat sizes

Complex Repeats #191

Closed dwill023 closed 7 months ago

dwill023 commented 8 months ago

I'd like to know the correct way to write the LocusStructure for a complex repeat. For example, the forensic STR D21S11 has the sequence pattern: [TCTA]n [TCTG]n [TCTA]n TA [TCTA]n TCA [TCTA]n TCCATA [TCTA]n

I have the JSON below.

{
    "LocusId": "D21S11",
    "LocusStructure": "(TCTA)*(TCTG)*(TCTA)*TA(TCTA)*TCA(TCTA)*TCCATA(TCTA)*",
    "ReferenceRegion": "21:19181969-19182105",
    "VariantType": "Repeat"
}

I get the error: [Error loading locus D21S11: Locus D21S11 must specify reference regions for 6 variants].

Do I just repeat the ReferenceRegion and VariantType six times? I want to make sure I'm formatting the JSON correctly.

Also, what about loci like the one below, which have 42 arbitrary bases (any of A, C, T, or G) between the other observed repeats?

{
    "LocusId": "DYS448",
    "LocusStructure": "(AGAGAT)*([ACTG]){42}(AGAGAT)*",
    "ReferenceRegion": "Y:22218919-22219087",
    "VariantType": "Repeat"
}

andreasssh commented 7 months ago

Since your LocusStructure defines six variants (one for each `(...)*` repeat), ReferenceRegion and VariantType each have to be an array with one entry per variant, in the same order as the repeats appear in the structure. You can also set the VariantId field to name each variant, which can make it easier to tell them apart when post-processing; otherwise, by default, the coordinates are appended to the LocusId.

[
    {
        "LocusId": "D21S11",
        "LocusStructure": "(TCTA)*(TCTG)*(TCTA)*TA(TCTA)*TCA(TCTA)*TCCATA(TCTA)*",
        "ReferenceRegion":
        [
            "21:19181972-19181988",
            "21:19181988-19182012",
            "21:19182012-19182024",
            "21:19182026-19182038",
            "21:19182041-19182049",
            "21:19182055-19182099"
        ],
        "VariantType":
        [
            "Repeat",
            "Repeat",
            "Repeat",
            "Repeat",
            "Repeat",
            "Repeat"
        ],
        "VariantId":
        [
            "D21S11_1",
            "D21S11_2",
            "D21S11_3",
            "D21S11_4",
            "D21S11_5",
            "D21S11_6"
        ]
    }
]
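
For anyone who wants to catch this kind of mismatch before running the tool, here is a minimal Python sketch that counts the quantified groups in a LocusStructure and compares that count with the lengths of the per-variant fields. It is only an approximation under stated assumptions: the regular expression treats every parenthesised unit followed by `*`, `+`, or `?` as one variant, which matches the structures in this thread but is not ExpansionHunter's own parser. The function names `count_variants` and `check_entry` are made up for illustration.

```python
import re

# Assumption: one variant per parenthesised unit followed by *, + or ?
# (e.g. "(TCTA)*"). This is a rough stand-in, not ExpansionHunter's parser.
VARIANT_PATTERN = re.compile(r"\([^()]+\)[*+?]")


def count_variants(locus_structure):
    """Return the number of quantified groups found in a LocusStructure."""
    return len(VARIANT_PATTERN.findall(locus_structure))


def as_list(value):
    """Treat a single string the same way as a one-element array."""
    return value if isinstance(value, list) else [value]


def check_entry(entry):
    """Return a list of length mismatches for one catalog entry."""
    expected = count_variants(entry["LocusStructure"])
    problems = []
    for field in ("ReferenceRegion", "VariantType", "VariantId"):
        if field not in entry:
            continue  # VariantId is optional; absent fields are not checked here
        found = len(as_list(entry[field]))
        if found != expected:
            problems.append(
                f"{field}: expected {expected} entries, found {found}"
            )
    return problems


# The entry from the original question: one region for six repeats.
single_region_entry = {
    "LocusId": "D21S11",
    "LocusStructure": "(TCTA)*(TCTG)*(TCTA)*TA(TCTA)*TCA(TCTA)*TCCATA(TCTA)*",
    "ReferenceRegion": "21:19181969-19182105",
    "VariantType": "Repeat",
}

for problem in check_entry(single_region_entry):
    print(problem)
```

Run against the array form posted above, `check_entry` returns an empty list, since ReferenceRegion, VariantType, and VariantId all have six entries.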

dwill023 commented 7 months ago

@andreasssh Thank you for your help, it worked.