Illumina / ExpansionHunter

A tool for estimating repeat sizes

variant_catalog.json #194

Open LJ-jiao opened 5 months ago

LJ-jiao commented 5 months ago

I want to create a variant_catalog.json file to identify a VNTR in the FcRn gene using WGS data. The FcRn gene contains a VNTR consisting of three 37-nucleotide repeats; the third repeat differs from the first two by one nucleotide. How should I write this JSON file?

The sequence of this region is as follows.

>hg38_dna range=chr19:49512986-49513096 5'pad=0 3'pad=0 strand=+ repeatMasking=lower
cggactcctgggtccgagggtagagcggttgggggcc
cggactcctgggtccgagggtagagcggttgggggcc
cggactcctgggtccgagggaagagcggttgggggcc

andreasssh commented 5 months ago

If you want a genotype that covers both motif variants, define the first motif as the repeat unit and set the reference region to span all three repeats. All three copies will be counted, despite the one-nucleotide difference in the third.

[
    {
        "LocusId": "FCGRT",
        "LocusStructure": "(CGGACTCCTGGGTCCGAGGGTAGAGCGGTTGGGGGCC)*",
        "ReferenceRegion": "chr19:49512985-49513096",
        "VariantId": "FCGRT",
        "VariantType": "Repeat"
    }
]
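
The catalog entry above can be sanity-checked with a short script. The sketch below is not part of ExpansionHunter; the helper names (`ucsc_to_zero_based`, `region_length`) are made up for illustration. It assumes that the range in the FASTA header is 1-based and inclusive (the UCSC display convention) and that `ReferenceRegion` is 0-based and half-open, which would explain why the start in the answer is one lower than the start in the question. It then checks that the region length is a whole number of motif lengths and counts how many positions in each posted repeat differ from the motif.

```python
import json

# Repeat copies exactly as posted in the question
# (FASTA header range: chr19:49512986-49513096).
repeats = [
    "cggactcctgggtccgagggtagagcggttgggggcc",
    "cggactcctgggtccgagggtagagcggttgggggcc",
    "cggactcctgggtccgagggaagagcggttgggggcc",
]
motif = repeats[0].upper()


def ucsc_to_zero_based(region):
    """Convert 'chrom:start-end' from 1-based inclusive to 0-based half-open.

    Assumption: the input follows the UCSC display convention.
    """
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    return f"{chrom}:{start - 1}-{end}"


def region_length(region):
    """Length in bases of a 0-based half-open 'chrom:start-end' region."""
    span = region.split(":")[1]
    start, end = (int(x) for x in span.split("-"))
    return end - start


reference_region = ucsc_to_zero_based("chr19:49512986-49513096")

# Number of mismatching positions between the motif and each posted copy.
mismatches = [sum(a != b for a, b in zip(motif, r.upper())) for r in repeats]

catalog = [
    {
        "LocusId": "FCGRT",
        "LocusStructure": f"({motif})*",
        "ReferenceRegion": reference_region,
        "VariantId": "FCGRT",
        "VariantType": "Repeat",
    }
]

print(reference_region)                                # chr19:49512985-49513096
print(region_length(reference_region) // len(motif))   # 3
print(mismatches)                                      # [0, 0, 1]
print(json.dumps(catalog, indent=4))
```

Under these assumptions the region is 111 bases long, which is exactly three 37-base motifs, and only the third copy differs from the motif, at a single position. The printed JSON matches the entry suggested in the answer.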