Illumina / ExpansionHunter

A tool for estimating repeat sizes

Haplotypes #197

Open ralonso-igenomix opened 1 month ago

Hi,

I have the following variant catalog JSON file (5t.json):

[
  {
    "LocusId": "CFTR",
    "LocusStructure": "(TG)*(T)*",
    "ReferenceRegion": ["7:117188660-117188682", "7:117188682-117188689"],
    "VariantId": ["CFTR_TG", "CFTR_T"],
    "VariantType": ["Repeat", "Repeat"]
  }
]
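For readers following along, here is a minimal Python sketch (not part of ExpansionHunter) that reads the catalog above and pairs each repeat unit in LocusStructure with its ReferenceRegion and VariantId. The helper name `summarize_locus` is hypothetical, and the sketch assumes that the regions are half-open intervals, so the reference copy number is (end - start) divided by the unit length, and that every field is given as a list, as it is here.

```python
import json
import re

# The catalog from 5t.json, inlined so the sketch is self-contained.
CATALOG_TEXT = """
[
  {
    "LocusId": "CFTR",
    "LocusStructure": "(TG)*(T)*",
    "ReferenceRegion": ["7:117188660-117188682", "7:117188682-117188689"],
    "VariantId": ["CFTR_TG", "CFTR_T"],
    "VariantType": ["Repeat", "Repeat"]
  }
]
"""


def summarize_locus(locus):
    """Return (variant_id, repeat_unit, reference_copies) per repeat.

    Hypothetical helper: assumes list-valued fields and half-open regions.
    """
    # Extract each repeat unit written as (UNIT)* or (UNIT)+ in the structure.
    units = re.findall(r"\(([ACGTN]+)\)[*+]", locus["LocusStructure"])
    regions = locus["ReferenceRegion"]
    ids = locus["VariantId"]
    types = locus["VariantType"]
    if not (len(units) == len(regions) == len(ids) == len(types)):
        raise ValueError("LocusStructure and per-variant lists differ in length")

    summary = []
    for unit, region, variant_id in zip(units, regions, ids):
        _chrom, span = region.split(":")
        start, end = (int(x) for x in span.split("-"))
        # Assumption: region length divided by unit length gives reference copies.
        summary.append((variant_id, unit, (end - start) // len(unit)))
    return summary


catalog = json.loads(CATALOG_TEXT)
for variant_id, unit, ref_copies in summarize_locus(catalog[0]):
    print(variant_id, unit, ref_copies)
```

Under those assumptions the sketch yields 11 reference copies for CFTR_TG and 7 for CFTR_T, which agrees with the REF=11 and REF=7 values in the VCF records below.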

I would like the output to report haplotypes. For example, this is the VCF output I get for one sample:

#CHROM  POS        ID  REF  ALT              QUAL  FILTER  INFO                                                          FORMAT                               Sample
7       117188660  .   A    <STR10>,<STR12>  .     PASS    END=117188682;REF=11;RL=22;RU=TG;VARID=CFTR_TG;REPID=CFTR_TG  GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC  1/2:SPANNING/SPANNING:10/12:10-10/12-12:89/41:89/128:0/0:57.843666
7       117188682  .   G    <STR5>           .     PASS    END=117188689;REF=7;RL=7;RU=T;VARID=CFTR_T;REPID=CFTR_T       GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC  0/1:SPANNING/SPANNING:7/5:7-7/5-5:128/125:16/14:0/0:57.843666

However, I can't tell which TG allele (STR10 or STR12) sits on the same haplotype as the T allele STR5. Is there a way to infer the haplotype, or to get the call for the whole LocusStructure on a single line?
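To make the ambiguity concrete, here is a small Python sketch that enumerates the haplotype pairs consistent with the two REPCN values above (10/12 for CFTR_TG and 7/5 for CFTR_T). The function name `candidate_phasings` is hypothetical and this is not ExpansionHunter functionality; it only lists the possibilities and does not decide between them, since that would need read-level evidence that these two VCF records do not carry.

```python
from itertools import permutations, product


def candidate_phasings(repcn_by_variant):
    """List every distinct haplotype pair consistent with unphased diploid calls.

    repcn_by_variant maps a variant ID to its two repeat counts, in catalog order.
    Each haplotype is a tuple with one repeat count per variant.
    """
    ids = list(repcn_by_variant)
    first_alleles = repcn_by_variant[ids[0]]
    # Keep the first variant's allele order fixed and try both orders of the rest.
    options = [sorted(set(permutations(repcn_by_variant[v]))) for v in ids[1:]]

    seen = set()
    phasings = []
    for combo in product(*options):
        hap1 = (first_alleles[0],) + tuple(c[0] for c in combo)
        hap2 = (first_alleles[1],) + tuple(c[1] for c in combo)
        key = tuple(sorted((hap1, hap2)))  # haplotype order is irrelevant
        if key not in seen:
            seen.add(key)
            phasings.append(key)
    return phasings


# REPCN fields copied from the two VCF records in this post.
repcn = {
    "CFTR_TG": tuple(int(x) for x in "10/12".split("/")),
    "CFTR_T": tuple(int(x) for x in "7/5".split("/")),
}

for hap1, hap2 in candidate_phasings(repcn):
    print(hap1, "|", hap2)
```

For this sample the sketch lists two candidates: TG10-T5 with TG12-T7, or TG10-T7 with TG12-T5. The unphased output alone does not say which one is present.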

If not, I don't see the benefit of using a combined LocusStructure such as "(TG)*(T)*" over splitting it into two separate LocusId entries.

This is the command I’m using:

expansionHunter --reads $bam --reference $fasta --variant-catalog $file_5t --output-prefix $outprefix --sex male

Many thanks!