broadinstitute / str-analysis

Scripts and utilities related to analyzing short tandem repeats (STRs).
MIT License

Multiallelic STRs #12

Closed: themkdemiiir closed this issue 7 months ago

themkdemiiir commented 8 months ago

Hello! Thank you for the great tool you've created; it's been really helpful. However, I have questions about multiallelic short tandem repeats in ExpansionHunter. I found that EH doesn't support multiallelic repeats, and I'd like to know how we can decide which allele is in the pathogenic range in such cases. To illustrate, suppose there are two polymorphic repeat units at a locus, AAT (benign) and AAC (pathogenic), and the tool reports repeat counts of 300 and 10. We need to know which motif each count belongs to, but we can't tell whether the 300-repeat allele is AAT or AAC, so we can't determine whether the genotype is pathogenic.
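One way to reason about "which is which" in this example: with 150 bp short reads, the 10-repeat allele spans only 30 bp, so a read lying entirely inside the repeat can only come from the 300-repeat allele. The motif those in-repeat reads are made of therefore points to the motif of the expanded allele. The sketch below scores each read against the candidate motifs (both strands, all phases) and tallies the best match. It assumes the in-repeat reads have already been extracted from the locus, models mismatches only (no indels), and uses an arbitrary 0.9 purity threshold; the function names and reads are hypothetical, and this is not how ExpansionHunter itself works. Mismapped reads from similar repeats elsewhere in the genome could still confound it.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_phases(motif):
    """All rotations of the motif and of its reverse complement, so reads
    from either strand and starting at any phase can be matched."""
    return {m[i:] + m[:i] for m in (motif, revcomp(motif)) for i in range(len(m))}


def purity(read, motif):
    """Best fraction of read bases matching a perfect tandem repeat of motif.
    Only mismatches are counted; indels are not modeled."""
    best = 0.0
    for phase in motif_phases(motif):
        repeat = (phase * (len(read) // len(phase) + 1))[: len(read)]
        matches = sum(a == b for a, b in zip(read, repeat))
        best = max(best, matches / len(read))
    return best


def assign_in_repeat_reads(reads, candidate_motifs, min_purity=0.9):
    """Hypothetical helper: count reads best explained by each candidate motif."""
    counts = Counter()
    for read in reads:
        scores = {m: purity(read, m) for m in candidate_motifs}
        best_motif = max(scores, key=scores.get)
        counts[best_motif if scores[best_motif] >= min_purity else "unassigned"] += 1
    return counts


# Synthetic reads for the AAT/AAC example (not real data): one AAC read from
# each strand, plus one non-repetitive read.
reads = ["AAC" * 50, "GTT" * 50, "GATTACACGTTAGCCATGGTCAAGTC"]
print(assign_in_repeat_reads(reads, ["AAT", "AAC"]))
```

If most in-repeat reads are assigned to AAC, that would suggest the long allele carries the AAC motif, under the assumptions above.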

You mention a workaround here, but I don't think it helps find all the motifs: "If this option is specified, this script will run ExpansionHunter once for each of the motif(s) it detects at the locus. ExpansionHunter doesn't currently support genotyping multiallelic repeats such as RFC1 where an individual may have 2 alleles with motifs that differ from each other (and from the reference motif). Running ExpansionHunter separately for each motif provides a workaround."
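The per-motif workaround in the quote can be sketched as generating one ExpansionHunter variant catalog per detected motif. This assumes the standard catalog entry fields (LocusId, LocusStructure, ReferenceRegion, VariantType); the locus ID, coordinates, and motif list are hypothetical placeholders, and the helper below is illustrative rather than the code str-analysis actually uses.

```python
import json


def per_motif_catalogs(locus_id, reference_region, motifs):
    """Hypothetical helper: build one single-locus variant catalog per motif.
    Every catalog points at the same reference region but declares a
    different repeat unit in LocusStructure."""
    catalogs = {}
    for motif in motifs:
        catalogs[motif] = [
            {
                "LocusId": f"{locus_id}_{motif}",  # suffix keeps runs distinguishable
                "LocusStructure": f"({motif})*",
                "ReferenceRegion": reference_region,
                "VariantType": "Repeat",
            }
        ]
    return catalogs


if __name__ == "__main__":
    # Placeholder locus and coordinates, for illustration only.
    catalogs = per_motif_catalogs("EXAMPLE_LOCUS", "chr1:1000-1030", ["AAT", "AAC"])
    for motif, catalog in catalogs.items():
        print(motif, json.dumps(catalog))
```

Each catalog would then be passed to its own ExpansionHunter run via `--variant-catalog`. Each run still reports a genotype as if both alleles carried that single motif, so on its own this does not resolve which allele carries which motif.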

bw2 commented 8 months ago

Hi @themkdemiiir. I don't think there's currently a method that can fully solve this for short-read data. The call_non_ref_motifs.py script should be able to detect the main motifs present, but the ExpansionHunter allele size estimates will almost certainly be less accurate than for regular (single-motif) STR loci. You may also want to try running STRling to see if it gives you better results for this locus: https://github.com/quinlan-lab/STRling

themkdemiiir commented 8 months ago

I also noticed a problem in ExpansionHunter: I could give it any repeat unit I wanted (even random characters), and it would give me the same result. So even if you specify the benign and pathogenic units, the locus could also contain other, de novo tandem repeats. What can be done in such situations?
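Rather than relying on a user-specified repeat unit, one sanity check is to ask which motifs the reads at the locus actually contain. The sketch below, a simple heuristic and not the method used by call_non_ref_motifs.py or STRling, finds the smallest period (1 to 6 bp) at which each read is at least 90% self-similar, takes the most common k-mer in that phase, and normalizes it over rotations and reverse complement so that AAC, CAA, and GTT are counted together. The thresholds, function names, and reads are hypothetical; it assumes the relevant reads were already extracted and ignores indels.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    return min(m[i:] + m[:i] for m in (motif, revcomp(motif)) for i in range(len(m)))


def period_score(read, k):
    """Fraction of positions whose base equals the base k positions later."""
    n = len(read) - k
    if n <= 0:
        return 0.0
    return sum(read[i] == read[i + k] for i in range(n)) / n


def dominant_motif(read, min_k=1, max_k=6, min_score=0.9):
    """Canonical motif at the smallest period passing min_score, else None."""
    for k in range(min_k, max_k + 1):
        if period_score(read, k) >= min_score:
            # Count k-mers in a single phase so they all start at the same offset.
            kmers = Counter(read[i:i + k] for i in range(0, len(read) - k + 1, k))
            return canonical(kmers.most_common(1)[0][0])
    return None


def discover_motifs(reads, **kwargs):
    """Hypothetical helper: tally canonical motifs across reads."""
    counts = Counter()
    for read in reads:
        motif = dominant_motif(read, **kwargs)
        counts[motif if motif else "no_repeat"] += 1
    return counts


# Synthetic reads (not real data): AAC in three phases/strands, one AAGGG read,
# and one non-repetitive read.
reads = ["AAC" * 50, "GTT" * 50, "CAA" * 50, "AAGGG" * 30, "GATTACACGTTAGCCATGGTCAAGTC"]
print(discover_motifs(reads))
```

A motif that appears in many reads but was never specified would be a hint that the locus contains an unexpected repeat unit; a specified motif that matches no reads would suggest the reads do not support it.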

bw2 commented 8 months ago

It would be helpful to have a more specific description of the problem or an example, but I likely won't have a better answer than my previous one.