PacificBiosciences / trgt

Tandem repeat genotyping and visualization from PacBio HiFi data

Multiple Tandem Repeat Motif #33

Closed MKaandemir closed 2 months ago

MKaandemir commented 3 months ago

Hi,

I have a TRGT VCF line that contains multiple repeat motifs within the same allele, as shown below:

chr3    183712187    .    CTTTTATTTTATTTTATTTTATTTTATTTTATTTTA    CTTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATTTTATGTTATTTTATGTTATTTTATTTTATGTTATTTTATGTTATTTTATGTTATTTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTGTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTATGTTATTTTA    0    .    TRID=YEATS2;END=183712222;MOTIFS=TTTCA,TTTTA,TGTTA,TTTTT;STRUC=<YEATS2>    GT:AL:ALLR:SD:MC:MS:AP:AM    0/1:35,193:34-36,187-196:21,16:0_7_0_0,0_16_23_0:1(0-35),1(0-80)_2(80-193):1.000000,0.958974:.,.

I have a few questions regarding this:

Thank you!

Best regards, Mehmet Kaan

bw2 commented 3 months ago

I just posted a response at https://github.com/broadinstitute/str-analysis/issues/17

egor-dolzhenko commented 3 months ago

Thanks @bw2! @MKaandemir, to add to @bw2's reply: the RFC1 repeat is one of the most studied multi-motif repeats. For example, take a look at https://pubmed.ncbi.nlm.nih.gov/31230722/, https://pubmed.ncbi.nlm.nih.gov/30926972/, and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10689911/ (there are many other references). One simple idea for studying such repeats is to compare the counts of each repeat motif and also the overall allele purity.
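As a rough illustration of that idea, the per-motif counts and purity can be read straight from the record posted above: the `MC` field lists one count per entry in `MOTIFS` (joined by `_`) for each allele, and `AP` gives the purity of each allele. The sketch below is not part of TRGT; the function name `parse_trgt_record` is hypothetical, and the REF/ALT sequences are replaced with `N` because the parser does not use them.

```python
def parse_trgt_record(line):
    """Return per-allele motif counts and purity from one TRGT VCF line.

    Assumes a single-sample record with MOTIFS in INFO and MC/AP in FORMAT,
    as in the example posted in this thread.
    """
    fields = line.split()
    # INFO is a ';'-separated list of KEY=VALUE pairs
    info = dict(kv.partition("=")[::2] for kv in fields[7].split(";"))
    motifs = info["MOTIFS"].split(",")
    # Pair FORMAT keys with the sample's values
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    alleles = []
    for mc, ap in zip(sample["MC"].split(","), sample["AP"].split(",")):
        # "." marks a missing value; keep it as None rather than guessing
        counts = None if mc == "." else dict(
            zip(motifs, (int(c) for c in mc.split("_")))
        )
        purity = None if ap == "." else float(ap)
        alleles.append({"motif_counts": counts, "purity": purity})
    return alleles


# The YEATS2 record from the first comment, with REF/ALT shortened to "N"
record = (
    "chr3 183712187 . N N 0 . "
    "TRID=YEATS2;END=183712222;MOTIFS=TTTCA,TTTTA,TGTTA,TTTTT;STRUC=<YEATS2> "
    "GT:AL:ALLR:SD:MC:MS:AP:AM "
    "0/1:35,193:34-36,187-196:21,16:0_7_0_0,0_16_23_0:"
    "1(0-35),1(0-80)_2(80-193):1.000000,0.958974:.,."
)

alleles = parse_trgt_record(record)
for allele in alleles:
    print(allele)
```

For this record, the first allele is 7 copies of TTTTA with purity 1.0, while the second allele mixes 16 copies of TTTTA with 23 copies of TGTTA and has purity 0.958974.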

MKaandemir commented 3 months ago

Thanks for the answers @bw2 @egor-dolzhenko! Regarding the annotation problem, I was considering a mosaicism-specific annotation approach. If the mosaicism consists only of benign motifs, I believe it should be classified as benign. However, if it includes any unknown or pathogenic motifs alongside benign motifs, it should be classified as unknown. I'm unsure about the classification if it consists solely of pathogenic repeats. Additionally, could it be said that if the size of a mosaic repeat locus exceeds a certain threshold, it might be potentially pathogenic?
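To make the proposed rule concrete, here is a minimal sketch of it. Everything in it is an assumption for illustration: the function name `classify_allele`, the label strings, and the placeholder motif names `M1` to `M3` are hypothetical, and motifs without a label are treated as unknown. The case the comment leaves open (only pathogenic motifs) is returned as `"undecided"` rather than resolved.

```python
def classify_allele(motif_counts, motif_labels):
    """Apply the rule proposed above to one allele.

    motif_counts: motif -> copy number in the allele
    motif_labels: motif -> "benign" | "pathogenic" | "unknown" (hypothetical labels)
    """
    # Only motifs that actually occur in the allele matter
    present = [m for m, n in motif_counts.items() if n > 0]
    # Assumption: a motif with no label is treated as unknown
    labels = {motif_labels.get(m, "unknown") for m in present}

    if not labels:
        return "unknown"      # no motifs observed, nothing to classify
    if labels == {"benign"}:
        return "benign"       # only benign motifs
    if labels == {"pathogenic"}:
        return "undecided"    # left open in the comment above
    return "unknown"          # any mixture involving unknown or pathogenic motifs


# Placeholder labels, not real annotations
labels = {"M1": "benign", "M2": "pathogenic"}

only_benign = classify_allele({"M1": 12, "M2": 0}, labels)
mixed = classify_allele({"M1": 12, "M2": 5}, labels)
only_pathogenic = classify_allele({"M1": 0, "M2": 40}, labels)
unlabeled = classify_allele({"M1": 3, "M3": 9}, labels)
```

Here `only_benign` is `"benign"`, `mixed` and `unlabeled` are `"unknown"`, and `only_pathogenic` is `"undecided"`.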

egor-dolzhenko commented 3 months ago

For known pathogenic repeats there are specific size thresholds and motifs that you could use. Take a look at STRchive, for example. In other cases it makes sense to flag repeats with abnormal length or sequence composition for follow-up analysis. However, proving that such changes are pathogenic is a difficult task (there are many examples in the literature).

MKaandemir commented 3 months ago

Let's say the count of CGA motifs is 50 and the count of CAA motifs is 30. Both of these are recognized as pathogenic repeat motifs in the literature. If the specific size threshold for CGA motifs is 75, can we consider this mosaic tandem repeat pathogenic based on these counts?

egor-dolzhenko commented 3 months ago

I suspect that such cases are very rare. It would make sense to flag them for review.
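One way to encode "flag them for review" is a small triage step, sketched below under stated assumptions. The function `triage_allele` and its return strings are hypothetical, the numbers are the hypothetical ones from the question (CGA 50, CAA 30, CGA threshold 75), and the rule is only a bookkeeping aid, not a pathogenicity call.

```python
def triage_allele(motif_counts, pathogenic_motifs, thresholds):
    """Sort an allele into a coarse bucket for manual follow-up.

    motif_counts:      motif -> copy number in the allele
    pathogenic_motifs: motifs reported as pathogenic (supplied by the user)
    thresholds:        motif -> size threshold, where one is known
    """
    # A motif that reaches its own known threshold is the clear-cut case
    for motif, count in motif_counts.items():
        if motif in thresholds and count >= thresholds[motif]:
            return "meets_threshold"

    # Several pathogenic motifs in one allele, none reaching a threshold:
    # the rare situation discussed above, so hand it to a reviewer
    present = [m for m in pathogenic_motifs if motif_counts.get(m, 0) > 0]
    if len(present) >= 2:
        return "review"

    return "below_threshold"


# Hypothetical values taken from the question above
result = triage_allele(
    motif_counts={"CGA": 50, "CAA": 30},
    pathogenic_motifs={"CGA", "CAA"},
    thresholds={"CGA": 75},
)
```

With these inputs `result` is `"review"`: neither motif reaches a known threshold on its own, but two pathogenic motifs share the allele.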

MKaandemir commented 2 months ago

Thanks! I will flag them.