Closed MKaandemir closed 3 months ago
Thanks for the question! Yes, TRGT only uses motifs present in the repeat catalog. However, we recently implemented a tool that can help identify novel motifs. For example, you could extract the allele sequences reported by TRGT for a repeat of interest and then run them through this tool.
Thanks for the answer. I really appreciate it. I also wonder if the reported allele sequence can change if I change the order of motifs in the repeat catalog?
Happy to help! The allele sequences are not dependent on the specified motifs, so they shouldn't change. However, the reported motif counts could change in principle. One example is when you have an allele composed of a new, unknown motif that matches two known motifs equally well. It's best to keep the order of motifs the same in all analyses.
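To make the tie case concrete, here is a toy sketch in Python. It is not TRGT's algorithm (TRGT uses HMM segmentation); it simply assigns each 3 bp unit of an allele to the closest catalog motif by Hamming distance and breaks ties in favor of the motif listed first. The helper names (`hamming`, `closest_motif`, `label_counts`) and the example allele are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def closest_motif(unit, motifs):
    """Pick the catalog motif closest to `unit`.

    min() returns the first minimal element, so a tie is resolved
    in favor of whichever motif appears earlier in the list.
    """
    return min(motifs, key=lambda m: hamming(unit, m))


def label_counts(allele, motifs, unit_len=3):
    """Toy labeling (assumption: fixed-length units, no HMM)."""
    units = [allele[i:i + unit_len] for i in range(0, len(allele), unit_len)]
    counts = {m: 0 for m in motifs}
    for unit in units:
        counts[closest_motif(unit, motifs)] += 1
    return counts


# "ACA" is one mismatch away from both "ACG" and "ACT".
allele = "ACAACAACAACA"
print(label_counts(allele, ["ACG", "ACT"]))  # all four units go to ACG
print(label_counts(allele, ["ACT", "ACG"]))  # all four units go to ACT
```

The allele sequence is identical in both calls; only the per-motif counts move with the motif order, which is why keeping the order fixed across analyses is the safer choice.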
We are constraining an unknown repeat to match one of the specified motifs in our repeat catalog. To discover new repeat motifs, the tr-solve tool is required, correct? Why isn't this feature implemented in the trgt tool?
I also wonder how trgt addresses mosaicism in this sequence:
ACGACGACGACGACTACTACTACTACGACGACGACG
Would you consider it as "ACG, ACT 8_4" or "ACG 4 ACT 4 ACG 4"?
It seems that identifying de novo motifs should be done at the population level rather than at the single-sample level with TRGT. There are many messy low-complexity regions where it's not clear what the right motifs should be, and hence relatively small changes in the allele sequence may result in different motif sets.
As to your last question: the MC VCF field contains the overall count of each motif (even if the motif run is interrupted by another sequence), while the MS field lists the span of each uninterrupted motif run.
Note that the contents of the MC and MS fields are based on HMM segmentation and hence allow for imperfect motif copies. If you are only interested in studying perfect motif occurrences, you could get those directly from the allele sequences reported by TRGT.
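If it helps, here is a minimal Python sketch of pulling perfect motif occurrences out of an allele sequence. It is an exact-match scan, not the HMM segmentation TRGT uses, so it will not reproduce MC and MS when motif copies are imperfect. The function names are hypothetical, and the greedy "longest run wins" rule is an assumption made for illustration.

```python
def perfect_motif_runs(seq, motifs):
    """Return uninterrupted runs of exact motif copies.

    Each run is (motif_index, start, end) with a half-open span.
    At every position the longest exact run among the motifs is taken;
    bases that match no motif are skipped.
    """
    if any(not m for m in motifs):
        raise ValueError("motifs must be non-empty strings")
    runs = []
    pos = 0
    while pos < len(seq):
        best = None
        for idx, motif in enumerate(motifs):
            end = pos
            while seq.startswith(motif, end):
                end += len(motif)
            if end > pos and (best is None or end > best[2]):
                best = (idx, pos, end)
        if best is None:
            pos += 1  # no perfect copy starts here
        else:
            runs.append(best)
            pos = best[2]
    return runs


def motif_counts(runs, motifs):
    """Total perfect copies per motif, summed over all runs."""
    counts = [0] * len(motifs)
    for idx, start, end in runs:
        counts[idx] += (end - start) // len(motifs[idx])
    return counts


allele = "ACGACGACGACGACTACTACTACTACGACGACGACG"
motifs = ["ACG", "ACT"]
runs = perfect_motif_runs(allele, motifs)
print(runs)                        # [(0, 0, 12), (1, 12, 24), (0, 24, 36)]
print(motif_counts(runs, motifs))  # [8, 4]
```

The list of runs plays the role of the per-run spans (three separate runs here), while the summed counts play the role of the overall per-motif totals, which add up across the interruption.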
Thanks for the help!
Hi,
I am curious about the functionality of TRGT regarding tandem repeat motifs. Can TRGT identify de novo tandem repeat motifs in samples, or does it strictly use the motifs present in the repeat catalog?
Thank you!