bartcharbon closed this issue 1 month ago.
Hi @bartcharbon,
Thanks for looking into Straglr. Complex patterns with interruptions are a toughie. In my experience, the reads always deviate from the complex pattern specified, i.e. I don't think you should expect all your reads to show (TTTTA){5}TTA(TTCTA){5}, or even (TTTTA){n}TTA(TTCTA){n}. The best Straglr can do right now is for you to specify the expected motif as TT*, so that all 3 motifs will hopefully be captured. The "actual_motif" field in the TSV output will tell you what TRF thinks the motif is.
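Here is a minimal sketch of why a loose pattern like TT* can cover all three motifs from the question. It assumes, hypothetically, that the expected motif is treated as a regular expression and compared against the start of each detected motif with Python's `re.match`. This is an illustration only, not Straglr's actual code, and the helper name `motif_accepted` is made up.

```python
import re

# Motifs from the (TTTTA){5}TTA(TTCTA){5} example in this thread.
observed_motifs = ["TTTTA", "TTA", "TTCTA"]


def motif_accepted(expected: str, observed: str) -> bool:
    """Hypothetical check: does the observed motif satisfy the expected pattern?

    Assumes the expected motif is a regex matched at the start of the observed
    motif (re.match anchors at position 0 but not at the end of the string).
    """
    return re.match(expected, observed) is not None


for motif in observed_motifs:
    print(motif, motif_accepted("TT*", motif))
```

Under this assumption the pattern is permissive. Because `T*` can match zero characters, `TT*` accepts any motif that begins with T (even `TA`). With `re.fullmatch` instead, none of the three motifs would match, since each ends in A.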
I have been made aware of some software that tries to delineate complex repeat patterns. Here's one:
https://academic.oup.com/bioinformatics/article/39/4/btad185/7114028?login=false
https://github.com/morisUtokyo/uTR
I haven't tried it myself and would like to know if it's any good.
Anyway, you can try Straglr with the TT* regex and give uTR a whirl. Please let me know the results of both if you do!
Dear @readmanchiu,
Some of the STRs we are interested in have a more complex pattern instead of a simple repeating sequence, e.g. something like (TTTTA){5}TTA(TTCTA){5} (5 TTTTAs followed by a single TTA and then 5 TTCTAs).
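The notation above is ordinary regular-expression syntax, so the structure can be checked against a sequence with Python's `re` module. The sketch below compares the exact pattern with a looser form that allows any number of units on each side. It illustrates why a read with even one extra unit fails the exact pattern. Both sequences are made up for illustration, and this is not how Straglr itself matches reads.

```python
import re

# Exact structure from the question: 5 x TTTTA, one TTA, 5 x TTCTA.
exact = re.compile(r"(?:TTTTA){5}TTA(?:TTCTA){5}")
# Looser form, one or more units on each side. Note that "+" lets the two
# counts differ, unlike a shared n in (TTTTA){n}TTA(TTCTA){n}.
loose = re.compile(r"(?:TTTTA)+TTA(?:TTCTA)+")

# Hypothetical sequences, made up for illustration.
perfect = "TTTTA" * 5 + "TTA" + "TTCTA" * 5
expanded = "TTTTA" * 6 + "TTA" + "TTCTA" * 5  # one extra TTTTA unit

for name, seq in [("perfect", perfect), ("expanded", expanded)]:
    # fullmatch requires the whole sequence to fit the pattern.
    print(name, bool(exact.fullmatch(seq)), bool(loose.fullmatch(seq)))
```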
Is this something Straglr would be able to do? And if so, what would be the correct way to specify a unit like this in the loci bed file?
A related question: is there documentation on how to specify the repeat pattern? I've been playing around with "*" and "+" signs, also in combination with round and curly brackets. Are constructions with these tokens supported?
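This thread does not confirm which of these constructions Straglr's loci bed file accepts. As a hedged reference for what the tokens mean in standard regex syntax (Python's `re` here), the sketch below compares `*`, `+`, `{m}` and `{m,n}` applied to a bracketed unit. It also shows that without brackets a quantifier applies only to the preceding base. The test sequences and the helper `match_table` are made up for illustration.

```python
import re

# Quantifiers applied to a bracketed repeat unit (standard Python regex syntax).
PATTERNS = [
    r"(TTTTA)*",      # zero or more copies of the unit
    r"(TTTTA)+",      # one or more copies
    r"(TTTTA){5}",    # exactly five copies
    r"(TTTTA){3,6}",  # between three and six copies
    r"TTTTA*",        # no brackets: "*" applies only to the final A
]

# Made-up test sequences.
SEQUENCES = {
    "empty": "",
    "1 unit": "TTTTA",
    "5 units": "TTTTA" * 5,
    "7 units": "TTTTA" * 7,
}


def match_table(patterns, sequences):
    """Return {pattern: {sequence name: True if the whole sequence matches}}."""
    return {
        p: {name: re.fullmatch(p, seq) is not None for name, seq in sequences.items()}
        for p in patterns
    }


table = match_table(PATTERNS, SEQUENCES)
for pattern, row in table.items():
    print(f"{pattern:<14}", row)
```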